
Overview

The Analytics page is the workspace-wide view of your data quality posture over time. Where job comparison answers “what changed between two specific runs,” analytics answers “what’s the trend across the last 30 days, 90 days, or all time.” Open Analytics from the sidebar. Everything is org-scoped: workers see analytics filtered to the jobs they have access to, and admins see the full org view.

What it shows

Analytics is organized into five panels, each backed by its own endpoint group.

Quality

The quality panel tracks how clean your data is over time:
| Chart | What it shows |
|---|---|
| Quality over time | Daily/weekly aggregate quality score across all jobs |
| Quality delta | Period-over-period change with significance markers |
| Null rates | Average null percentage per period, broken down by column type |
| Recurring issues | Issues that have appeared on multiple files: your “long tail” of problems |
| Top issues | The biggest single issues by row count or severity |
The recurring issues view is the most actionable: it surfaces the patterns worth fixing at the source rather than per-file.
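The recurring-issues idea can be sketched as grouping issue records by what went wrong and where, then counting distinct files. This is a minimal sketch, not ORCA’s implementation: the `IssueRecord` shape and the `recurring_issues` name are hypothetical, and the default threshold of three files borrows the rule of thumb from Tips.

```python
from collections import defaultdict
from typing import NamedTuple


class IssueRecord(NamedTuple):
    """One issue found in one scanned file (hypothetical shape)."""
    filename: str
    column: str
    issue_type: str


def recurring_issues(records: list[IssueRecord], min_files: int = 3) -> dict[tuple[str, str], int]:
    """Return {(column, issue_type): distinct file count} for issues seen on >= min_files files."""
    files_per_issue: dict[tuple[str, str], set[str]] = defaultdict(set)
    for rec in records:
        # Key an issue by where it occurs and what kind it is; each file counts once.
        files_per_issue[(rec.column, rec.issue_type)].add(rec.filename)
    recurring = {key: len(files) for key, files in files_per_issue.items() if len(files) >= min_files}
    # Most widespread first: these are the best candidates for a fix at the source.
    return dict(sorted(recurring.items(), key=lambda item: item[1], reverse=True))
```

Counting distinct files rather than raw occurrences keeps one very noisy file from masquerading as a recurring pattern.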

Files

The files panel shows volume and per-file behavior:
| Chart | What it shows |
|---|---|
| File volume | Files processed per day/week, by source type |
| File leaderboard | Worst-quality files in the period |
| File history | Score timeline for a single named file across re-uploads |
| Filenames | All filenames seen, with last-scan timestamps |
The leaderboard is the fastest way to find “the file that always breaks contracts” so you can prioritize root-cause work.
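One plausible way to compute a worst-first leaderboard is to average each file’s quality score over the selected period and sort ascending. The `FileScan` record, the 0-100 score scale, and averaging (rather than, say, taking each file’s latest score) are all assumptions for illustration.

```python
from datetime import date
from typing import NamedTuple


class FileScan(NamedTuple):
    """One completed scan of a file (hypothetical shape)."""
    filename: str
    scanned_on: date
    quality_score: float  # assumed 0-100, higher is cleaner


def file_leaderboard(scans: list[FileScan], start: date, end: date,
                     limit: int = 10) -> list[tuple[str, float]]:
    """Rank files by mean quality score within [start, end], worst first."""
    scores: dict[str, list[float]] = {}
    for scan in scans:
        # Only scans inside the selected period contribute.
        if start <= scan.scanned_on <= end:
            scores.setdefault(scan.filename, []).append(scan.quality_score)
    averages = [(name, sum(vals) / len(vals)) for name, vals in scores.items()]
    averages.sort(key=lambda pair: pair[1])  # lowest score = worst quality
    return averages[:limit]
```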

GDPR

The GDPR panel tracks your PII exposure:
| Chart | What it shows |
|---|---|
| GDPR exposure | Count of PII columns detected per period |
| GDPR file detail | Per-file breakdown of detected PII categories |
Use this for compliance reviews and to verify that GDPR scanning is actually catching what it should.
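GDPR exposure is a per-period count of PII columns. The sketch below assumes each detection arrives as a hypothetical `(scan_date, filename, column, pii_category)` tuple and that a period is an ISO week, and it counts each distinct column once per week so that re-scans don’t inflate the number. Whether ORCA deduplicates re-scanned columns this way is an assumption.

```python
from collections import defaultdict
from datetime import date


def gdpr_exposure_by_week(detections: list[tuple[date, str, str, str]]) -> dict[tuple[int, int], int]:
    """Count distinct PII columns per ISO (year, week).

    Each detection is (scan_date, filename, column, pii_category); the shape is hypothetical.
    """
    columns_per_week: dict[tuple[int, int], set[tuple[str, str]]] = defaultdict(set)
    for scanned_on, filename, column, _category in detections:
        iso = scanned_on.isocalendar()
        # A column counts once per week even if it is re-scanned several times.
        columns_per_week[(iso[0], iso[1])].add((filename, column))
    return {week: len(cols) for week, cols in sorted(columns_per_week.items())}
```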

Insights

The insights panel breaks down quality issues by category:
| Chart | What it shows |
|---|---|
| Issue breakdown | Distribution of issue types (nulls, duplicates, format, outliers) |
| Category distribution | What semantic categories your data is composed of |
| Column heatmap | Which columns have the most issues across all files |
| Column detail | Drill into a single column’s issue history |
| Recent activity | Latest scans, completions, and contract violations |
The column heatmap is well suited to “we always have problems with the customer table” investigations, because it shows which columns account for most of the issues across every file.
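A column heatmap is essentially a count matrix: one row per column, one cell per issue type. The sketch below builds it from hypothetical `(column, issue_type)` pairs collected across all files and orders rows by total issue count so the noisiest columns surface first. The input shape and the `column_heatmap` name are assumptions, not ORCA’s API.

```python
from collections import Counter, defaultdict


def column_heatmap(issues: list[tuple[str, str]]) -> dict[str, Counter[str]]:
    """Build {column: Counter({issue_type: count})}, noisiest columns first."""
    cells: dict[str, Counter[str]] = defaultdict(Counter)
    for column, issue_type in issues:
        cells[column][issue_type] += 1
    # Order rows so the columns with the most issues overall come first.
    ordered = sorted(cells.items(), key=lambda kv: sum(kv[1].values()), reverse=True)
    return dict(ordered)
```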

Value & AI cost

The value panel quantifies what ORCA is actually doing for you:
| Chart | What it shows |
|---|---|
| Value summary | Estimated time and cost saved by ORCA’s automated checks |
| Value benchmarks | Your usage vs typical workspaces of similar size |
| AI costs | Token consumption per period, broken down by operation type |
| AI health | Classification confidence trends: is the AI getting better or worse on your data? |
| AI drift | Whether the AI’s classification decisions are stable over time |
Use AI costs to forecast monthly token spend, and use AI drift to catch the rare case where the AI’s behavior on your data shifts (usually because your data itself shifted).
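As a rough illustration of forecasting from the AI costs chart, the sketch below projects the mean daily token count over a 30-day month and converts it to spend. The daily totals, the 30-day month, and `price_per_million` are assumptions; the chart’s exact units and ORCA’s pricing aren’t specified here.

```python
def forecast_monthly_tokens(daily_tokens: list[int], days_in_month: int = 30) -> float:
    """Project a month's token consumption from recent daily totals (flat-rate model)."""
    if not daily_tokens:
        raise ValueError("need at least one day of token usage")
    mean_per_day = sum(daily_tokens) / len(daily_tokens)
    return mean_per_day * days_in_month


def forecast_monthly_cost(daily_tokens: list[int], price_per_million: float,
                          days_in_month: int = 30) -> float:
    """Convert the token forecast into spend; price_per_million is a hypothetical input."""
    return forecast_monthly_tokens(daily_tokens, days_in_month) / 1_000_000 * price_per_million
```

A flat mean ignores growth: if scan volume is rising, a trend-based projection will forecast more accurately.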

Filtering

Every panel supports the same filters at the top of the page:
  • Date range — last 7 days, 30 days, 90 days, or custom
  • Source — filter to a single connected data source
  • File pattern — match by filename glob
  • Job status — filter to complete only, or include partial/failed
Filters affect every panel simultaneously so you can scope the entire view.
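Because every panel shares one filter set, it can help to think of the filters as a single predicate applied to each job before any panel aggregates. This is a minimal sketch under stated assumptions: the `Job` and `AnalyticsFilter` names, their fields, the status values, and the complete-only default are hypothetical, and glob matching uses Python’s `fnmatch` semantics, which may differ from ORCA’s (for example, `*` here also matches `/`).

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Job:
    """Minimal view of a scan job (hypothetical fields)."""
    filename: str
    source: str
    status: str  # assumed values: "complete", "partial", "failed"
    finished_on: date


@dataclass(frozen=True)
class AnalyticsFilter:
    """One filter set shared by every panel (hypothetical model of the page filters)."""
    start: date
    end: date
    source: Optional[str] = None        # None means all sources
    file_pattern: Optional[str] = None  # shell-style glob, e.g. "orders_*.csv"
    statuses: frozenset[str] = frozenset({"complete"})  # assumed default: complete only

    def matches(self, job: Job) -> bool:
        """True if the job passes every active filter."""
        if not (self.start <= job.finished_on <= self.end):
            return False
        if self.source is not None and job.source != self.source:
            return False
        # fnmatchcase is case-sensitive on every OS, unlike fnmatch.fnmatch.
        if self.file_pattern is not None and not fnmatchcase(job.filename, self.file_pattern):
            return False
        return job.status in self.statuses
```

Each panel would then aggregate only `[job for job in jobs if flt.matches(job)]`, which is what keeps the panels consistent with one another.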

API access

Analytics is a thin layer over five endpoint groups, one per panel. The exact request and response shapes are documented in API endpoints; the full family is:
GET /api/v1/analytics/quality-over-time
GET /api/v1/analytics/quality-delta
GET /api/v1/analytics/null-rates
GET /api/v1/analytics/recurring-issues
GET /api/v1/analytics/top-issues

GET /api/v1/analytics/file-volume
GET /api/v1/analytics/file-leaderboard
GET /api/v1/analytics/file-history
GET /api/v1/analytics/filenames

GET /api/v1/analytics/gdpr-exposure
GET /api/v1/analytics/gdpr-file-detail

GET /api/v1/analytics/issue-breakdown
GET /api/v1/analytics/category-distribution
GET /api/v1/analytics/column-detail
GET /api/v1/analytics/column-heatmap
GET /api/v1/analytics/recent-activity

GET /api/v1/analytics/value-summary
GET /api/v1/analytics/value-benchmarks
GET /api/v1/analytics/ai-costs
GET /api/v1/analytics/ai-health
GET /api/v1/analytics/ai-drift
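All endpoints accept the same date range and filter parameters as the UI. On error they return empty data instead of failing, because analytics are non-critical and never block the rest of the app. As a consequence, an empty response can mean either “no matching data” or “the query failed.”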
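A minimal client sketch, using only the standard library, shows how the UI filters might map onto a request. The endpoint paths come from the list above; the query parameter names (`start_date`, `end_date`, `source`, `file_pattern`, `status`), the bearer-token header, and the host are hypothetical, so check API endpoints for the real ones.

```python
import json
from datetime import date
from typing import Any, Optional
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

ANALYTICS_PREFIX = "/api/v1/analytics/"


def analytics_url(base_url: str, endpoint: str, start: date, end: date,
                  source: Optional[str] = None, file_pattern: Optional[str] = None,
                  statuses: Optional[list[str]] = None) -> str:
    """Build a GET URL for one analytics endpoint (query parameter names are hypothetical)."""
    params = [("start_date", start.isoformat()), ("end_date", end.isoformat())]
    if source is not None:
        params.append(("source", source))
    if file_pattern is not None:
        params.append(("file_pattern", file_pattern))
    for status in statuses or []:
        params.append(("status", status))  # one repeated key per status
    return urljoin(base_url, ANALYTICS_PREFIX + endpoint) + "?" + urlencode(params)


def fetch_analytics(url: str, token: str, timeout: float = 10.0) -> Any:
    """GET one analytics endpoint and decode JSON; the auth header scheme is an assumption."""
    request = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def looks_empty(payload: Any) -> bool:
    """True for None/empty payloads, which may mean "no data" OR a swallowed error."""
    return payload is None or payload == [] or payload == {}
```

Because errors come back as empty data, a script that alerts on analytics should treat an empty payload as “unknown” rather than as “zero issues.”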

Tips

  • Start with the quality-over-time chart. A flat line is healthy. A jagged line means upstream variability worth investigating. A steady decline means you have a regression problem.
  • Use recurring issues as your backlog. Anything that shows up on three or more files is worth fixing at the source, not per-file.
  • Watch AI drift after big upstream changes. When your source schema changes, drift spikes are normal — but they should settle within a few scans. Sustained drift means the AI’s confidence is genuinely lower on the new data.
  • Set up a weekly digest email. Combine analytics with scheduled reports so the highlights land in your inbox without anyone opening the dashboard.
  • Treat AI cost as a budget signal, not a performance metric. Cost going up means you’re scanning more data, not that ORCA is getting more expensive per scan.
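The first tip’s reading of the quality-over-time chart can be approximated numerically: fit a least-squares line, treat a clearly negative slope as a decline, and treat large scatter around the line as jaggedness. This is a heuristic sketch, not how ORCA labels trends; the thresholds, the extra “improving” label, and the `classify_trend` name are hypothetical and would need tuning to your score scale.

```python
from statistics import mean, pstdev


def classify_trend(scores: list[float], slope_threshold: float = 0.5,
                   jitter_threshold: float = 5.0) -> str:
    """Label a per-period quality series as 'declining', 'jagged', 'improving', or 'flat'.

    Thresholds are in score points (per period for the slope) and are illustrative only.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three periods to judge a trend")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(scores)
    # Ordinary least-squares slope: average change in score per period.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
             / sum((x - x_bar) ** 2 for x in xs))
    # Scatter of the points around the fitted line approximates period-to-period variability.
    residuals = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(xs, scores)]
    if slope <= -slope_threshold:
        return "declining"
    if pstdev(residuals) >= jitter_threshold:
        return "jagged"
    if slope >= slope_threshold:
        return "improving"
    return "flat"
```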

What’s next?

  • Job comparison — when you need a focused two-job diff instead of trends
  • Reports — schedule analytics summaries for your team
  • Contracts — convert recurring issues into enforced rules so they stop showing up in analytics