Analytics

Overview

The Analytics page is the workspace-wide view of your data quality posture over time. Where job comparison answers “what changed between two specific runs,” analytics answers “what’s the trend across the last 30 days, 90 days, or all time.” Open Analytics from the sidebar. Everything is org-scoped: workers see analytics filtered to the jobs they have access to, while admins see the full org view.

What it shows

Analytics is organised into five panels, each backed by its own endpoint group.

Quality

The quality panel tracks how clean your data is over time:

Chart	What it shows
Quality over time	Daily/weekly aggregate quality score across all jobs
Quality delta	Period-over-period change with significance markers
Null rates	Average null percentage per period, broken down by column type
Recurring issues	Issues that have appeared on multiple files — your “long tail” of problems
Top issues	The biggest single issues by row count or severity

The recurring issues view is the most actionable: it surfaces the patterns worth fixing at the source rather than per-file.

Files

The files panel shows volume and per-file behavior:

Chart	What it shows
File volume	Files processed per day/week, by source type
File leaderboard	Worst-quality files in the period
File history	Score timeline for a single named file across re-uploads
Filenames	All filenames seen, with last-scan timestamps

The leaderboard is the fastest way to find “the file that always breaks contracts” so you can prioritize root-cause work.

GDPR

The GDPR panel tracks your PII exposure:

Chart	What it shows
GDPR exposure	Count of PII columns detected per period
GDPR file detail	Per-file breakdown of detected PII categories

Use this for compliance reviews and to verify that GDPR scanning is actually catching what it should.

Insights

The insights panel breaks down quality issues by category:

Chart	What it shows
Issue breakdown	Distribution of issue types (nulls, duplicates, format, outliers)
Category distribution	What semantic categories your data is composed of
Column heatmap	Which columns have the most issues across all files
Column detail	Drill into a single column’s issue history
Recent activity	Latest scans, completions, and contract violations

The column heatmap is excellent for “we always have problems with the customer table” investigations.

Value & AI cost

The value panel quantifies what ORCA is actually doing for you:

Chart	What it shows
Value summary	Estimated time and cost saved by ORCA’s automated checks
Value benchmarks	Your usage vs typical workspaces of similar size
AI costs	Token consumption per period, broken down by operation type
AI health	Classification confidence trends — is the AI getting better or worse on your data?
AI drift	Whether the AI’s classification decisions are stable over time

Use AI costs to forecast monthly token spend and AI drift to catch the rare case where the AI’s behavior on your data shifts (usually because your data shifted).
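As a rough illustration of forecasting token spend, the sketch below projects a month from recent daily totals using a plain average. The function name and the sample numbers are hypothetical, and the daily figures are assumed to have been read off the AI costs chart; ORCA may offer its own forecasting that works differently.

```python
from statistics import mean


def forecast_monthly_tokens(daily_tokens: list[int], days_in_month: int = 30) -> int:
    """Project a month's token use from recent daily totals (simple average)."""
    if not daily_tokens:
        return 0
    return round(mean(daily_tokens) * days_in_month)


# Hypothetical daily totals for the last 7 days (not real ORCA data).
recent = [12_000, 15_000, 9_000, 14_000, 10_000, 0, 0]
print(forecast_monthly_tokens(recent))
```

A plain average is the simplest choice here. If your scan volume follows a weekly cycle, averaging over whole weeks keeps weekends from skewing the projection.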

Filtering

Every panel supports the same filters at the top of the page:

Date range — last 7 days, 30 days, 90 days, or custom
Source — filter to a single connected data source
File pattern — match by filename glob
Job status — filter to complete only, or include partial/failed

Filters affect every panel simultaneously so you can scope the entire view.
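To get a feel for how a filename glob narrows the view, here is a small sketch using Python's standard `fnmatch` module. It only illustrates common glob semantics; the filenames are made up, and the exact glob dialect ORCA applies is an assumption that may differ in details such as case sensitivity.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames, standing in for the Filenames chart.
filenames = [
    "orders_2024-01.csv",
    "orders_2024-02.csv",
    "customers.csv",
    "orders.json",
]

# "*" matches any run of characters, so this keeps the monthly order exports.
pattern = "orders_*.csv"
matches = [name for name in filenames if fnmatchcase(name, pattern)]
print(matches)
```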

API access

Analytics is a thin layer over five endpoint groups, one per panel. The exact request and response shapes are documented in API endpoints; the family is:

GET /api/v1/analytics/quality-over-time
GET /api/v1/analytics/quality-delta
GET /api/v1/analytics/null-rates
GET /api/v1/analytics/recurring-issues
GET /api/v1/analytics/top-issues

GET /api/v1/analytics/file-volume
GET /api/v1/analytics/file-leaderboard
GET /api/v1/analytics/file-history
GET /api/v1/analytics/filenames

GET /api/v1/analytics/gdpr-exposure
GET /api/v1/analytics/gdpr-file-detail

GET /api/v1/analytics/issue-breakdown
GET /api/v1/analytics/category-distribution
GET /api/v1/analytics/column-detail
GET /api/v1/analytics/column-heatmap
GET /api/v1/analytics/recent-activity

GET /api/v1/analytics/value-summary
GET /api/v1/analytics/value-benchmarks
GET /api/v1/analytics/ai-costs
GET /api/v1/analytics/ai-health
GET /api/v1/analytics/ai-drift

All endpoints accept the same date range and filter parameters as the UI. All return empty data on error rather than failing — analytics are non-critical and never block the rest of the app.
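The following is a minimal sketch of calling one of these endpoints from Python's standard library. Only the endpoint paths come from this page. The host, the bearer-token header, and the query parameter names (`start`, `end`, `source`, `file_pattern`, `status`) are placeholders chosen for illustration; check API endpoints for the real names and the authentication scheme.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://orca.example.com"  # placeholder host (assumption)


def analytics_url(endpoint: str, **filters: Optional[str]) -> str:
    """Build the URL for one analytics endpoint, e.g. 'quality-over-time'."""
    # Drop unset filters so they are not sent as empty values.
    params = {k: v for k, v in filters.items() if v is not None}
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}/api/v1/analytics/{endpoint}{query}"


def fetch(endpoint: str, token: str, **filters: Optional[str]) -> Any:
    """GET one endpoint and decode its JSON body (auth header is assumed)."""
    req = Request(
        analytics_url(endpoint, **filters),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def is_empty(payload: Any) -> bool:
    """True when the payload is None, an empty list, or an empty object."""
    return payload is None or payload == [] or payload == {}


# Hypothetical filter names; the same set would apply to every endpoint.
print(analytics_url("quality-over-time", start="2024-01-01", end="2024-01-31"))
```

Because the endpoints return empty data on error, an empty payload cannot be read as “no issues found.” A client that cares about the difference should treat `is_empty(...)` as “unknown” and retry or flag the gap.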

Tips

Start with the quality-over-time chart. A flat line at a score you are happy with is healthy. A jagged line means upstream variability worth investigating. A steady decline means you have a regression problem.
Use recurring issues as your backlog. Anything that shows up on three or more files is worth fixing at the source, not per-file.
Watch AI drift after big upstream changes. When your source schema changes, drift spikes are normal — but they should settle within a few scans. Sustained drift means the AI’s confidence is genuinely lower on the new data.
Set up a weekly digest email. Combine analytics with scheduled reports so the highlights land in your inbox without anyone opening the dashboard.
Treat AI cost as a budget signal, not a performance metric. Cost going up means you’re scanning more data, not that ORCA is getting more expensive per scan.
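Two of the tips above reduce to small calculations, sketched here under stated assumptions: the record fields `issue` and `file_count` are hypothetical stand-ins for whatever the recurring issues data actually contains, and the cost figures are invented. The first function applies the three-or-more-files rule; the second divides spend by scan count to show why rising cost alone says little about per-scan price.

```python
def source_fix_backlog(issues: list[dict], min_files: int = 3) -> list[dict]:
    """Keep issues seen on at least `min_files` files, most widespread first."""
    widespread = [i for i in issues if i["file_count"] >= min_files]
    return sorted(widespread, key=lambda i: i["file_count"], reverse=True)


def cost_per_scan(total_cost: float, scans: int) -> float:
    """Average AI cost per scan for a period; 0.0 when nothing was scanned."""
    return total_cost / scans if scans else 0.0


# Hypothetical recurring-issue records (field names are assumptions).
issues = [
    {"issue": "null customer_id", "file_count": 5},
    {"issue": "bad date format", "file_count": 2},
    {"issue": "duplicate rows", "file_count": 3},
]
for item in source_fix_backlog(issues):
    print(item["issue"], item["file_count"])

# Spend doubled, but so did scan volume, so the per-scan cost is unchanged.
print(cost_per_scan(40.0, 200), cost_per_scan(80.0, 400))
```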

What’s next?

Job comparison — when you need a focused two-job diff instead of trends
Reports — schedule analytics summaries for your team
Contracts — convert recurring issues into enforced rules so they stop showing up in analytics
