Overview
The Analytics page is the workspace-wide view of your data quality posture over time. Where job comparison answers “what changed between two specific runs,” analytics answers “what’s the trend across the last 30 days, 90 days, or all time.” Open Analytics from the sidebar. Everything is org-scoped: workers see analytics filtered to the jobs they have access to, and admins see the full org view.
What it shows
Analytics is organised into five panels, each backed by its own endpoint group.
Quality
The quality panel tracks how clean your data is over time:

| Chart | What it shows |
|---|---|
| Quality over time | Daily/weekly aggregate quality score across all jobs |
| Quality delta | Period-over-period change with significance markers |
| Null rates | Average null percentage per period, broken down by column type |
| Recurring issues | Issues that have appeared on multiple files — your “long tail” of problems |
| Top issues | The biggest single issues by row count or severity |
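
As a rough illustration of what "recurring" means in the table above, the sketch below counts how many distinct files each issue appears on. The `(filename, issue)` pair shape, the issue labels, and the `min_files` parameter are assumptions made for this example; they are not ORCA's data model or API.

```python
from collections import defaultdict


def recurring_issues(findings, min_files=2):
    """Return {issue: distinct_file_count} for issues seen on >= min_files files.

    `findings` is an iterable of (filename, issue) pairs. This shape is
    hypothetical and chosen only for illustration.
    """
    files_by_issue = defaultdict(set)
    for filename, issue in findings:
        files_by_issue[issue].add(filename)
    return {
        issue: len(files)
        for issue, files in files_by_issue.items()
        if len(files) >= min_files
    }


# Made-up sample data.
findings = [
    ("orders.csv", "null:customer_id"),
    ("orders.csv", "null:customer_id"),  # repeats within one file count once
    ("returns.csv", "null:customer_id"),
    ("returns.csv", "format:order_date"),
    ("stock.csv", "null:customer_id"),
]

print(recurring_issues(findings))  # {'null:customer_id': 3}
```

Counting distinct files rather than raw occurrences keeps one very noisy file from looking like a recurring problem.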
Files
The files panel shows volume and per-file behavior:

| Chart | What it shows |
|---|---|
| File volume | Files processed per day/week, by source type |
| File leaderboard | Worst-quality files in the period |
| File history | Score timeline for a single named file across re-uploads |
| Filenames | All filenames seen, with last-scan timestamps |
GDPR
The GDPR panel tracks your PII exposure:

| Chart | What it shows |
|---|---|
| GDPR exposure | Count of PII columns detected per period |
| GDPR file detail | Per-file breakdown of detected PII categories |
Insights
The insights panel breaks down quality issues by category:

| Chart | What it shows |
|---|---|
| Issue breakdown | Distribution of issue types (nulls, duplicates, format, outliers) |
| Category distribution | What semantic categories your data is composed of |
| Column heatmap | Which columns have the most issues across all files |
| Column detail | Drill into a single column’s issue history |
| Recent activity | Latest scans, completions, and contract violations |
Value & AI cost
The value panel quantifies what ORCA is actually doing for you:

| Chart | What it shows |
|---|---|
| Value summary | Estimated time and cost saved by ORCA’s automated checks |
| Value benchmarks | Your usage vs typical workspaces of similar size |
| AI costs | Token consumption per period, broken down by operation type |
| AI health | Classification confidence trends — is the AI getting better or worse on your data? |
| AI drift | Whether the AI’s classification decisions are stable over time |
Filtering
Every panel supports the same filters at the top of the page:
- Date range: last 7 days, 30 days, 90 days, or custom
- Source: filter to a single connected data source
- File pattern: match by filename glob
- Job status: filter to complete only, or include partial/failed
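
To make the way the four filters combine more concrete, here is a minimal sketch that applies them to a list of job records. The `Job` class, its field names, the source labels, and `apply_filters` are all hypothetical; the sketch only assumes that the filters are ANDed together and that the file pattern follows ordinary shell-style glob rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from fnmatch import fnmatchcase


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative only."""
    filename: str
    source: str
    status: str  # "complete", "partial", or "failed"
    scanned_on: date


def apply_filters(jobs, *, start, end, source=None, pattern="*",
                  statuses=("complete",)):
    """Keep jobs that satisfy every filter (date range, source, glob, status)."""
    return [
        job for job in jobs
        if start <= job.scanned_on <= end
        and (source is None or job.source == source)
        and fnmatchcase(job.filename, pattern)
        and job.status in statuses
    ]


# Made-up sample data.
today = date(2024, 6, 30)
jobs = [
    Job("orders_2024_06.csv", "s3", "complete", date(2024, 6, 28)),
    Job("orders_2024_01.csv", "s3", "complete", date(2024, 1, 15)),
    Job("returns_2024_06.csv", "sftp", "failed", date(2024, 6, 20)),
    Job("orders_2024_06b.csv", "s3", "partial", date(2024, 6, 29)),
]

# "Last 30 days", complete jobs only, filenames matching a glob.
last_30 = apply_filters(
    jobs, start=today - timedelta(days=30), end=today, pattern="orders_*.csv"
)
print([job.filename for job in last_30])  # ['orders_2024_06.csv']

# Same window and glob, but including partial and failed jobs.
with_all = apply_filters(
    jobs, start=today - timedelta(days=30), end=today, pattern="orders_*.csv",
    statuses=("complete", "partial", "failed"),
)
print([job.filename for job in with_all])
# ['orders_2024_06.csv', 'orders_2024_06b.csv']
```

The sketch uses `fnmatchcase` rather than `fnmatch` so that matching is case-sensitive on every platform. Whether ORCA's own glob matching is case-sensitive is not stated here.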
API access
Analytics is a thin layer over many endpoint groups. The exact shapes are documented in API endpoints, but the family is:
Tips
- Start with the quality-over-time chart. A flat line is healthy. A jagged line means upstream variability worth investigating. A steady decline means you have a regression problem.
- Use recurring issues as your backlog. Anything that shows up on three or more files is worth fixing at the source, not per-file.
- Watch AI drift after big upstream changes. When your source schema changes, drift spikes are normal — but they should settle within a few scans. Sustained drift means the AI’s confidence is genuinely lower on the new data.
- Set up a weekly digest email. Combine analytics with scheduled reports so the highlights land in your inbox without anyone opening the dashboard.
- Treat AI cost as a budget signal, not a performance metric. Cost going up means you’re scanning more data, not that ORCA is getting more expensive per scan.
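
The first tip's reading of the quality-over-time chart (flat, jagged, or steadily declining) can be approximated in code if you export the scores. The sketch below fits a least-squares line and looks at the slope and the scatter around it. The 0 to 100 score scale, the `slope_tol` and `noise_tol` thresholds, and the extra "rising" label are assumptions made for illustration, not values ORCA uses.

```python
from statistics import mean, pstdev


def classify_trend(scores, slope_tol=0.2, noise_tol=2.0):
    """Label a score series as 'jagged', 'declining', 'rising', or 'flat'.

    Thresholds are hypothetical and assume one score per period on a
    0-100 scale. Tune them to your own data.
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three points to judge a trend")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(scores)
    # Ordinary least-squares slope: score change per period.
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    # Scatter around the fitted line: high scatter reads as "jagged".
    residuals = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(xs, scores)]
    if pstdev(residuals) > noise_tol:
        return "jagged"
    if slope < -slope_tol:
        return "declining"
    if slope > slope_tol:
        return "rising"
    return "flat"


print(classify_trend([92, 92.5, 91.8, 92.2, 92.1, 92.0]))  # flat
print(classify_trend([95, 93, 91, 89, 87, 85]))            # declining
print(classify_trend([90, 70, 92, 68, 91, 72]))            # jagged
```

Scatter is checked before slope so that a noisy series is flagged as upstream variability first; a noisy series can also be declining, which this simple labelling does not capture.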
What’s next?
- Job comparison — when you need a focused two-job diff instead of trends
- Reports — schedule analytics summaries for your team
- Contracts — convert recurring issues into enforced rules so they stop showing up in analytics