Overview
Job comparison answers “what changed between these two runs?” Pick two completed jobs and ORCA shows a side-by-side diff of:
- Overall quality scores and AI Readiness scores
- Per-column classifications (added, removed, reclassified)
- Per-column issue counts (improved, regressed, unchanged)
- Format and distribution shifts on shared columns
When to use it
| Scenario | What to compare |
|---|---|
| Verify remediation | Original job vs the job for the remediated copy |
| Track upstream changes | Yesterday’s scheduled scan vs today’s |
| A/B a pipeline change | Last run before the change vs the first run after |
| Audit a contract violation | The last passing job vs the first violating one |
| Onboard a new dataset | Reference job (known good) vs the candidate file |
Running a comparison
In the UI, open Compare from the sidebar, pick two jobs from the dropdowns, and click Compare. Both jobs must belong to your organisation and have `complete` or `partial` status.
Constraints
- Both jobs must belong to your organisation (cross-tenant comparison is impossible)
- Both jobs must have `complete` or `partial` status (in-progress jobs cannot be compared)
- Worker accounts can only compare jobs they have RBAC access to (their own and their team's)
- Admins can compare any two jobs in the org
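Taken together, the constraints above are a per-job eligibility check. The sketch below is a minimal Python illustration; the `Job` and `User` records, their field names, and the `admin`/`worker` role strings are hypothetical stand-ins, not ORCA's data model or API.

```python
from dataclasses import dataclass
from typing import Optional

# Statuses the constraints allow; anything else (e.g. in-progress) is rejected.
COMPARABLE_STATUSES = {"complete", "partial"}


@dataclass(frozen=True)
class Job:
    # Hypothetical job record; field names are illustrative only.
    job_id: str
    org_id: str
    status: str
    owner_id: str
    team_id: str


@dataclass(frozen=True)
class User:
    # Hypothetical caller record; role is "admin" or "worker" in this sketch.
    user_id: str
    org_id: str
    team_id: str
    role: str


def can_view(user: User, job: Job) -> bool:
    """Admins see every job in the org; workers see their own and their team's."""
    if user.role == "admin":
        return True
    return job.owner_id == user.user_id or job.team_id == user.team_id


def comparison_error(user: User, left: Job, right: Job) -> Optional[str]:
    """Return why the pair cannot be compared, or None if it can."""
    for job in (left, right):
        # The org check runs first, so admins are still limited to their own org.
        if job.org_id != user.org_id:
            return f"{job.job_id}: belongs to another organisation"
        if job.status not in COMPARABLE_STATUSES:
            return f"{job.job_id}: status '{job.status}' is not comparable"
        if not can_view(user, job):
            return f"{job.job_id}: no RBAC access"
    return None
```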
What the comparison shows
Score deltas
The header shows the absolute and relative change for:
- Quality score: overall 0–100, computed with the same diminishing-weights formula on both jobs
- AI Readiness score: overall and per-dimension
- Issue count: total quality issues across all files
- Column count: useful for catching schema changes
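A sketch of how the header deltas relate to the two jobs' totals. It assumes the relative change is measured against the earlier ("before") job, which the section does not state explicitly, and it takes each score as given rather than recomputing the quality formula. The metric names are illustrative.

```python
from typing import Optional


def score_delta(before: float, after: float) -> tuple[float, Optional[float]]:
    """Absolute change, and relative change as a fraction of the baseline.

    Relative change is undefined for a zero baseline, so None is returned.
    """
    absolute = after - before
    relative = absolute / before if before != 0 else None
    return absolute, relative


def header_deltas(
    before: dict[str, float], after: dict[str, float]
) -> dict[str, tuple[float, Optional[float]]]:
    # Only metrics reported for both jobs are compared.
    return {name: score_delta(before[name], after[name]) for name in before.keys() & after.keys()}
```

For example, a quality score moving from 72 to 81 gives an absolute change of 9 and a relative change of 0.125 (12.5%).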
Column matching
Columns are matched by normalised name: case-insensitive and whitespace-trimmed. So `Customer_Email`, `customer_email`, and `CUSTOMER EMAIL` all map to the same column.
This means:
- Renamed columns appear as one removed + one added (true rename detection requires explicit lineage)
- Reordered columns still match, so a change in column order alone produces no diff
- Re-typed columns show up as a classification change on the shared column
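A minimal Python sketch of name-based matching. The section says names are compared case-insensitively after trimming whitespace; the `CUSTOMER EMAIL` example also implies that internal spaces are treated like underscores, so the hypothetical `normalise` helper below assumes that. ORCA's exact normalisation rule may differ.

```python
import re


def normalise(name: str) -> str:
    """Trim, turn internal whitespace runs into "_" (assumed), and lowercase."""
    return re.sub(r"\s+", "_", name.strip()).lower()


def match_columns(before: list[str], after: list[str]) -> dict[str, list[str]]:
    """Split columns into shared/added/removed by normalised name.

    Column order is ignored. Names that collide after normalisation
    collapse into a single entry in this sketch.
    """
    old = {normalise(c) for c in before}
    new = {normalise(c) for c in after}
    return {
        "shared": sorted(old & new),
        "added": sorted(new - old),
        "removed": sorted(old - new),
    }
```

In this sketch, `match_columns(["zip"], ["postcode"])` reports `zip` as removed and `postcode` as added, which is how a rename without lineage appears.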
Per-column diff
For every shared column, ORCA shows:
- Classification change: old type → new type, with confidence delta
- Issue delta: how many issues were added or removed
- Null rate delta: useful for catching upstream filtering bugs
- Sample value change: the first few sample values from each side
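The per-column fields can be derived from two column summaries. This sketch uses a hypothetical `ColumnStats` record and buckets the issue delta into the improved / regressed / unchanged categories listed in the overview; the field names and classification labels are illustrative, not ORCA's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnStats:
    # Hypothetical per-column summary for one job.
    classification: str
    confidence: float
    issue_count: int
    null_rate: float  # fraction of null values, 0.0 to 1.0
    samples: tuple[str, ...]


@dataclass(frozen=True)
class ColumnDiff:
    classification_change: Optional[tuple[str, str]]  # (old, new), None if unchanged
    confidence_delta: float
    issue_delta: int
    issue_trend: str  # "improved", "regressed" or "unchanged"
    null_rate_delta: float
    samples_before: tuple[str, ...]
    samples_after: tuple[str, ...]


def diff_column(before: ColumnStats, after: ColumnStats, n_samples: int = 3) -> ColumnDiff:
    issue_delta = after.issue_count - before.issue_count
    # Fewer issues is an improvement, more is a regression.
    trend = "improved" if issue_delta < 0 else "regressed" if issue_delta > 0 else "unchanged"
    changed = before.classification != after.classification
    return ColumnDiff(
        classification_change=(before.classification, after.classification) if changed else None,
        confidence_delta=after.confidence - before.confidence,
        issue_delta=issue_delta,
        issue_trend=trend,
        null_rate_delta=after.null_rate - before.null_rate,
        samples_before=before.samples[:n_samples],
        samples_after=after.samples[:n_samples],
    )
```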
API
The comparison API response includes:
- Job metadata for both sides
- Score totals and deltas
- An array of column diffs (shared, added, removed)
- File-level breakdown when both jobs have multiple files
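If you consume the comparison from code, a typed model of the response keeps field access explicit. The shape below is entirely hypothetical: the key names (`left_job`, `score_deltas`, `columns`, `issue_delta`, `files`) are placeholders chosen to mirror the list above, so check the actual API reference before relying on them.

```python
from typing import Literal, TypedDict


# Hypothetical response shape; every key name is an assumption for illustration.
class ColumnDiffEntry(TypedDict):
    name: str
    status: Literal["shared", "added", "removed"]
    issue_delta: int


class _ComparisonBase(TypedDict):
    left_job: dict
    right_job: dict
    score_deltas: dict[str, float]
    columns: list[ColumnDiffEntry]


class ComparisonResponse(_ComparisonBase, total=False):
    # Only expected when both jobs have multiple files.
    files: list[dict]


def regressed_columns(resp: ComparisonResponse) -> list[str]:
    """Names of shared columns whose issue count went up."""
    return [c["name"] for c in resp["columns"] if c["status"] == "shared" and c["issue_delta"] > 0]
```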
Limitations
- Comparison is two-job only. Multi-job comparison (three or more jobs) isn’t supported; for long-term trends, use Analytics instead.
- Schema-aware diffs only. ORCA matches columns by name, not by content fingerprinting. A column rename without lineage looks like a removal + addition.
- No cell-level diff. Comparison works at the column-statistic level. To diff actual rows, export both files and use a row-level diff tool.
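For the row-level case, Python's standard `csv` and `difflib` modules are enough for a quick check on small exports. This is a generic sketch, not an ORCA feature: it loads both files into memory and diffs rows in order, so reordered rows show up as changes.

```python
import csv
import difflib
import io


def row_diff(before_csv: str, after_csv: str) -> list[str]:
    """Unified diff of two CSV exports, one diff line per row.

    Rows are re-serialised through the csv module so that quoting
    differences alone do not appear as changes.
    """
    def rows(text: str) -> list[str]:
        out = []
        for row in csv.reader(io.StringIO(text)):
            buf = io.StringIO()
            csv.writer(buf, lineterminator="").writerow(row)
            out.append(buf.getvalue())
        return out

    return list(
        difflib.unified_diff(
            rows(before_csv), rows(after_csv),
            fromfile="before.csv", tofile="after.csv", lineterm="",
        )
    )
```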
Tips
- Always compare against the most recent passing run, not the first one. “Today vs yesterday” is more actionable than “today vs the first scan ever.”
- Use contracts for thresholds. If you find yourself running the same comparison repeatedly to check the same condition, encode that condition as a contract instead — it’ll fire automatically.
- Pair comparison with auto-remediation. Run remediation, then compare original vs remediated to verify the score improvement matches the predicted impact.
- Save the URL. The compare page URL is shareable and reproducible — paste it in an incident ticket so the next reviewer lands on the exact same diff.
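The remediation tip can be turned into a simple pass/fail check. In this sketch, `predicted_gain` stands for the remediation's predicted score impact and `tolerance` for how many points of drift you accept; both are hypothetical inputs you would supply, not values ORCA exposes under these names.

```python
def matches_prediction(
    original: float, remediated: float, predicted_gain: float, tolerance: float = 2.0
) -> bool:
    """True if the observed score gain is within `tolerance` points of the prediction."""
    observed_gain = remediated - original
    return abs(observed_gain - predicted_gain) <= tolerance
```

A gain from 70 to 78 against a predicted +7 passes with the default tolerance; a gain from 70 to 72 does not.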
What’s next?
- Analytics — long-term trend view across many jobs (use this when two-job comparison isn’t enough)
- AI Readiness — the score system that comparison builds on
- Auto-remediation — generate the “after” job that you’ll compare against the original