Overview
Anomaly detection answers the question “did something change in my data that shouldn’t have?” Every time ORCA scans a table or file, it computes a set of quality metrics (null rate, distinct count, mean, median, format conformance) and compares each one against the historical baseline for the same metric on the same source. If a metric drifts far enough from its history, ORCA fires an alert and, when enabled, creates an explanation in the Fix Inbox. This is the fastest way to catch upstream pipeline breaks, schema migrations gone wrong, source-system bugs, and data drift, long before they corrupt downstream models or dashboards.
How it works
For every metric on every scanned table, ORCA compares the current value against a statistical baseline built from that metric’s recorded history on the same source. When the current value departs from the baseline by more than the configured tolerance, it is flagged as an anomaly. The exact statistical method, baseline window length, and per-sensitivity tolerance values are implementation details we do not publish. Sensitivity is configured per workspace through one of three named levels:

| Sensitivity | Behavior |
|---|---|
| Low | Only large deviations are flagged. Fewest alerts. Best for noisy data. |
| Medium (default) | Conventional statistical-anomaly behavior. Recommended starting point. |
| High | Subtler shifts are flagged. More alerts. Best for stable, mission-critical sources. |
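
To make the baseline-and-tolerance idea concrete, here is a minimal sketch using a plain z-score test. It is not ORCA's method, which is unpublished: the function name `is_anomaly`, the `TOLERANCE` values, and the choice of a z-score are all assumptions made for illustration only.

```python
from statistics import mean, stdev

# Hypothetical tolerances, in standard deviations from the baseline mean.
# ORCA does not publish its real values; these only show the ordering
# low (fewest alerts) -> high (most alerts).
TOLERANCE = {"low": 4.0, "medium": 3.0, "high": 2.0}


def is_anomaly(history, current, sensitivity="medium"):
    """Return True if `current` deviates from `history` by more than the tolerance."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation.
        return current != baseline
    z_score = abs(current - baseline) / spread
    return z_score > TOLERANCE[sensitivity]


null_rate_history = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011]
print(is_anomaly(null_rate_history, 0.013, "high"))    # True
print(is_anomaly(null_rate_history, 0.013, "medium"))  # False
```

The same reading is flagged at high sensitivity but not at medium, which is the trade-off the table describes: tighter tolerance catches subtler shifts at the cost of more alerts.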
What gets monitored
ORCA tracks every numerical quality metric your scans produce. Common ones:

| Metric | What it measures | Common anomaly cause |
|---|---|---|
| null_rate | % of nulls per column | Upstream join broken, ETL filter changed |
| distinct_count | Unique values per column | Schema enum extended, encoding changed |
| mean | Average of numeric column | Currency conversion bug, unit change |
| median | Median of numeric column | Outlier influx, business event |
| row_count | Total rows in table | Pipeline partial load, deletion bug |
| format_conformance | % matching expected pattern | Source system format change |
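
As a rough illustration of what these metrics mean, the sketch below computes them for a single column held in a Python list. The function `column_metrics` is hypothetical, and details such as excluding nulls from `distinct_count` and from the `format_conformance` denominator are assumptions, not documented ORCA behavior.

```python
import re
from statistics import mean, median


def column_metrics(values, pattern=None):
    """Compute illustrative quality metrics for one column (None means null)."""
    total = len(values)
    present = [v for v in values if v is not None]
    metrics = {
        "row_count": total,
        # Percentage of null entries in the column.
        "null_rate": 100.0 * (total - len(present)) / total if total else 0.0,
        # Assumption: nulls are not counted as a distinct value.
        "distinct_count": len(set(present)),
    }
    # mean and median only apply when every non-null value is numeric.
    is_numeric = present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        metrics["mean"] = mean(present)
        metrics["median"] = median(present)
    if pattern is not None:
        matches = sum(1 for v in present if re.fullmatch(pattern, str(v)))
        # Percentage of non-null values that match the expected pattern.
        metrics["format_conformance"] = (
            100.0 * matches / len(present) if present else 0.0
        )
    return metrics


print(column_metrics([1, 2, None, 4]))
print(column_metrics(["a@x.com", "bad", "b@y.org"], pattern=r"[^@\s]+@[^@\s]+\.[a-z]+"))
```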
Correlated anomalies
When multiple metrics drift together, ORCA groups them as a correlation. For example, if null_rate on customer_email jumps and distinct_count on customer_id drops at the same time, that’s almost certainly a single root cause (probably a broken upstream join).
Correlated anomalies fire a single alert instead of N separate ones, with a summary explaining the likely shared cause.
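
The grouping step can be pictured with the sketch below. How ORCA actually decides that anomalies are correlated is not documented here, so this assumes the simplest possible rule: anomalies flagged in the same scan on the same table share one alert. The `Anomaly` class, its fields, and `group_into_alerts` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    scan_id: str
    table: str
    column: str
    metric: str


def group_into_alerts(anomalies):
    """Collapse anomalies that share a scan and table into a single alert."""
    groups = defaultdict(list)
    for anomaly in anomalies:
        groups[(anomaly.scan_id, anomaly.table)].append(anomaly)
    return [
        {
            "scan_id": scan_id,
            "table": table,
            "metrics": [f"{a.metric}({a.column})" for a in members],
            "correlated": len(members) > 1,
        }
        for (scan_id, table), members in groups.items()
    ]


flagged = [
    Anomaly("scan-1", "customers", "customer_email", "null_rate"),
    Anomaly("scan-1", "customers", "customer_id", "distinct_count"),
    Anomaly("scan-1", "orders", "amount", "mean"),
]
for alert in group_into_alerts(flagged):
    print(alert)
```

Three flagged metrics produce two alerts here: one correlated alert for the customers table and one standalone alert for orders.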
Setting sensitivity
Open Settings → Org profile and adjust Anomaly sensitivity for your workspace. The change takes effect on the next scan. You can also set sensitivity per scheduled scan.
Reviewing anomalies
Anomalies surface in three places:
- Alerts page: every fired anomaly creates an alert with the metric name, the current value, and a summary of how it compares to baseline
- Fix Inbox: anomaly explanations are queued for review with suggested investigation steps
- Job detail: when a fresh scan triggers an anomaly, it shows up inline on that job’s detail page
Each anomaly includes:
- The metric name and current value
- A summary of the historical baseline the current value was compared against
- The sensitivity level that was active when the flag fired
- A natural-language explanation generated by the anomaly explainer
API
Tips
- Don’t start at high sensitivity. Run on medium for two weeks to let the baseline stabilize, then tighten if you’re getting false negatives.
- Acknowledge confirmed anomalies. Acknowledging an alert tells the system the deviation is expected (e.g. Black Friday traffic spike) so it doesn’t refire on similar future events.
- Pair with contracts. Contracts enforce hard limits (“null rate must be under 5%”). Anomaly detection catches drift within those limits (“null rate jumped from 1% to 4%, still legal but unusual”).
- Monitor freshness. Combine anomaly detection with the freshness checker to catch sources that stop updating entirely.
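
The "pair with contracts" tip can be illustrated with the numbers it uses: a 5% hard limit and a null rate that moves from 1% to 4%. This is a sketch only. The function `check_null_rate` and the `drift_factor` threshold are assumptions made for the example and do not reflect how ORCA evaluates contracts or drift.

```python
def check_null_rate(previous, current, contract_limit=0.05, drift_factor=2.0):
    """Compare a hard contract limit with a simple relative-drift check."""
    # Contract: an absolute ceiling that must never be exceeded.
    violates_contract = current > contract_limit
    # Drift: the value is still legal but has grown sharply versus its history.
    unusual_drift = previous > 0 and current / previous >= drift_factor
    return {"contract_violation": violates_contract, "drift": unusual_drift}


# Null rate jumps from 1% to 4%: inside the 5% contract, but a 4x increase.
print(check_null_rate(0.01, 0.04))
# {'contract_violation': False, 'drift': True}
```

The contract alone would stay silent on this change, which is why the two checks complement each other.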
What’s next?
- Alerts — configure how anomaly notifications reach you (email, Slack, Teams, webhook)
- Data contracts — set hard quality SLAs alongside the statistical baselines
- Fix Inbox — review anomaly explanations and follow-up actions