Anomaly detection

Overview

Anomaly detection answers the question “did something change in my data that shouldn’t have?” Every time ORCA scans a table or file, it computes a set of quality metrics — null rate, distinct count, mean, median, format conformance — and compares them against the historical baseline for the same metric on the same source. If a metric drifts far enough from its history, ORCA fires an alert and (when enabled) creates an explanation in the Fix Inbox. This is the fastest way to catch upstream pipeline breaks, schema migrations gone wrong, source-system bugs, and data drift — long before they corrupt downstream models or dashboards.

How it works

For every metric on every scanned table, ORCA compares the current value against a statistical baseline built from that metric’s recorded history on the same source. When the current value departs from the baseline by more than the configured tolerance, ORCA flags it as an anomaly. The exact statistical method, the baseline window length, and the tolerance behind each sensitivity level are implementation details and are not published. Sensitivity is set per workspace, and can also be set per scheduled scan, using one of three named levels:

Sensitivity	Behavior
Low	Only large deviations are flagged. Fewest alerts. Best for noisy data.
Medium (default)	Conventional statistical-anomaly behavior. Recommended starting point.
High	Subtler shifts are flagged. More alerts. Best for stable, mission-critical sources.

New sources go through a warm-up period during which anomaly detection is silent — the baseline needs enough recorded history to be statistically meaningful before any flag can fire.
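
ORCA does not publish its statistical method, window length, or tolerances, so the following is only a generic illustration of comparing a value against a history-based baseline. It uses a plain z-score as a stand-in technique. The tolerance numbers, the `MIN_HISTORY` warm-up length, and the function name `is_anomalous` are all assumptions made for this example, not ORCA's actual values.

```python
from statistics import mean, stdev

# Hypothetical tolerances in standard deviations; ORCA's real values are not published.
TOLERANCE = {"low": 4.0, "medium": 3.0, "high": 2.0}
MIN_HISTORY = 7  # hypothetical warm-up length (number of prior scans)


def is_anomalous(history, current, sensitivity="medium"):
    """Return True if `current` deviates from `history` by more than the tolerance."""
    if len(history) < MIN_HISTORY:
        return False  # warm-up: too little history to judge
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return current != baseline  # perfectly flat history: any change is a deviation
    z = abs(current - baseline) / spread
    return z > TOLERANCE[sensitivity]


null_rates = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.012, 0.010]
print(is_anomalous(null_rates, 0.011))      # within normal variation
print(is_anomalous(null_rates, 0.040))      # large jump
print(is_anomalous(null_rates[:3], 0.040))  # still in warm-up, so never flagged
```

In this sketch a lower tolerance means more flags, which mirrors the Low / Medium / High behavior in the table above.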

What gets monitored

ORCA tracks every numerical quality metric your scans produce. Common ones:

Metric	What it measures	Common anomaly cause
`null_rate`	% of nulls per column	Upstream join broken, ETL filter changed
`distinct_count`	Unique values per column	Schema enum extended, encoding changed
`mean`	Average of numeric column	Currency conversion bug, unit change
`median`	Median of numeric column	Outlier influx, business event
`row_count`	Total rows in table	Pipeline partial load, deletion bug
`format_conformance`	% matching expected pattern	Source system format change

You don’t have to enable metrics one by one — they’re computed automatically on every scan.
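
To make the table concrete, here is a rough sketch of how these metrics could be computed for a single column of an in-memory sample. It is not ORCA's implementation. The function name `column_metrics`, the use of `None` to represent nulls, and the regex-based definition of format conformance are assumptions for illustration, and `row_count` (a table-level metric) is shown here simply as the length of the sample.

```python
import re
from statistics import mean, median


def column_metrics(values, pattern=None):
    """Compute illustrative quality metrics for one column (None represents a null)."""
    present = [v for v in values if v is not None]
    metrics = {
        "row_count": len(values),
        "null_rate": 100 * (len(values) - len(present)) / len(values) if values else 0.0,
        "distinct_count": len(set(present)),
    }
    # Mean and median only make sense for numeric values (bool is excluded on purpose).
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric:
        metrics["mean"] = mean(numeric)
        metrics["median"] = median(numeric)
    # Format conformance: share of non-null values that fully match the pattern.
    if pattern is not None and present:
        matches = sum(1 for v in present if re.fullmatch(pattern, str(v)))
        metrics["format_conformance"] = 100 * matches / len(present)
    return metrics


amounts = [10.0, 12.0, None, 11.0, 250.0]
emails = ["a@example.com", "b@example.com", "not-an-email", None]
amount_metrics = column_metrics(amounts)
email_metrics = column_metrics(emails, pattern=r"[^@\s]+@[^@\s]+\.[^@\s]+")
print(amount_metrics)
print(email_metrics)
```

Note how the single outlier (250.0) pulls the mean far more than the median, which is why the two metrics tend to flag different kinds of problems.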

Correlated anomalies

When multiple metrics drift together, ORCA groups them as a correlation. For example, if `null_rate` on `customer_email` jumps and `distinct_count` on `customer_id` drops at the same time, that’s almost certainly a single root cause (probably a broken upstream join). Correlated anomalies fire a single alert instead of N separate ones, with a summary explaining the likely shared cause.
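
One simplified way to picture the grouping is shown below. It assumes anomalies count as correlated purely because they were flagged in the same scan; ORCA's real grouping logic is not described here and may differ. The `Anomaly` record, its fields, and `group_into_alerts` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Anomaly:
    scan_id: str  # hypothetical identifier of the scan that raised the flag
    column: str
    metric: str


def group_into_alerts(anomalies):
    """Collapse anomalies from the same scan into one alert each."""
    by_scan = defaultdict(list)
    for anomaly in anomalies:
        by_scan[anomaly.scan_id].append(anomaly)
    return [
        {"scan_id": scan_id, "anomalies": [f"{a.metric} on {a.column}" for a in group]}
        for scan_id, group in by_scan.items()
    ]


flagged = [
    Anomaly("scan-42", "customer_email", "null_rate"),
    Anomaly("scan-42", "customer_id", "distinct_count"),
    Anomaly("scan-43", "order_total", "mean"),
]
alerts = group_into_alerts(flagged)
for alert in alerts:
    print(alert)
```

Three flagged metrics become two alerts here: one covering both `scan-42` anomalies and one for `scan-43`.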

Setting sensitivity

Open Settings → Org profile and adjust Anomaly sensitivity for your workspace. The change takes effect on the next scan. You can also set sensitivity per scheduled scan:

PATCH /api/v1/sources/{source_id}/schedule
{
  "sensitivity": "high"
}
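
As a sketch, the same request could be prepared from Python with the standard library. The base URL `https://orca.example.com`, the bearer-token header, the lowercase `"low"` and `"medium"` values, and the helper name `build_sensitivity_request` are assumptions; check your deployment for the real host, authentication scheme, and accepted values.

```python
import json
import urllib.request

BASE_URL = "https://orca.example.com"  # assumed host; replace with your own


def build_sensitivity_request(source_id, sensitivity, token):
    """Build (but do not send) the PATCH request that sets a schedule's sensitivity."""
    if sensitivity not in {"low", "medium", "high"}:  # assumed set of accepted values
        raise ValueError(f"unknown sensitivity: {sensitivity!r}")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/sources/{source_id}/schedule",
        data=json.dumps({"sensitivity": sensitivity}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="PATCH",
    )


request = build_sensitivity_request("src_123", "high", token="YOUR_TOKEN")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```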

Reviewing anomalies

Anomalies surface in three places:

Alerts page — every fired anomaly creates an alert with the metric name, the current value, and a summary of how it compares to baseline
Fix Inbox — anomaly explanations are queued for review with suggested investigation steps
Job detail — when a fresh scan triggers an anomaly, it shows up inline on that job’s detail page

Each anomaly record includes:

The metric name and current value
A summary of the historical baseline the current value was compared against
The sensitivity level that was active when the flag fired
A natural-language explanation generated by the anomaly explainer

API

# List anomalies for a source
GET /api/v1/sources/{source_id}/anomalies

# Get the historical baseline for a metric
GET /api/v1/sources/{source_id}/metrics/{metric_name}/history
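
A minimal Python sketch for calling these two endpoints follows. The response schema is not documented in this section, so the helper simply returns the decoded JSON. The host `https://orca.example.com`, the bearer-token header, and all function names are assumptions for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://orca.example.com"  # assumed host; replace with your own


def anomalies_url(source_id):
    """URL that lists anomalies for a source."""
    return f"{BASE_URL}/api/v1/sources/{quote(source_id, safe='')}/anomalies"


def metric_history_url(source_id, metric_name):
    """URL that returns the recorded history for one metric on a source."""
    return (
        f"{BASE_URL}/api/v1/sources/{quote(source_id, safe='')}"
        f"/metrics/{quote(metric_name, safe='')}/history"
    )


def get_json(url, token):
    """GET a URL and decode the body as JSON (assumes a JSON body and bearer auth)."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


print(anomalies_url("src_123"))
print(metric_history_url("src_123", "null_rate"))
# Example call (requires a reachable server and a valid token):
# anomalies = get_json(anomalies_url("src_123"), token="YOUR_TOKEN")
```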

Tips

Don’t start at high sensitivity. Run on medium for two weeks to let the baseline stabilize, then raise sensitivity only if real problems are going undetected (false negatives).
Acknowledge confirmed anomalies. Acknowledging an alert tells the system the deviation is expected (e.g. Black Friday traffic spike) so it doesn’t refire on similar future events.
Pair with contracts. Contracts enforce hard limits (“null rate must be under 5%”). Anomaly detection catches drift within those limits (“null rate jumped from 1% to 4%, still legal but unusual”).
Monitor freshness. Combine anomaly detection with the freshness checker to catch sources that stop updating entirely.
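
The "Pair with contracts" tip can be pictured as two independent checks on the same reading. This is an illustrative sketch only: the 5% limit comes from the example in that tip, while the drift rule (flag when the value exceeds three times the recent average) is an invented stand-in for ORCA's unpublished method, and `evaluate` is a hypothetical name.

```python
from statistics import mean


def evaluate(null_rate_pct, history_pct, contract_limit_pct=5.0, drift_factor=3.0):
    """Return (contract_ok, drifted) for a null-rate reading given in percent."""
    contract_ok = null_rate_pct < contract_limit_pct              # hard limit from the contract
    drifted = null_rate_pct > drift_factor * mean(history_pct)    # invented drift rule
    return contract_ok, drifted


history = [1.0, 1.1, 0.9, 1.0]
print(evaluate(4.0, history))  # within the contract, but unusual
print(evaluate(6.0, history))  # violates the contract and is unusual
print(evaluate(1.2, history))  # within the contract and normal
```

The first case is the one contracts alone would miss: the value is still legal, yet it is far from what the source normally produces.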

What’s next?

Alerts — configure how anomaly notifications reach you (email, Slack, Teams, webhook)
Data contracts — set hard quality SLAs alongside the statistical baselines
Fix Inbox — review anomaly explanations and follow-up actions
