Data lineage

Overview

Data lineage is the map of where your data comes from, where it goes, and how it transforms along the way. ORCA’s lineage view shows your tables, files, warehouses, and dashboards as nodes in a graph, with edges representing the relationships between them. Lineage answers two questions that nothing else in the platform can:

“If I change this column, what breaks downstream?” — impact analysis traces every node that depends on a given source.
“Where did this number come from?” — upstream traversal walks the chain back to the original raw data.

How nodes are created

ORCA’s lineage graph is populated from three sources, in order of preference:

Source	When it runs	What it creates
Auto-detection from foreign keys	When you connect a PostgreSQL data source	One node per table, one edge per FK constraint
Auto-detection from analyzed files	When a job completes	One node per file, edges to its parent data source
Manual entry	Anytime, in edit mode	Custom nodes (dashboards, BI tools, downstream consumers)

Auto-detection is idempotent — re-running it never creates duplicates.
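One common way to get this kind of idempotency is to key every node and edge on a stable identifier and upsert rather than append. The sketch below illustrates the idea with an in-memory store. The `LineageStore` class, the key shapes, the `pg-main` source ID, and the `auto` value are hypothetical and are not ORCA's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LineageStore:
    """Toy in-memory graph store keyed on natural identifiers (hypothetical)."""
    nodes: dict = field(default_factory=dict)  # (source_id, schema, name) -> attrs
    edges: dict = field(default_factory=dict)  # (from_key, to_key, edge_type) -> attrs

    def upsert_node(self, source_id, schema, name, **attrs):
        key = (source_id, schema, name)
        # setdefault creates the node once; later runs only refresh attributes.
        self.nodes.setdefault(key, {}).update(attrs)
        return key

    def upsert_edge(self, from_key, to_key, edge_type, **attrs):
        key = (from_key, to_key, edge_type)
        self.edges.setdefault(key, {}).update(attrs)
        return key


def detect(store, fk_rows):
    """Apply one detection pass; safe to call repeatedly with the same rows."""
    for child, parent in fk_rows:
        a = store.upsert_node("pg-main", "public", child, node_type="table")
        b = store.upsert_node("pg-main", "public", parent, node_type="table")
        store.upsert_edge(a, b, "foreign_key", detected_by="auto")


store = LineageStore()
rows = [("orders", "customers"), ("order_items", "orders")]
detect(store, rows)
detect(store, rows)  # the second run finds every key already present
print(len(store.nodes), len(store.edges))
```

Because the keys are derived from the data itself, running `detect` twice leaves the node and edge counts unchanged.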

Auto-detection from PostgreSQL

When you connect a PostgreSQL source, ORCA scans information_schema for foreign key constraints and creates lineage edges automatically. The detection is capped at 500 FK relationships per database to keep large schemas responsive. You can re-trigger detection at any time from the source detail page in Sources.
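To make the scan concrete, here is a minimal sketch of how foreign keys can be read from the standard `information_schema` views and turned into capped, de-duplicated edges. The query ORCA actually runs is not documented here, so treat `FK_QUERY`, `rows_to_edges`, and the edge dictionary shape as assumptions. The rows could come from any DB-API driver; the sample rows below stand in for a live connection.

```python
# Sketch only: the query ORCA actually runs is not documented here.
FK_QUERY = """
SELECT DISTINCT
    tc.constraint_name,
    tc.table_schema, tc.table_name,
    ccu.table_schema AS ref_schema, ccu.table_name AS ref_table
FROM information_schema.table_constraints AS tc
JOIN information_schema.constraint_column_usage AS ccu
  ON ccu.constraint_name = tc.constraint_name
 AND ccu.constraint_schema = tc.constraint_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
ORDER BY tc.table_schema, tc.table_name, tc.constraint_name
"""

MAX_FK_RELATIONSHIPS = 500  # cap stated in the text above


def rows_to_edges(rows, limit=MAX_FK_RELATIONSHIPS):
    """Turn FK_QUERY result rows into at most `limit` unique edges."""
    edges = {}
    for name, schema, table, ref_schema, ref_table in rows:
        key = (schema, table, name)
        if key in edges:
            continue  # this constraint was already recorded
        if len(edges) >= limit:
            break  # stop once the cap is reached
        edges[key] = {
            "from": f"{schema}.{table}",
            "to": f"{ref_schema}.{ref_table}",
            "edge_type": "foreign_key",  # hypothetical spelling
        }
    return list(edges.values())


# Sample rows in the column order returned by FK_QUERY.
sample = [
    ("orders_customer_id_fkey", "public", "orders", "public", "customers"),
    ("orders_customer_id_fkey", "public", "orders", "public", "customers"),
    ("items_order_id_fkey", "public", "order_items", "public", "orders"),
]
edges = rows_to_edges(sample)
```

PostgreSQL only requires constraint names to be unique per table, so a production query would need extra care when two tables in one schema reuse a constraint name.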

Working with the graph

Open Lineage from the sidebar.

Filter by source

Use the source dropdown in the toolbar to scope the graph to a single warehouse or data source. This is useful when you have many connected systems.

Click a node

The detail panel shows node type, schema, database, owning data source, and the raw metadata payload.

Run impact analysis

From any node detail panel, click Impact analysis. ORCA traverses the graph downstream up to 10 levels deep and highlights every node that would be affected by a change to the selected one.
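A depth-limited traversal like this is typically a breadth-first search with a visited set. The sketch below shows the idea under two assumptions: edges are given as (upstream, downstream) pairs, and the function name `downstream_impact` is hypothetical. It is not ORCA's implementation.

```python
from collections import deque


def downstream_impact(edges, start, max_depth=10):
    """Return {node: depth} for every node reachable downstream of `start`."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth cap
        for nxt in children.get(node, []):
            if nxt not in depth:  # the visited check also guards against cycles
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    del depth[start]  # the selected node is not part of its own impact set
    return depth


edges = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "marts.revenue"),
    ("marts.revenue", "dashboard.sales"),
    ("staging.orders", "raw.orders"),  # a cycle, to show the traversal terminates
]
impact = downstream_impact(edges, "raw.orders")
```

Upstream traversal is the same search run over reversed pairs, that is, `[(dst, src) for src, dst in edges]`.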

Click an edge

The edge detail panel shows the relationship type, column-level mappings (when known), description, and how the edge was discovered (auto vs manual).

Keyboard shortcuts

Key	Action
`f`	Fit the graph to the viewport
`Esc`	Close any open panel or modal

Manual nodes and edges

Some parts of your data flow live outside ORCA — Tableau dashboards, downstream microservices, ML pipelines. You can model these by hand.

Enable edit mode

Toggle Edit mode in the toolbar.

Add a node

Click Add node. Pick a node type (table, file, view, dashboard, model), give it a name, and optionally link it to a data source.

Add an edge

Click Add edge. Pick the source node, target node, and edge type (foreign key, derived from, transformed by, copied from). Add a description to explain the relationship.

Manually added edges are tagged detected_by: manual so you can distinguish them from auto-detected ones.
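As an illustration, a manual edge can be thought of as a small record that is validated against the edge types listed above and tagged on creation. Only the `detected_by: manual` tag comes from this page. The other field names and the snake_case spellings of the edge types are assumptions made for the sketch.

```python
# Edge types from the list above; the snake_case spellings are assumed.
EDGE_TYPES = {"foreign_key", "derived_from", "transformed_by", "copied_from"}


def manual_edge(source_node, target_node, edge_type, description=""):
    """Build a manual edge record (hypothetical field names)."""
    if edge_type not in EDGE_TYPES:
        raise ValueError(f"unknown edge type: {edge_type!r}")
    return {
        "source": source_node,
        "target": target_node,
        "edge_type": edge_type,
        "description": description,
        "detected_by": "manual",  # tag described in the text above
    }


edge = manual_edge(
    "marts.revenue",
    "Tableau: Sales dashboard",
    "derived_from",
    description="Dashboard reads the revenue mart nightly",
)
```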

API reference

Lineage is fully accessible via the REST API. See API endpoints for the complete schema.

# Fetch the full graph
GET /api/v1/lineage/graph

# Fetch downstream impact for a single node (max 10 levels deep)
GET /api/v1/lineage/impact/{node_id}

# Create a node (admin only)
POST /api/v1/lineage/nodes

# Create an edge (admin only)
POST /api/v1/lineage/edges

Mutating endpoints (POST, PATCH, DELETE) require the admin role and are rate-limited to 20 requests per minute.
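A client that issues many mutating calls may want to throttle itself to stay under that limit. The following sketch uses only the Python standard library. The host name, the bearer-token header, and the request body fields are assumptions; check the API endpoints reference for the real schema and authentication scheme.

```python
import json
import time
import urllib.request

BASE_URL = "https://orca.example.com"  # hypothetical host
MAX_PER_MINUTE = 20                    # limit stated above


class Throttle:
    """Sliding-window limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit=MAX_PER_MINUTE, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit, self.window = limit, window
        self.clock, self.sleep = clock, sleep
        self.stamps = []

    def _prune(self, now):
        self.stamps = [t for t in self.stamps if now - t < self.window]

    def wait(self):
        now = self.clock()
        self._prune(now)
        if len(self.stamps) >= self.limit:
            # Sleep until the oldest call in the window expires.
            self.sleep(self.window - (now - self.stamps[0]))
            now = self.clock()
            self._prune(now)
        self.stamps.append(now)


def build_request(method, path, token, body=None):
    """Prepare a request without sending it."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # auth scheme is assumed
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


throttle = Throttle()
req = build_request(
    "POST", "/api/v1/lineage/nodes", "TOKEN",
    {"name": "Sales dashboard", "node_type": "dashboard"},  # assumed fields
)
# To send: call throttle.wait() and then urllib.request.urlopen(req).
```

Read-only calls such as `GET /api/v1/lineage/graph` can use `build_request` as well, without the throttle.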

Limitations

Foreign-key auto-detection currently supports PostgreSQL only. BigQuery and Snowflake nodes must be added manually or through file ingestion.
Impact analysis depth is capped at 10 levels to prevent runaway queries on highly connected graphs.
Column-level lineage mappings are stored on edges but only auto-populated for FK-detected edges. Manual edges support column mappings via the API.

What’s next?

Connect a PostgreSQL source to auto-populate your graph
Pair lineage with data contracts to understand which contracts protect upstream nodes
Use knowledge graph for entity-level (rather than table-level) relationships
