
Overview

Data lineage is the map of where your data comes from, where it goes, and how it transforms along the way. ORCA’s lineage view shows your tables, files, warehouses, and dashboards as nodes in a graph, with edges representing the relationships between them. Lineage answers two questions that nothing else in the platform can:
  • “If I change this column, what breaks downstream?” — impact analysis traces every node that depends on a given source.
  • “Where did this number come from?” — upstream traversal walks the chain back to the original raw data.

How nodes are created

ORCA’s lineage graph is populated from three sources, in order of preference:
Source | When it runs | What it creates
Auto-detection from foreign keys | When you connect a PostgreSQL data source | One node per table, one edge per FK constraint
Auto-detection from analyzed files | When a job completes | One node per file, edges to its parent data source
Manual entry | Anytime, in edit mode | Custom nodes (dashboards, BI tools, downstream consumers)
Auto-detection is idempotent — re-running it never creates duplicates.
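
One way to picture the idempotency guarantee is an upsert keyed on the edge's identity. The sketch below is illustrative only: the `Edge` and `LineageGraph` classes and the choice of `(source, target, edge_type)` as the key are assumptions, not ORCA's actual storage model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    # Hypothetical identity of an edge: two runs that see the same
    # relationship produce equal Edge values.
    source: str
    target: str
    edge_type: str


@dataclass
class LineageGraph:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def upsert_edge(self, source: str, target: str, edge_type: str) -> bool:
        """Add the edge and its endpoints; return True only if the edge is new."""
        self.nodes.update((source, target))
        edge = Edge(source, target, edge_type)
        if edge in self.edges:
            return False
        self.edges.add(edge)
        return True


def detect(graph: LineageGraph, foreign_keys) -> int:
    """Apply one detection pass and return how many edges were newly created."""
    return sum(graph.upsert_edge(src, dst, "foreign_key") for src, dst in foreign_keys)


graph = LineageGraph()
fks = [("orders", "customers"), ("order_items", "orders")]
first = detect(graph, fks)   # first pass creates both edges
second = detect(graph, fks)  # second pass finds nothing new
```

Because nodes and edges live in sets keyed on their identity, a repeated pass leaves the graph unchanged.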

Auto-detection from PostgreSQL

When you connect a PostgreSQL source, ORCA scans information_schema for foreign key constraints and creates lineage edges automatically. Detection is capped at 500 FK relationships per database to keep large schemas responsive. You can re-trigger detection at any time from the source detail page in Sources.
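
To make the scan concrete, here is a rough sketch of how foreign keys can be read from information_schema and turned into edge records. The query is a common PostgreSQL pattern, not necessarily the one ORCA runs. The `rows_to_edges` helper, its record keys, and the way the 500 cap is applied per constraint are all assumptions for illustration.

```python
# A common way to list foreign keys in PostgreSQL. With this join,
# multi-column foreign keys can produce extra column pairings, because
# constraint_column_usage carries no column ordering.
FK_QUERY = """
SELECT tc.constraint_name,
       tc.table_schema, tc.table_name, kcu.column_name,
       ccu.table_schema AS ref_schema,
       ccu.table_name   AS ref_table,
       ccu.column_name  AS ref_column
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON kcu.constraint_name = tc.constraint_name
 AND kcu.constraint_schema = tc.constraint_schema
JOIN information_schema.constraint_column_usage AS ccu
  ON ccu.constraint_name = tc.constraint_name
 AND ccu.constraint_schema = tc.constraint_schema
WHERE tc.constraint_type = 'FOREIGN KEY'
"""


def rows_to_edges(rows, cap=500):
    """Group result rows by constraint; keep at most `cap` constraints."""
    edges = {}
    for name, schema, table, column, ref_schema, ref_table, ref_column in rows:
        key = (schema, name)
        if key not in edges:
            if len(edges) >= cap:
                continue  # cap reached: ignore constraints not yet seen
            edges[key] = {
                "referencing": f"{schema}.{table}",
                "referenced": f"{ref_schema}.{ref_table}",
                "columns": [],
            }
        edges[key]["columns"].append((column, ref_column))
    return list(edges.values())


# Sample rows in the shape the query returns (made-up table names).
rows = [
    ("orders_customer_fk", "public", "orders", "customer_id", "public", "customers", "id"),
    ("items_order_fk", "public", "order_items", "order_id", "public", "orders", "id"),
]
edges = rows_to_edges(rows)
capped = rows_to_edges(rows, cap=1)
```

In practice the rows would come from executing `FK_QUERY` through a database driver. The sample rows here stand in for that result so the sketch runs without a database.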

Working with the graph

Open Lineage from the sidebar.
1. Filter by source

Use the source dropdown in the toolbar to scope the graph to a single warehouse or data source. This is useful when you have many connected systems.
2. Click a node

The detail panel shows node type, schema, database, owning data source, and the raw metadata payload.
3. Run impact analysis

From any node detail panel, click Impact analysis. ORCA traverses the graph downstream up to 10 levels deep and highlights every node that would be affected by a change to the selected one.
4. Click an edge

The edge detail panel shows the relationship type, column-level mappings (when known), description, and how the edge was discovered (auto vs manual).
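
The impact analysis step is, conceptually, a depth-limited downstream traversal. The following is a minimal sketch using breadth-first search. The `impact` function and the edge representation as `(source, target)` pairs are assumptions; only the 10-level cap comes from the behavior described above.

```python
from collections import deque

MAX_DEPTH = 10  # the documented traversal cap


def impact(edges, start, max_depth=MAX_DEPTH):
    """Return {node: depth} for every node reachable downstream of `start`."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    depths = {}
    seen = {start}  # guards against cycles in the graph
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the cap
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths


# Made-up graph with a cycle between raw_orders and orders.
edges = [
    ("raw_orders", "orders"),
    ("orders", "daily_revenue"),
    ("daily_revenue", "revenue_dashboard"),
    ("orders", "raw_orders"),
]
affected = impact(edges, "raw_orders")
shallow = impact(edges, "raw_orders", max_depth=1)
```

Breadth-first order means each node is reported at its shortest distance from the selected node, and the `seen` set keeps cyclic graphs from looping.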

Keyboard shortcuts

Key | Action
f | Fit the graph to the viewport
Esc | Close any open panel or modal

Manual nodes and edges

Some parts of your data flow live outside ORCA — Tableau dashboards, downstream microservices, ML pipelines. You can model these by hand.
1. Enable edit mode

Toggle Edit mode in the toolbar.
2. Add a node

Click Add node. Pick a node type (table, file, view, dashboard, model), give it a name, and optionally link it to a data source.
3. Add an edge

Click Add edge. Pick the source node, target node, and edge type (foreign key, derived from, transformed by, copied from). Add a description to explain the relationship.
Manually added edges are tagged detected_by: manual, so you can distinguish them from auto-detected ones.
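
As a rough picture of what manual nodes and edges might look like as records, consider the sketch below. The field names (`name`, `node_type`, `source_node_id`, and so on) and the snake_case spellings of the edge types are assumptions. Only the lists of node and edge types and the `detected_by: manual` tag come from the steps above.

```python
NODE_TYPES = {"table", "file", "view", "dashboard", "model"}
# Assumed identifiers for: foreign key, derived from, transformed by, copied from.
EDGE_TYPES = {"foreign_key", "derived_from", "transformed_by", "copied_from"}


def manual_node(name, node_type, data_source_id=None):
    """Build a hypothetical node record; the data source link is optional."""
    if node_type not in NODE_TYPES:
        raise ValueError(f"unknown node type: {node_type}")
    record = {"name": name, "node_type": node_type}
    if data_source_id is not None:
        record["data_source_id"] = data_source_id
    return record


def manual_edge(source_node_id, target_node_id, edge_type, description=""):
    """Build a hypothetical edge record tagged as manually created."""
    if edge_type not in EDGE_TYPES:
        raise ValueError(f"unknown edge type: {edge_type}")
    return {
        "source_node_id": source_node_id,
        "target_node_id": target_node_id,
        "edge_type": edge_type,
        "description": description,
        "detected_by": "manual",  # distinguishes it from auto-detected edges
    }


node = manual_node("Revenue dashboard", "dashboard")
edge = manual_edge(
    "node-orders",
    "node-revenue-dashboard",
    "derived_from",
    "Dashboard reads the orders table nightly",
)
```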

API reference

Lineage is fully accessible via the REST API. See API endpoints for the complete schema.
# Fetch the full graph
GET /api/v1/lineage/graph

# Fetch downstream impact for a single node (max 10 levels deep)
GET /api/v1/lineage/impact/{node_id}

# Create a node (admin only)
POST /api/v1/lineage/nodes

# Create an edge (admin only)
POST /api/v1/lineage/edges
Mutating endpoints (POST, PATCH, DELETE) require the admin role and are rate-limited to 20 requests per minute.
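
A client that creates many nodes or edges may want to pace itself to stay under the limit. Below is a minimal client-side sketch, assuming a sliding 60-second window. The server's actual limiting algorithm is not documented here, and `MutationThrottle`, `impact_url`, and the host `orca.example.com` are hypothetical names.

```python
import time
from collections import deque
from urllib.parse import quote


def impact_url(base_url, node_id):
    """Build the impact endpoint URL for one node."""
    return f"{base_url}/api/v1/lineage/impact/{quote(str(node_id), safe='')}"


class MutationThrottle:
    """Track recent mutating requests and report how long to wait."""

    def __init__(self, limit=20, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.sent = deque()

    def wait_time(self):
        """Seconds to wait before the next mutating request may be sent."""
        now = self.clock()
        # Forget requests that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self):
        """Call right after sending a POST, PATCH, or DELETE."""
        self.sent.append(self.clock())


throttle = MutationThrottle()
url = impact_url("https://orca.example.com", "42")
```

Before each mutating call, a client would sleep for `throttle.wait_time()` seconds and then call `throttle.record()`. Read-only requests such as the impact lookup do not need to go through the throttle.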

Limitations

  • Foreign-key auto-detection currently supports PostgreSQL only. BigQuery and Snowflake nodes must be added manually or through file ingestion.
  • Impact analysis depth is capped at 10 levels to prevent runaway queries on highly connected graphs.
  • Column-level lineage mappings are stored on edges but only auto-populated for FK-detected edges. Manual edges support column mappings via the API.
