Skip to main content

Overview

dbt-orca is a dbt package that lets you trigger ORCA quality scans directly from your dbt workflows. Use it as a custom test on any model to enforce quality thresholds, or call it as a post-hook to scan tables after they build. Supports dbt 1.5 through 1.x with adapter-specific implementations for Snowflake, PostgreSQL, and BigQuery.

Installation

Add dbt-orca to your packages.yml:
packages:
  - git: "https://github.com/klavest/dbt-orca.git"
    revision: v0.1.0
Then install:
dbt deps

Environment variables

Set these before running dbt:
export ORCA_API_URL="https://api.orca-klavest.app"
export ORCA_API_KEY="orca_key_abc123..."

Configuration

Add optional configuration to your dbt_project.yml:
vars:
  orca_min_score: 80              # Minimum quality score to pass (default: 80)
  orca_poll_retries: 5            # Max status poll attempts (default: 5)
  orca_poll_interval_seconds: 10  # Seconds between polls (default: 10)

Using the orca_quality test

Add the test to any model in your schema.yml:
models:
  - name: orders
    tests:
      - orca.orca_quality:
          source_id: "src_abc123"
          table_name: "orders"     # optional, defaults to model name
          min_score: 85            # optional, defaults to orca_min_score var
When you run dbt test, the package will:
  1. Trigger an ORCA quality scan via the API
  2. Poll for completion (up to orca_poll_retries attempts)
  3. Check the quality score against the threshold
  4. Pass the test if score >= threshold, fail with a clear error message if not

Standalone test file

You can also write a standalone test in tests/:
-- tests/test_orders_quality.sql
{{ orca.test_orca_quality(
    source_id="src_abc123",
    table_name="orders",
    min_score=90
) }}

Run via dbt run-operation

For ad-hoc scans without running the full test suite:
dbt run-operation orca_quality_scan \
  --args '{source_id: "src_abc123", table_name: "orders", min_score: 85}'

Adapter setup

The package uses adapter.dispatch to route API calls through your database’s HTTP capabilities. Out of the box, three adapters are supported.

PostgreSQL

Requires the http extension:
-- Run as superuser
CREATE EXTENSION IF NOT EXISTS http;
GRANT USAGE ON SCHEMA extensions TO your_dbt_role;
The PostgreSQL adapter uses http_post and http_get to call the ORCA API, and pg_sleep for polling intervals.

Snowflake

Requires an API integration and external function:
-- 1. Create an API integration
CREATE OR REPLACE API INTEGRATION orca_api_integration
  API_PROVIDER = aws_api_gateway
  API_ALLOWED_PREFIXES = ('https://api.orca-klavest.app')
  ENABLED = true;

-- 2. Create external functions for HTTP calls
CREATE OR REPLACE EXTERNAL FUNCTION orca_http_post(url VARCHAR, headers VARIANT, body VARCHAR)
  RETURNS VARIANT
  API_INTEGRATION = orca_api_integration
  AS '<your-gateway-url>';

CREATE OR REPLACE EXTERNAL FUNCTION orca_http_get(url VARCHAR, headers VARIANT)
  RETURNS VARIANT
  API_INTEGRATION = orca_api_integration
  AS '<your-gateway-url>';
The Snowflake adapter uses SYSTEM$WAIT for polling intervals.

BigQuery

Create a BigQuery remote function or Cloud Function that proxies HTTP requests, then override the macro in your project:
-- macros/bigquery__orca_scan.sql
{% macro bigquery__orca_scan(source_id, table_name) %}
  -- Your BigQuery-specific HTTP call implementation
{% endmacro %}

Default adapter (no native HTTP)

If your adapter does not support HTTP calls natively, the default implementation logs the equivalent curl command so you can execute the scan manually or integrate through other means:
======================================================================
ORCA SCAN REQUEST (default adapter -- no native HTTP support)
======================================================================

To execute this scan manually, run:

  curl -X POST https://api.orca-klavest.app/api/v1/public/scan
    -H 'Authorization: Bearer <ORCA_API_KEY>'
    -H 'Content-Type: application/json'
    -d '{"source_id": "src_abc123"}'

For automated HTTP calls, use an adapter-specific override.
======================================================================

Post-hook auto-scanning

Trigger an ORCA scan automatically after a model builds by adding a post-hook:
# dbt_project.yml
models:
  my_project:
    +post-hook:
      - "{{ orca.orca_quality_scan(source_id='src_abc123', min_score=80) }}"
Or on a specific model in schema.yml:
models:
  - name: orders
    config:
      post_hook:
        - "{{ orca.orca_quality_scan(source_id='src_abc123', table_name='orders') }}"

Macros reference

MacroPurpose
orca.test_orca_qualityCustom dbt test — triggers scan, polls, and validates score
orca.orca_quality_scanConvenience macro for run-operation or post-hook usage
orca.orca_scanLow-level macro — triggers scan and returns job_id
orca.orca_checkLow-level macro — polls scan status and validates score

Error handling

The package raises CompilationError with descriptive messages when:
  • ORCA_API_URL or ORCA_API_KEY is not set
  • The scan fails (API error or processing failure)
  • The quality score is below the configured threshold
  • Polling exhausts all retries without completion
Example failure output:
Compilation Error in test orca_quality_orders
  ORCA quality check FAILED. Score: 68 is below minimum threshold of 80.
  Job ID: j1a2b3c4-5678-9abc-def0-1234567890ab