Overview
ORCA can connect to external data sources for continuous quality monitoring. Once connected, ORCA discovers files and tables, runs scheduled scans, and tracks changes over time. You can connect sources through the web UI or the API.
Supported sources
| Source | Type | Auth method | File formats |
|---|---|---|---|
| AWS S3 | Object storage | IAM role or access key | CSV, Parquet, JSON |
| Google Cloud Storage | Object storage | Service account (Workload Identity) | CSV, Parquet, JSON |
| PostgreSQL | Database | Connection string | — |
| BigQuery | Data warehouse | Service account | — |
| Snowflake | Data warehouse | Username/password or key pair | — |
Connecting a source
AWS S3
Required credentials:
| Field | Description |
|---|---|
| Bucket name | S3 bucket name |
| Prefix | Optional path prefix to scope file discovery |
| AWS region | Bucket region (default: eu-north-1) |
| Access key ID | IAM access key (or use IAM role) |
| Secret access key | IAM secret key (or use IAM role) |
Files are read lazily (scan_csv / scan_parquet) for memory-safe processing of large datasets. KMS encryption is supported for uploads.
Google Cloud Storage
Required credentials:
| Field | Description |
|---|---|
| Bucket name | GCS bucket name |
| Prefix | Optional path prefix |
| Service account JSON | Service account key file contents |
Credentials can also be supplied through the GOOGLE_APPLICATION_CREDENTIALS environment variable. Workload Identity Federation is supported for production deployments without key files.
PostgreSQL
Required credentials:
| Field | Description |
|---|---|
| Host | Database hostname |
| Port | Database port (default: 5432) |
| Database | Database name |
| Username | Database user |
| Password | Database password |
| Schema | Schema to scan (default: public) |
| SSL mode | Connection SSL mode |
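The supported sources table lists a connection string as the PostgreSQL auth method. As a rough sketch of how the fields above could map onto a libpq-style connection URI, the snippet below percent-encodes the user and password and appends the SSL mode as a query parameter. The function name and the "require" default are illustrative assumptions, not ORCA behaviour; the schema is not part of the URI and would be applied separately.

```python
from urllib.parse import quote, urlencode


def build_postgres_dsn(host, database, username, password,
                       port=5432, ssl_mode="require"):
    """Assemble a libpq-style connection URI from the form fields.

    Hypothetical helper: the name and the "require" default are
    assumptions made for this example.
    """
    # Percent-encode user and password so characters like "@" or ":" survive.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    query = urlencode({"sslmode": ssl_mode})
    return (
        f"postgresql://{userinfo}@{host}:{port}/"
        f"{quote(database, safe='')}?{query}"
    )


dsn = build_postgres_dsn("db.example.com", "analytics", "orca_ro", "p@ss:word")
print(dsn)
```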
BigQuery
Required credentials:
| Field | Description |
|---|---|
| Project ID | GCP project ID |
| Dataset | BigQuery dataset name |
| Service account JSON | Service account key file contents |
Snowflake
Required credentials:
| Field | Description |
|---|---|
| Account | Snowflake account identifier |
| Warehouse | Compute warehouse |
| Database | Database name |
| Schema | Schema name |
| Username | Snowflake user |
| Password | Snowflake password |
Testing a connection
Before saving a source, test credentials to verify access.
Creating a source
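As a rough sketch, the same payload can be sent first to the test-connection endpoint and then to the create endpoint listed under API endpoints. The base URL, the bearer-token header, and every JSON key name below are assumptions for illustration; check your deployment's API reference for the actual request schema. The requests are only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://orca.example.com"  # assumption: replace with your deployment
API_TOKEN = "YOUR_API_TOKEN"           # assumption: bearer-token authentication


def build_request(method, path, payload=None):
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical payload: the key names are guesses based on the credential
# tables above, not a documented schema.
source = {
    "name": "orders-db",
    "type": "postgresql",
    "credentials": {
        "host": "db.example.com",
        "port": 5432,
        "database": "analytics",
        "username": "orca_ro",
        "password": "change-me",
        "schema": "public",
        "ssl_mode": "require",
    },
}

test_req = build_request("POST", "/api/v1/sources/test-connection", source)
create_req = build_request("POST", "/api/v1/sources", source)

# To send them against a reachable ORCA instance:
# with urllib.request.urlopen(test_req) as resp:
#     print(resp.status, resp.read().decode())
# with urllib.request.urlopen(create_req) as resp:
#     print(resp.status, resp.read().decode())
```

Testing before creating keeps a source with bad credentials from being saved and then failing on its first scan.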
Scan scheduling
Set up cron-based schedules to scan sources automatically.
Schedule configuration
| Field | Description |
|---|---|
| cron_expression | Standard cron expression (e.g. 0 6 * * * for daily at 06:00 UTC) |
| tables | Optional list of specific tables to scan (default: all) |
| enabled | Toggle the schedule on/off |
Common cron patterns
| Schedule | Cron expression |
|---|---|
| Every hour | 0 * * * * |
| Daily at 06:00 UTC | 0 6 * * * |
| Weekdays at 08:00 UTC | 0 8 * * 1-5 |
| Weekly on Monday at 06:00 UTC | 0 6 * * 1 |
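To check how a pattern from the table behaves, a small matcher can test a UTC timestamp against a five-field expression. This is a simplified sketch, not ORCA's scheduler: it handles only `*`, single numbers, ranges, and comma lists, and it requires every field to match (standard cron treats day-of-month and day-of-week differently when both are restricted, which none of the patterns above do).

```python
from datetime import datetime, timezone


def field_matches(field, value):
    """Match one cron field supporting '*', 'N', 'A-B' and comma lists."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expression, when):
    """Return True if `when` (a UTC datetime) falls on the schedule."""
    minute, hour, day, month, weekday = expression.split()
    # Cron numbers weekdays 0 = Sunday ... 6 = Saturday.
    cron_weekday = when.isoweekday() % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


monday_8am = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)  # a Monday
sunday_8am = datetime(2024, 1, 7, 8, 0, tzinfo=timezone.utc)  # a Sunday
print(cron_matches("0 8 * * 1-5", monday_8am))  # weekday schedule, Monday
print(cron_matches("0 8 * * 1-5", sunday_8am))  # weekday schedule, Sunday
```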
Creating a schedule via API
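A schedule is created by posting the fields from the Schedule configuration table to the schedules endpoint of a source. The sketch below builds such a request without sending it; the base URL, bearer-token header, source ID placeholder, and table names are assumptions for illustration.

```python
import json
import urllib.request

BASE_URL = "https://orca.example.com"  # assumption: replace with your deployment
API_TOKEN = "YOUR_API_TOKEN"           # assumption: bearer-token authentication
SOURCE_ID = "SOURCE_ID"                # placeholder for an existing source ID

schedule = {
    "cron_expression": "0 6 * * *",     # daily at 06:00 UTC
    "tables": ["orders", "customers"],  # hypothetical names; omit to scan all
    "enabled": True,
}

schedule_req = urllib.request.Request(
    f"{BASE_URL}/api/v1/sources/{SOURCE_ID}/schedules",
    data=json.dumps(schedule).encode("utf-8"),
    method="POST",
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
)

# To send it against a reachable ORCA instance:
# with urllib.request.urlopen(schedule_req) as resp:
#     print(resp.status, resp.read().decode())
```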
File discovery and sync state
For object storage sources (S3, GCS), ORCA maintains a sync state for each discovered file:
- New files are detected on each scan and queued for analysis
- Modified files (changed Last-Modified or ETag) are re-scanned
- Deleted files are marked as removed in the sync state
Sync state is stored in the source_file_states table. You can view discovered files and their scan status through the Sources page in the web app or via the API.
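The three cases above amount to a diff between the previous sync state and the current bucket listing. A minimal sketch of that comparison follows; the `FileVersion` type and `diff_sync_state` function are hypothetical names, and ORCA's actual implementation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileVersion:
    last_modified: str  # e.g. an ISO 8601 timestamp from the object listing
    etag: str


def diff_sync_state(previous, current):
    """Classify object keys as new, modified, or deleted.

    Both arguments map object key -> FileVersion.
    """
    new = sorted(key for key in current if key not in previous)
    deleted = sorted(key for key in previous if key not in current)
    # A file counts as modified if either Last-Modified or ETag changed.
    modified = sorted(
        key for key in current
        if key in previous and current[key] != previous[key]
    )
    return {"new": new, "modified": modified, "deleted": deleted}


previous = {
    "data/a.csv": FileVersion("2024-01-01T00:00:00Z", "etag-a1"),
    "data/b.csv": FileVersion("2024-01-01T00:00:00Z", "etag-b1"),
    "data/c.csv": FileVersion("2024-01-01T00:00:00Z", "etag-c1"),
}
current = {
    "data/a.csv": FileVersion("2024-01-01T00:00:00Z", "etag-a1"),  # unchanged
    "data/b.csv": FileVersion("2024-01-02T09:30:00Z", "etag-b2"),  # modified
    "data/d.csv": FileVersion("2024-01-02T10:00:00Z", "etag-d1"),  # new
}
changes = diff_sync_state(previous, current)
print(changes)
```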
API endpoints
All source management endpoints require authentication and org membership.
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/sources/test-connection | Test credentials without saving |
| POST | /api/v1/sources | Create a new data source |
| GET | /api/v1/sources | List all sources for the org |
| GET | /api/v1/sources/:id | Get source details |
| PATCH | /api/v1/sources/:id | Update source config or credentials |
| DELETE | /api/v1/sources/:id | Delete a source (admin only) |
| POST | /api/v1/sources/:id/scan | Trigger a manual scan |
| GET | /api/v1/sources/:id/files | List discovered files and sync state |
| POST | /api/v1/sources/:id/schedules | Create a scan schedule |
| GET | /api/v1/sources/:id/schedules | List scan schedules |
| PATCH | /api/v1/sources/:id/schedules/:sid | Update a schedule |
| DELETE | /api/v1/sources/:id/schedules/:sid | Delete a schedule |
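The `:id` and `:sid` segments in the table are path parameters. A small helper can substitute them when building URLs in a client script; `fill_path` is a hypothetical name, and the IDs shown are made up for the example.

```python
import re
from urllib.parse import quote


def fill_path(template, **params):
    """Replace ':name' placeholders in an endpoint template."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing path parameter: {name}")
        # Percent-encode so an ID can never add extra path segments.
        return quote(str(params[name]), safe="")
    return re.sub(r":([A-Za-z_]+)", substitute, template)


scan_path = fill_path("/api/v1/sources/:id/scan", id="src_123")
files_path = fill_path("/api/v1/sources/:id/files", id="src_123")
schedule_path = fill_path(
    "/api/v1/sources/:id/schedules/:sid", id="src_123", sid="sch_9"
)
print(scan_path)
print(files_path)
print(schedule_path)
```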
Security
- Credentials are encrypted at rest using the application secret key
- Credentials are never included in API responses or logs
- Source creation and deletion events are recorded in the audit log
- All source queries are scoped to the authenticated user’s organisation
- S3 keys are scoped to the org’s prefix to prevent cross-tenant access
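How prefix scoping is enforced is not described here. One common approach, sketched below under that caveat, is to join every requested key onto the org's prefix and reject any result that falls outside it. The `scoped_key` function and the `orgs/acme` prefix layout are assumptions for illustration only.

```python
import posixpath


def scoped_key(org_prefix, relative_key):
    """Join a key onto an org prefix, rejecting keys that escape it."""
    prefix = org_prefix.strip("/") + "/"
    # normpath collapses ".." segments so traversal attempts become visible.
    candidate = posixpath.normpath(posixpath.join(prefix, relative_key))
    if not (candidate + "/").startswith(prefix):
        raise ValueError(f"key escapes org prefix: {relative_key!r}")
    return candidate


print(scoped_key("orgs/acme", "uploads/data.csv"))

try:
    scoped_key("orgs/acme", "../other-org/data.csv")
except ValueError as exc:
    print("rejected:", exc)
```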