Overview
Data retention determines how long the raw files you upload to ORCA remain in S3 after analysis completes. It’s separate from analysis results: quality scores, classifications, AI Readiness scores, and reports are stored in the database and persist regardless of what you set here. The retention setting only affects the original CSV/Parquet/Excel files. Once they’re deleted, you can no longer re-run analysis from them, but every result derived from them remains in your dashboard, reports, history, and contracts forever. This separation is intentional: it lets you prove the analysis happened for compliance purposes long after the source data is gone.
Retention modes
ORCA supports three modes:

| Mode | Behavior | When to use |
|---|---|---|
| analysis_only (default) | Files deleted ~24 hours after analysis completes | Maximum privacy posture. Default for most workspaces. |
| short_term | Files retained for a configurable number of days, then deleted | When you need to re-run analysis or apply remediation more than 24 hours after upload |
| full_retention | Files kept indefinitely until you delete them manually | Long-term storage for trusted, non-PII datasets |
The default is analysis_only, and the project security rules explicitly forbid changing this default: it’s the privacy-safe option and has to be opted out of, not into.
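As a minimal sketch of how the three modes translate into a deletion deadline, the Python below models them as an enum and computes when a job's files would become eligible for deletion. The names `RetentionMode` and `expires_at` are hypothetical and not taken from ORCA's code; the 1-day window for analysis_only mirrors the approximate 24-hour behavior in the table, and the 1–365 range is the limit documented for short_term.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class RetentionMode(str, Enum):
    """Hypothetical enum mirroring the three documented modes."""
    ANALYSIS_ONLY = "analysis_only"
    SHORT_TERM = "short_term"
    FULL_RETENTION = "full_retention"


def expires_at(
    mode: RetentionMode,
    completed_at: datetime,
    retention_days: Optional[int] = None,
) -> Optional[datetime]:
    """Return when the raw files become eligible for deletion, or None if never."""
    if mode is RetentionMode.FULL_RETENTION:
        # Kept until someone deletes the files manually.
        return None
    if mode is RetentionMode.ANALYSIS_ONLY:
        # Fixed window of roughly 24 hours after analysis completes.
        return completed_at + timedelta(days=1)
    # short_term: the caller must supply a window within the documented range.
    if retention_days is None or not 1 <= retention_days <= 365:
        raise ValueError("short_term requires retention_days between 1 and 365")
    return completed_at + timedelta(days=retention_days)
```

For example, a job that completes at midnight UTC on 1 January with short_term and 7 days would become eligible for deletion at midnight UTC on 8 January.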
Setting retention
Retention is set per job at upload time. You can also configure an org-wide default in Settings → Data retention.
Per upload (UI)
On the Upload page, expand Configuration. Set:
- Retention mode: pick one of the three modes above
- Retention days: only used when mode is short_term (1–365 days)
Per upload (API)
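The exact request format for the upload API is not shown here. As a rough sketch only, the Python below builds and validates the two retention settings before they would be sent with an upload. The function name `build_retention_config` and the dictionary shape are assumptions for illustration; the field names retention_mode and retention_days are borrowed from the cleanup query described under "How deletion works".

```python
from typing import Optional

# The three documented modes.
VALID_MODES = {"analysis_only", "short_term", "full_retention"}


def build_retention_config(mode: str = "analysis_only",
                           days: Optional[int] = None) -> dict:
    """Return a validated retention config (hypothetical payload shape)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown retention mode: {mode!r}")
    config = {"retention_mode": mode}
    if mode == "short_term":
        # Retention days only apply to short_term and must be 1-365.
        if days is None or not 1 <= days <= 365:
            raise ValueError("short_term requires days between 1 and 365")
        config["retention_days"] = days
    # For the other modes, any supplied day count is ignored.
    return config
```

Calling `build_retention_config("short_term", 3)` yields a config with both fields set, while calling it with no arguments falls back to the privacy-safe analysis_only default.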
Org-wide default
Admins can set the org default in Settings → Data retention. Per-job settings always override the org default.
How deletion works
A daily ARQ cron task (cleanup_expired_files) runs at 02:00 UTC and looks for jobs whose retention has expired:
1. Find expired jobs. Query for jobs where retention_mode IN ('short_term', 'analysis_only'), file_deleted_at IS NULL, and completed_at + retention_days < now(). For analysis_only, the retention period is hardcoded to 1 day.
2. Audit log entry. Before deleting, the worker writes a files_expiring event to the audit log with the job ID, org ID, and acting user.
3. Delete from S3. Each file’s S3 object is deleted. If any deletion fails, the job is skipped and retried on the next run; partial deletion is never recorded as success.
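The three steps above can be sketched in plain Python. This is an illustrative stand-in, not the real cleanup_expired_files task: the `Job` dataclass, `is_expired`, `cleanup_pass`, the injected `delete_object` callable, and the list used as an audit log are all assumptions, and the "system" actor value is a placeholder. Only the selection rule, the files_expiring event name, and the skip-on-failure behavior come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional


@dataclass
class Job:
    """Hypothetical in-memory stand-in for a job row."""
    job_id: str
    org_id: str
    retention_mode: str
    completed_at: datetime
    s3_keys: List[str]
    retention_days: Optional[int] = None
    file_deleted_at: Optional[datetime] = None


def is_expired(job: Job, now: datetime) -> bool:
    """Step 1: the selection rule for jobs whose files should be removed."""
    if job.retention_mode not in ("short_term", "analysis_only"):
        return False  # full_retention is never picked up
    if job.file_deleted_at is not None:
        return False  # files already deleted
    # analysis_only uses a fixed 1-day window.
    days = 1 if job.retention_mode == "analysis_only" else job.retention_days
    if days is None:
        return False  # short_term without a window: leave it alone
    return job.completed_at + timedelta(days=days) < now


def cleanup_pass(jobs: List[Job], delete_object: Callable[[str], None],
                 audit_log: List[dict], now: datetime) -> List[str]:
    """Run one cleanup pass and return the IDs of jobs fully cleaned up."""
    cleaned = []
    for job in jobs:
        if not is_expired(job, now):
            continue
        # Step 2: record intent before touching storage.
        audit_log.append({"event": "files_expiring", "job_id": job.job_id,
                          "org_id": job.org_id, "actor": "system"})
        # Step 3: delete every object; any failure leaves the job unmarked
        # so the next run retries it.
        try:
            for key in job.s3_keys:
                delete_object(key)
        except Exception:
            continue
        job.file_deleted_at = now
        cleaned.append(job.job_id)
    return cleaned
```

Because file_deleted_at is set only after every object is removed, a job with a partial failure stays eligible and is picked up again on the next pass. Retrying is safe here on the assumption that deleting an already-deleted object is harmless, as it is with S3.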
Manual deletion
You can delete files immediately, regardless of retention mode. Manual deletion is recorded in the audit log as a files_deleted event.
What survives deletion
After files are deleted from S3, the following remain in your database:
- Job metadata, status, completed_at
- All column classifications and confidence scores
- Quality results and issue counts per column
- AI Readiness scores and dimension breakdowns
- Generated PDF reports (already rendered to S3 as separate objects with their own retention)
- Audit log entries
The following are no longer available:
- The raw CSV/Parquet/Excel file
- The ability to re-run analysis with different settings
- The ability to apply auto-remediation to that file (remediation needs the source bytes)
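One way to picture the consequence for re-runs and remediation is a guard that checks whether the source file still exists before starting either action. The sketch below is hypothetical: `JobRecord`, `SourceFileGone`, and `require_source_file` are illustrative names, and only the file_deleted_at field comes from the cleanup query described earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class JobRecord:
    """Hypothetical view of the job metadata that survives deletion."""
    job_id: str
    file_deleted_at: Optional[datetime] = None


class SourceFileGone(Exception):
    """Raised when an action needs source bytes that no longer exist."""


def require_source_file(job: JobRecord, action: str) -> None:
    """Refuse actions such as re-analysis or remediation once files are gone."""
    if job.file_deleted_at is not None:
        raise SourceFileGone(
            f"cannot {action} for job {job.job_id}: source file was deleted "
            f"at {job.file_deleted_at.isoformat()}"
        )
```

Results, reports, and audit entries are unaffected by this check; it only gates operations that need the original bytes.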
Compliance positioning
| Requirement | Recommended setting |
|---|---|
| GDPR data minimisation | analysis_only (default) |
| Right to erasure (Article 17) | analysis_only or short_term ≤7 days |
| SOC 2 audit trail | Any mode — audit log persists regardless |
| Long-term re-analysis on stable data | full_retention (only for non-PII datasets) |
| Reproducibility for ML training data | full_retention (consider versioning at the source instead) |
For datasets that contain PII, analysis_only mode is the right choice: analysis results stay forever, and raw PII goes away in about 24 hours.
Tips
- Keep the default unless you have a specific reason. analysis_only is the privacy-safe option and the easiest to defend in a security review.
- For short_term, pick the shortest window that lets you act. If your fix-and-rerun loop is two days, retention of 3 days is plenty.
- Audit any switch to full_retention. It should only be used for non-PII datasets and the decision should be documented internally.
- Rotate regularly. Combine short_term retention with scheduled scans so each scan creates a fresh copy and the previous one ages out cleanly.
What’s next?
- Security overview — the broader compliance picture including encryption and PII handling
- Audit logs — verify when files were deleted
- Connectors — set up scheduled scans that work well with short retention windows