What is ORCA?

ORCA is a data intelligence platform built for teams that need to understand, improve, and assess the quality of their data. Upload any dataset and ORCA automatically:
  • Classifies every column using AI-powered semantic analysis (200+ categories)
  • Detects data quality issues: nulls, duplicates, format violations, outliers, GDPR-sensitive fields
  • Scores AI readiness across 7 dimensions with actionable recommendations
  • Remediates issues with auto-fix strategies
  • Assesses datasets with verifiable quality reports
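
As a rough illustration of the kind of per-column checks listed above, the sketch below counts nulls, duplicates, and format violations in a single column using only the Python standard library. It is not ORCA's implementation: `profile_column`, `EMAIL_PATTERN`, and the rule that empty strings count as nulls are all assumptions made for this example.

```python
import re
from collections import Counter

# Deliberately loose email shape check; an assumption for this sketch only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_column(values, pattern=None):
    """Count nulls, duplicates, and format violations in one column (hypothetical helper)."""
    # Assumption: None and empty or whitespace-only strings are treated as null.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(non_null)
    # Every repeat beyond the first occurrence of a value counts as one duplicate.
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    violations = 0
    if pattern is not None:
        violations = sum(1 for v in non_null if not pattern.match(str(v)))
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "duplicates": duplicates,
        "format_violations": violations,
    }


emails = ["a@example.com", None, "b@example", "a@example.com", ""]
report = profile_column(emails, EMAIL_PATTERN)
print(report)
```

Here the column has two null entries, one repeated address, and one value (`b@example`) that fails the pattern.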

Key capabilities

| Capability | Description |
| --- | --- |
| Semantic classification | AI classifies columns into 200+ semantic categories (email, revenue, date_of_birth, etc.) |
| Quality analysis | Detects nulls, duplicates, format violations, outliers, and anomalies per column |
| AI readiness scoring | 7-dimension weighted score (0-100) measuring dataset fitness for ML/AI use cases |
| Auto-remediation | Preview and apply fixes: null imputation, deduplication, format standardization, outlier treatment |
| GDPR compliance | Automatic PII detection with 3-layer password screening and data masking |
| Use-case readiness | Assess fitness for 8 ML use cases (churn prediction, fraud detection, recommendation, etc.) |
| Assessment | Issue verifiable SHA-256 assessment reports for datasets scoring 75+ |
| PDF reports | Export AI readiness and GDPR compliance reports |
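
The scoring and assessment capabilities above can be pictured with a small sketch: a weighted mean over seven dimension scores, followed by a SHA-256 digest over a canonical form of the report when the score reaches 75. The dimension names, the weights, the report layout, and the helper names (`readiness_score`, `issue_report`, `verify_report`) are all hypothetical; this page does not name the dimensions or say how ORCA builds its digest.

```python
import hashlib
import json

# Hypothetical dimension names and weights (they sum to 100); not taken from ORCA.
WEIGHTS = {
    "completeness": 20,
    "uniqueness": 15,
    "validity": 15,
    "consistency": 15,
    "timeliness": 10,
    "balance": 10,
    "volume": 15,
}
ASSESSMENT_THRESHOLD = 75  # reports are issued for datasets scoring 75+


def readiness_score(dimension_scores):
    """Weighted mean of per-dimension scores, each on a 0-100 scale."""
    total_weight = sum(WEIGHTS.values())
    weighted = sum(dimension_scores[name] * w for name, w in WEIGHTS.items())
    return weighted / total_weight


def _digest(body):
    # Canonical JSON (sorted keys, fixed separators) so equal reports hash equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def issue_report(dataset_name, dimension_scores):
    """Return a report with a SHA-256 digest, or None if the score is below 75."""
    score = readiness_score(dimension_scores)
    if score < ASSESSMENT_THRESHOLD:
        return None
    body = {
        "dataset": dataset_name,
        "score": round(score, 2),
        "dimensions": dimension_scores,
    }
    return {**body, "sha256": _digest(body)}


def verify_report(report):
    """Recompute the digest over everything except the stored digest itself."""
    body = {k: v for k, v in report.items() if k != "sha256"}
    return _digest(body) == report["sha256"]


scores = {
    "completeness": 90, "uniqueness": 80, "validity": 85, "consistency": 70,
    "timeliness": 60, "balance": 75, "volume": 95,
}
report = issue_report("customers.csv", scores)
print(report["score"], verify_report(report))
```

Hashing a canonical serialization is one common way to make a report verifiable: anyone holding the report can recompute the digest, and any change to the score or dimensions produces a mismatch.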

Architecture

ORCA is built on:
  • Backend: FastAPI + PostgreSQL + Redis + AWS S3
  • AI engine: Google Gemini for semantic classification
  • Task queue: ARQ for async processing (file analysis, report generation)
  • Frontend: React with real-time WebSocket progress updates
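
To show the shape of the async processing and progress-update flow, here is a self-contained sketch that uses only `asyncio` from the standard library. It swaps in an in-process `asyncio.Queue` for both the ARQ/Redis task queue and the WebSocket channel, so it shows the pattern rather than ORCA's actual code. The stage names and the `analyze_file` and `run` functions are hypothetical.

```python
import asyncio

# Hypothetical pipeline stages; the real job steps are not documented here.
STAGES = ["classify", "quality", "score"]


async def analyze_file(file_id, progress):
    """Stand-in for a background job that reports progress after each stage."""
    for i, stage in enumerate(STAGES, start=1):
        await asyncio.sleep(0)  # placeholder for the real work of this stage
        await progress.put({
            "file_id": file_id,
            "stage": stage,
            "percent": round(100 * i / len(STAGES)),
        })
    await progress.put(None)  # sentinel: the job has finished


async def run(file_id):
    """Start the job and collect progress updates until the sentinel arrives."""
    progress = asyncio.Queue()
    task = asyncio.create_task(analyze_file(file_id, progress))
    updates = []
    while (update := await progress.get()) is not None:
        # In a real deployment each update would be pushed to the browser instead.
        updates.append(update)
    await task
    return updates


updates = asyncio.run(run("file-123"))
for u in updates:
    print(u)
```

The point of the split is that the request that accepts an upload can return immediately, while the slow analysis runs elsewhere and streams its status back.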

Next steps

  • Quick start: Upload your first file and see results in minutes
  • AI readiness: Understand the 7-dimension scoring methodology
  • API reference: Integrate ORCA into your data pipeline
  • Security: Security and compliance details