
Overview

The overall AI Readiness score tells you whether your data is generally fit for ML. Use-case readiness goes one level deeper: it tells you whether your data is fit for a specific ML use case, with a list of exactly what’s blocking you and what to fix. It answers questions like:
  • “Is my customer table ready for churn prediction?”
  • “Can I use this transactions file for fraud detection?”
  • “Do I have enough history for time-series forecasting?”
For each supported use case, ORCA scores your dataset against a profile of requirements (row count, null rate, completeness, required column types) and returns:
  • A readiness percentage (0–100%)
  • A list of blockers — hard requirements that fail
  • A list of warnings — soft requirements that don’t fail outright but should be improved
  • A natural-language summary of what to do next

Supported use cases

ORCA ships with eight predefined use-case profiles:
| Use case | Predicts | What the profile looks for |
| --- | --- | --- |
| Classification | Categorical outcomes (spam, churn, fraud) | Sufficient rows and columns, low average null rate |
| Regression | Continuous values (price, revenue, duration) | Numeric columns with low null rate and meaningful spread |
| Churn prediction | Which customers will leave | Identifier and datetime columns, plus enough historical depth |
| Time series forecasting | Future values from historical patterns | Datetime column, enough historical points, stable distribution |
| NLP / text classification | Document sentiment, topic, intent | Text columns with sufficient length and variety |
| Anomaly detection | Outlier detection | Stable baseline data |
| Clustering / segmentation | Natural groupings | Numeric features and low null rate |
| Recommendation system | Item recommendations | User-item interaction history |
Each profile encodes the kind of data quality and shape needed to train a viable model for that use case. The exact row-count minima, null-rate ceilings, and dimension-score thresholds each profile applies are tuned over time and are not published.

How scoring works

For each use case, ORCA evaluates your dataset against the profile’s requirements:
1. Hard requirements (blockers)

Things that must be true. If a use case needs a minimum number of rows and your file falls short, that’s a blocker. Each blocker reduces readiness substantially.

2. Soft requirements (warnings)

Things that should be true. If a profile’s recommended column types include datetime and your file has no datetime column, that’s a warning. Warnings reduce readiness but don’t block it.

3. Required column types

Some use cases need specific semantic types. Churn prediction needs an identifier and a datetime column. Time series needs a datetime. If they’re missing, that’s a blocker.

4. Quality dimension thresholds

Each use case sets a target on the 7 readiness dimensions. For example, a churn-prediction profile cares more about completeness than a clustering profile does. Falling short on a dimension lowers readiness for that specific use case.
The final percentage is computed from how many of the profile’s requirements pass, weighted by how critical each requirement is for that use case.
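
The weighting described above can be illustrated with a small sketch. ORCA’s actual requirements, weights, and thresholds are not published, so every name and number below (the `Requirement` class, the weights, the 1,000-row minimum, the 5% null-rate ceiling) is a hypothetical stand-in. The code only shows the general shape of a weighted pass rate with blockers and warnings collected separately.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """One profile requirement (hypothetical structure, not ORCA's schema)."""
    name: str
    passed: bool
    weight: float  # how critical the requirement is; values here are made up
    hard: bool     # True: a failure is a blocker. False: a failure is a warning.


def readiness(requirements):
    """Return (percentage, blockers, warnings) as a weighted pass rate."""
    total = sum(r.weight for r in requirements)
    earned = sum(r.weight for r in requirements if r.passed)
    pct = round(100 * earned / total) if total else 0
    blockers = [r.name for r in requirements if r.hard and not r.passed]
    warnings = [r.name for r in requirements if not r.hard and not r.passed]
    return pct, blockers, warnings


# Illustrative churn-style checks against a small dataset summary.
# All thresholds are invented for this example.
dataset = {"rows": 800, "avg_null_rate": 0.03, "types": {"identifier", "datetime"}}
checks = [
    Requirement("minimum row count", dataset["rows"] >= 1000, weight=3.0, hard=True),
    Requirement("identifier column present", "identifier" in dataset["types"], weight=3.0, hard=True),
    Requirement("datetime column present", "datetime" in dataset["types"], weight=3.0, hard=True),
    Requirement("low average null rate", dataset["avg_null_rate"] <= 0.05, weight=1.0, hard=False),
    Requirement("boolean column present", "boolean" in dataset["types"], weight=1.0, hard=False),
]

pct, blockers, warnings = readiness(checks)
print(pct, blockers, warnings)
```

Giving hard requirements a larger weight is one simple way to make a failed blocker cost more than a failed warning; the real scoring may combine requirements differently.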

Reading a result

A use-case readiness result looks like this:
{
  "usecase": "churn_prediction",
  "display_name": "Churn Prediction",
  "readiness_pct": 72,
  "blockers": [
    "Dataset is below the minimum row count for churn_prediction"
  ],
  "warnings": [
    "Average null rate is above the level recommended for this use case",
    "No 'boolean' columns detected — useful for engagement flags"
  ],
  "summary": "Your data is mostly ready for churn prediction, but you need more historical rows and slightly cleaner null rates before training."
}
The summary is the action item. The blockers tell you what to fix first, and the warnings tell you what to improve once the blockers are cleared. At runtime, each blocker and warning the API returns includes the concrete numbers for your dataset against the profile's thresholds, so you always see the exact gap to close. What is not published is the universal ruleset that produces those gaps.
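
As a minimal sketch of consuming such a result in Python, the snippet below parses a trimmed copy of the example payload and lists it in the suggested order: headline and summary, then blockers, then warnings. The helper name `action_plan` is hypothetical; only the field names come from the example.

```python
import json

# Trimmed copy of the example response from this page.
RAW = """
{
  "usecase": "churn_prediction",
  "display_name": "Churn Prediction",
  "readiness_pct": 72,
  "blockers": [
    "Dataset is below the minimum row count for churn_prediction"
  ],
  "warnings": [
    "Average null rate is above the level recommended for this use case"
  ],
  "summary": "Your data is mostly ready for churn prediction."
}
"""


def action_plan(result):
    """Return plain-text lines: headline, summary, blockers, then warnings."""
    lines = [f'{result["display_name"]}: {result["readiness_pct"]}% ready']
    lines.append(result["summary"])
    lines += [f"BLOCKER: {b}" for b in result.get("blockers", [])]
    lines += [f"WARNING: {w}" for w in result.get("warnings", [])]
    return lines


result = json.loads(RAW)
plan = action_plan(result)
print("\n".join(plan))
```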

Where to find it

In the web app:
  • AI Readiness page → click any file → scroll to the Use-case readiness matrix
The matrix shows all 8 use cases side by side with color-coded readiness, so you can see at a glance which workloads your dataset can support today.

In the API:
# All use cases for a single file
GET /api/v1/files/{file_id}/usecase-readiness

# A specific use case
GET /api/v1/files/{file_id}/usecase-readiness/churn_prediction

# List the available use case profiles
GET /api/v1/usecase-readiness/available
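
A minimal Python sketch of calling these endpoints with the standard library follows. The base URL, the bearer-token `Authorization` header, and the file ID are placeholders assumed for illustration. This page lists only the paths, so check your deployment for the real host and authentication scheme.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://orca.example.com"  # placeholder host, not a real endpoint


def readiness_url(file_id, usecase=None):
    """Build the URL for all use cases, or for one use case if given."""
    url = f"{BASE_URL}/api/v1/files/{quote(file_id, safe='')}/usecase-readiness"
    if usecase:
        url += f"/{quote(usecase, safe='')}"
    return url


def available_url():
    """Build the URL that lists the available use-case profiles."""
    return f"{BASE_URL}/api/v1/usecase-readiness/available"


def get_json(url, token):
    """GET a URL and decode the JSON body. The bearer token is an assumption."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


print(readiness_url("FILE_ID", "churn_prediction"))
# To call the API for real (needs a reachable server and valid credentials):
# result = get_json(readiness_url("FILE_ID", "churn_prediction"), token="YOUR_TOKEN")
```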

Why this matters

Most data quality tools tell you that your data has problems. They don’t tell you which problems matter for what you’re actually trying to do. A 5% null rate on phone_number is a dealbreaker for an SMS marketing model and irrelevant for a price forecasting model. Use-case readiness encodes that asymmetry. When a stakeholder asks “can we build a churn model with this data?” you can answer with a number and a list of fixes — not a hand-wave.

Tips

  • Check use-case readiness before the data science project starts. If churn prediction reports 30% readiness with 12 blockers, that’s a planning conversation, not a Sprint 14 surprise.
  • Use blockers as a backlog. Each blocker is one ticket. Fix them in order, re-run, watch the percentage climb.
  • Combine with contracts. Once you reach 100% on a use case, define a contract that enforces those exact requirements so future data drift doesn’t silently break your model.
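
The backlog tip can be sketched as a small helper that turns one readiness result into ticket-like records, blockers before warnings. The ticket fields and the `to_backlog` name are hypothetical; pass the records to whatever tracker you use.

```python
def to_backlog(result):
    """Convert a readiness result into an ordered list of ticket dicts."""
    usecase = result["usecase"]
    tickets = []
    # Blockers come first so they land at the top of the backlog.
    for priority, key in (("high", "blockers"), ("low", "warnings")):
        for message in result.get(key, []):
            tickets.append({
                "title": f"[{usecase}] {message}",
                "priority": priority,
                "kind": key[:-1],  # "blocker" or "warning"
            })
    return tickets


example = {
    "usecase": "churn_prediction",
    "readiness_pct": 72,
    "blockers": ["Dataset is below the minimum row count for churn_prediction"],
    "warnings": ["Average null rate is above the level recommended for this use case"],
}

tickets = to_backlog(example)
for ticket in tickets:
    print(ticket["priority"], ticket["title"])
```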

What’s next?

  • AI Readiness — the overall 7-dimension score this builds on
  • Auto-remediation — fix the blockers automatically when possible
  • Reports — generate a PDF of use-case readiness for stakeholders