What we measure
The score is built from seven dimensions. Each is described below in one plain-English sentence, focused on the inputs it reads.
- Completeness: how much of your data is actually present, judged from missing-value patterns across rows and columns.
- Consistency: how uniformly your data follows the formats and validation rules expected for each column type.
- Referential integrity: whether cross-column references resolve (no orphans pointing at non-existent records) and whether values conform structurally to the format their column is supposed to carry. This dimension explicitly does not measure correctness against an external ground-truth value; see “What we deliberately don’t measure (yet)” below for the standards rationale.
- Compliance: whether sensitive personal data carries the legal basis it needs and is properly remediated for GDPR purposes.
- Uniqueness: whether duplicate records distort the training distribution.
- Schema quality: whether your data structure is stable, well-typed, and confidently classified by the platform’s column-type detection.
- Stability: whether your data drifts over time in ways that break models. Re-implemented in v1.0 to consume the platform’s drift detector; statistical drift measures are still in flight (see “What we deliberately don’t measure (yet)” and “Limits and assumptions” below).
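As a rough illustration of the kind of raw inputs the completeness, uniqueness, and referential integrity dimensions read from, the sketch below computes a missing-cell rate, a duplicate-row rate, and an orphan-reference rate in plain Python. The function names, the sample `orders` rows, and the idea of reducing each signal to a single rate are assumptions made for illustration; ORCA's internal formulas are not published and may differ.

```python
from collections import Counter


def missing_rate(rows, columns):
    """Fraction of cells that are None or an empty string."""
    total = len(rows) * len(columns)
    if total == 0:
        return 0.0
    missing = sum(
        1 for row in rows for col in columns if row.get(col) in (None, "")
    )
    return missing / total


def duplicate_rate(rows, columns):
    """Fraction of rows that exactly repeat an earlier row on the given columns."""
    if not rows:
        return 0.0
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(rows)


def orphan_rate(child_rows, fk_column, parent_keys):
    """Fraction of non-null references that match no parent key."""
    refs = [row[fk_column] for row in child_rows if row.get(fk_column) is not None]
    if not refs:
        return 0.0
    parents = set(parent_keys)
    return sum(1 for ref in refs if ref not in parents) / len(refs)


# Hypothetical sample data: customer 99 does not exist, and order 3 appears twice.
orders = [
    {"order_id": 1, "customer_id": 10, "country": "Sweden"},
    {"order_id": 2, "customer_id": 11, "country": None},
    {"order_id": 3, "customer_id": 99, "country": "Sweden"},
    {"order_id": 3, "customer_id": 99, "country": "Sweden"},
]
customer_ids = [10, 11]
columns = ["order_id", "customer_id", "country"]

print("missing rate:", missing_rate(orders, columns))      # 1 of 12 cells is empty
print("duplicate rate:", duplicate_rate(orders, columns))  # 1 of 4 rows is a repeat
print("orphan rate:", orphan_rate(orders, "customer_id", customer_ids))  # 2 of 4 refs
```

Note that the orphan check flags `customer_id` 99 but says nothing about whether "Sweden" is the right country, which mirrors the distinction between referential integrity and ground-truth accuracy drawn above.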
What we deliberately don’t measure (yet)
Standards-aligned data-quality work touches several signals that ORCA does not currently score. We list them here rather than imply coverage we don’t have.
- Representativeness across protected classes. NIST AI RMF MAP 2.3 and MEASURE 2.2 expect evaluation datasets to be demographically representative of deployment populations. ORCA does not currently cross-tabulate any dimension by a protected class, in part because doing so would require keeping the very PII signals our compliance dimension asks customers to remediate. A separable fairness mode is on the roadmap.
- Statistical drift measures. NIST MEASURE 2.4 expects distributional drift to be quantified using measures such as Kolmogorov–Smirnov distance, Population Stability Index, Jensen–Shannon divergence, or Wasserstein distance. The stability dimension today consumes alert counts; the statistical-drift implementation is in flight.
- Ground-truth value correctness. ISO/IEC 25012 reserves the term accuracy for correctness of values against a true reference value. Customers do not generally supply ground-truth references alongside production data, so we measure referential integrity instead — and the dimension is named accordingly. The score will catch a customer_id pointing at a deleted customer; it will not catch a country field that says “Atlantis” instead of “Sweden”, because there is no external reference to compare against.
- Re-identification and k-anonymity scoring. Privacy-aligned data quality goes beyond “is there PII?” into “could individuals be re-identified after masking?” ORCA detects PII but does not score k-anonymity, l-diversity, or membership-inference exposure.
- Lineage and provenance documentation depth. NIST MAP 2.3 and GOVERN 1.4 expect documented data lineage covering collection, labelling, cleaning, and transformation history. ORCA has a lineage subsystem that is not yet incorporated into the AI Readiness composite.
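For readers unfamiliar with the drift measures named above, here is a minimal sketch of one of them, the Population Stability Index, in plain Python. It is not ORCA's implementation (the statistical-drift work is still in flight). The equal-width binning over the baseline range, the `eps` floor for empty bins, and the function name are all assumptions chosen to keep the example short.

```python
import math


def population_stability_index(baseline, current, bins=10, eps=1e-6):
    """PSI between two non-empty numeric samples, using equal-width bins.

    Bin edges come from the baseline sample (an assumption of this sketch);
    current values outside that range are counted in the nearest edge bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant baselines

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp into the edge bins
            counts[idx] += 1
        n = len(values)
        # Floor each proportion at eps so the logarithm below is defined.
        return [max(c / n, eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical data: a uniform baseline and a copy shifted to the right.
baseline = [i / 100 for i in range(100)]
shifted = [v + 0.3 for v in baseline]

print("no drift:", population_stability_index(baseline, baseline))
print("shifted: ", population_stability_index(baseline, shifted))
```

Each term of the sum is non-negative, so the index is zero only when the binned distributions match and grows as they diverge. In practice the result is sensitive to the binning scheme, which is one reason a production implementation would need more care than this sketch.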
The standards we align with
We anchor the methodology in three external references. The first is free and primary; the others are paywalled and cited from publicly available material.
NIST AI Risk Management Framework v1.0
NIST AI 100-1, published January 2023, is the most-cited AI governance framework in policy discussions today. It is in the public domain and freely available from NIST. Within its Map and Measure functions, the subcategories most directly relevant to data quality are:
- MAP 2.3: training data quality, lineage, and treatment of missing, spurious, or outlier data.
- MEASURE 2.2: representativeness of evaluation datasets relative to deployment populations.
- MEASURE 2.4: production drift, including shifts in input data distribution and other indicators of changing risk.
- MEASURE 2.10: privacy risk of the AI system.
- MEASURE 2.11: fairness and bias.
ISO/IEC 25012:2008
ISO/IEC 25012 is the long-standing data-quality model standard. It defines fifteen characteristics: five inherent, seven both inherent and system-dependent, and three system-dependent. ORCA’s completeness, consistency, uniqueness, and compliance dimensions map to the equivalent characteristics in this standard. Our referential integrity dimension maps to the standard’s credibility characteristic (the degree to which data is regarded as true and believable by users) rather than to the standard’s accuracy characteristic, which would require comparison against a true reference value we do not have.
ISO/IEC 5259:2024–2025
ISO/IEC 5259 is the newer, AI/ML-specific data-quality standard, published in five parts during 2024 and 2025. ORCA’s seven dimensions correspond to the measurement approaches defined across Parts 2 (measures), 3 (management), 4 (process), and 5 (governance) of this standard.
Citation note. ISO/IEC 25012 and ISO/IEC 5259 are paywalled. We cite them at the inherent-characteristic level (25012) and at the part-structure level (5259) based on publicly available abstracts on iso.org and on independent academic and industry summaries. Verbatim subclause text was not consulted for the published version of this methodology. Any auditor wishing to verify can purchase the relevant part from a national standards body (SIS for Sweden, ANSI for the United States, BSI for the United Kingdom) or from ISO directly. NIST AI RMF v1.0 is in the public domain and verifiable without purchase.
How the score is built
The score is a weighted average across the seven dimensions. The weights reflect how heavily each dimension matters for AI/ML workloads. For example, completeness and compliance carry more weight than uniqueness, because missing values and unhandled PII are more catastrophic for AI training than duplicate records. The exact weight values, the thresholds each dimension applies internally, and the per-issue penalty schedules are implementation details we do not publish. This page describes what we measure and which standards we map to; it does not provide the recipe that produces the score.
The weights are not fixed across all use cases. ORCA carries a small set of named profiles (for example, classification, regression, NLP, time-series, and anomaly-detection) that re-balance the weights to reflect what matters most for that family of model. A time-series profile gives more weight to stability than a classification profile does, because temporal drift dominates time-series risk. A profile is selected by the customer when generating a report.
The composite is clamped to the 0–100 range and rounded to one decimal. Letter grades are assigned by score band: A at 90 and above, B at 75 and above, C at 60 and above, D at 40 and above, F below 40.
Limits and assumptions
Three honest limits readers should know before relying on the score, followed by two notes on dimensions that changed in this version.
- The score is a portfolio measure, not a guarantee of model performance. It tells you whether your data is the kind of input AI/ML systems generally tolerate well. It does not predict accuracy on any specific use case or model. Forthcoming external benchmark validation (Tier 2 of our methodology programme) will publish correlations against open ML-dataset performance deltas, separately from the score itself.
- The score is dependent on what the platform can observe at scan time. A column never sampled, a file never re-scanned, or a data source whose schema changed silently between scans will all reduce the informativeness of the dimensions that read from those signals.
- The score is methodology-versioned. Every report cites the methodology version that produced it. Changes to weights, thresholds, or dimension definitions trigger a version bump. Pure implementation bug-fixes do not. Historical reports therefore remain interpretable under the methodology that produced them.
- The stability dimension was previously inert in production due to a schema-level query bug; it has been re-implemented to consume the platform’s drift detector and now reflects scan-to-scan change.
- The dimension previously labelled accuracy has been renamed to referential integrity to reflect what it actually measures (orphan-reference rate plus severity of format violations) rather than the ISO/IEC 25012 sense of accuracy (correctness against a true reference value, which we do not measure).
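To make the arithmetic described under “How the score is built” concrete, the sketch below computes a weighted average, clamps it to 0–100, rounds it to one decimal, and assigns a letter grade from the published bands. The weight values and both profile definitions are invented placeholders, since the real weights are not published. The sketch also assumes that each dimension is scored on a 0–100 scale and that the grade is taken from the rounded composite; neither detail is stated on this page.

```python
DIMENSIONS = (
    "completeness", "consistency", "referential_integrity", "compliance",
    "uniqueness", "schema_quality", "stability",
)

# Hypothetical weights, NOT the real ones. They only respect the published
# ordering hints: completeness and compliance outweigh uniqueness, and the
# time-series profile gives stability more weight than classification does.
EXAMPLE_PROFILES = {
    "classification": {
        "completeness": 0.20, "consistency": 0.15, "referential_integrity": 0.15,
        "compliance": 0.20, "uniqueness": 0.10, "schema_quality": 0.10,
        "stability": 0.10,
    },
    "time_series": {
        "completeness": 0.20, "consistency": 0.10, "referential_integrity": 0.10,
        "compliance": 0.20, "uniqueness": 0.05, "schema_quality": 0.15,
        "stability": 0.20,
    },
}

# Grade floors as published: A >= 90, B >= 75, C >= 60, D >= 40, otherwise F.
GRADE_BANDS = ((90.0, "A"), (75.0, "B"), (60.0, "C"), (40.0, "D"))


def composite_score(dimension_scores, weights):
    """Weighted average of 0-100 dimension scores, clamped and rounded to 0.1."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted = sum(dimension_scores[d] * weights[d] for d in DIMENSIONS)
    raw = weighted / total_weight
    return round(min(100.0, max(0.0, raw)), 1)


def letter_grade(score):
    """Map a composite score to its letter grade."""
    for floor, grade in GRADE_BANDS:
        if score >= floor:
            return grade
    return "F"


# Hypothetical per-dimension scores for one dataset with weak stability.
scores = {
    "completeness": 95.0, "consistency": 88.0, "referential_integrity": 92.0,
    "compliance": 70.0, "uniqueness": 99.0, "schema_quality": 85.0,
    "stability": 60.0,
}

for profile, weights in EXAMPLE_PROFILES.items():
    total = composite_score(scores, weights)
    print(profile, total, letter_grade(total))
```

With these placeholder weights, the same dataset scores lower under the time-series profile than under the classification profile, because its weakest dimension (stability) counts for more there.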
Versioning
This page is methodology v1.0, published 2026-04-27. v1.0 supersedes v0.9 now that both of the in-flight revisions noted in v0.9 have landed:
- the stability dimension is wired to the platform’s drift detector and now reflects real scan-to-scan change rather than a constant; and
- the dimension previously labelled accuracy has been renamed to referential integrity, aligning the dimension’s name with what it actually measures.
How to verify our claims
ORCA is committed to verifiable methodology. Three pathways are available today, with two more in flight.
- Standards verification. All three standards we cite are independently verifiable. NIST AI RMF v1.0 is in the public domain at nist.gov. ISO/IEC 25012:2008 and ISO/IEC 5259:2024–2025 are available for purchase from any national standards body. If an auditor requires verbatim subclause text from either ISO standard, the relevant part can be purchased and checked against the dimension mapping above.
- Methodology-version audit trail. Every AI Readiness report carries the methodology version under which it was generated. Customers who retain reports across versions can trace exactly which methodology produced any historical score.
- Dimension transparency. Per-file reports break the composite down into the seven dimensions and surface the issues each dimension observed. The score is never opaque at the dimension level even though the per-dimension formulas remain internal.
- Open benchmark study. We are running an external validation exercise correlating AI Readiness scores with model-performance deltas across well-known public ML datasets — UCI, OpenML, and Kaggle past competitions. Results will be published as a separate report so the correlation claim is independently checkable without exposing internal scoring code.
- Synthetic-data degradation harness. A reproducible test harness customers can run to verify the score moves as expected under controlled stress (injected nulls, format violations, duplicates, PII without consent flags). The internal version of this harness is what backs the math audit feeding this page; the customer-facing version will follow.
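The idea behind the degradation harness can be sketched in a few lines: start from clean data, inject a controlled amount of damage, and check that the measured signal moves in the expected direction. The example below does this for injected nulls only. It is an illustrative stand-in, not the customer-facing harness (which is not yet released); `inject_nulls`, `missing_rate`, and the sample rows are hypothetical names and data.

```python
import random


def missing_rate(rows):
    """Fraction of cells whose value is None."""
    cells = [value for row in rows for value in row.values()]
    if not cells:
        return 0.0
    return sum(1 for value in cells if value is None) / len(cells)


def inject_nulls(rows, fraction, seed=0):
    """Return a copy of rows with roughly `fraction` of the cells set to None."""
    rng = random.Random(seed)  # fixed seed keeps the stress test reproducible
    degraded = [dict(row) for row in rows]  # copy so the clean input is untouched
    cells = [(i, col) for i, row in enumerate(degraded) for col in row]
    for i, col in rng.sample(cells, round(len(cells) * fraction)):
        degraded[i][col] = None
    return degraded


# Hypothetical clean dataset with no missing values.
clean = [{"id": i, "country": "Sweden"} for i in range(100)]

rates = [missing_rate(inject_nulls(clean, f)) for f in (0.0, 0.1, 0.3)]
print(rates)

# The measured missing rate should rise as more nulls are injected.
assert rates == sorted(rates)
```

The same pattern extends to the other stresses listed above (format violations, duplicates, PII without consent flags) by swapping in a different injection function and the matching measurement.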