Overview

ORCA’s AI chat lets you ask questions about your data quality, classification results, and readiness scores in plain language. Instead of navigating dashboards and filtering tables, you can ask directly: “Which columns have the most null values?” or “Why did the readiness score drop?” The chat understands your organisation’s data context — it knows your files, columns, quality issues, and historical trends.

What you can ask

Simple questions (1 token)

Factual lookups and status checks answered from your data:
  • “How many files have been scanned?”
  • “What’s the quality score for customers.csv?”
  • “List all columns with GDPR flags”
  • “Show me the latest job status”

Complex questions (3 tokens)

Questions that require reasoning, comparison, or diagnosis:
  • “Why does the revenue column have so many outliers?”
  • “Compare quality scores between last month and this month”
  • “Which files should I prioritize for remediation?”
  • “Explain the relationship between these two tables”
  • “What’s causing the readiness score to drop?”

Action requests (5 tokens)

Requests that trigger operations:
  • “Generate a GDPR compliance report”
  • “Fix the null values in the email column”
  • “Create a data contract for the orders table”
Action requests include a confirmation step before executing.

Query routing

Every query is automatically classified to determine the right model and token cost:
| Query type | Model | Token cost | Use case |
| --- | --- | --- | --- |
| Simple | Gemini Flash | 1 token | Lookups, status checks, listing data |
| Complex | Claude Sonnet | 3 tokens | Reasoning, diagnosis, strategy, comparison |
| Action | Claude Sonnet | 5 tokens | Fix, generate, apply operations |
| Follow-up | Gemini Flash | 1 token | Continuing a conversation thread |
| Off-scope | | 0 tokens | Rejected (not about data quality) |
Routing is handled by a Gemini Flash classification call with a keyword-based fallback if the classifier is unavailable.
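
To make the keyword-based fallback concrete, here is a minimal sketch of how such a classifier might look. The keyword lists, the `classify_by_keywords` function, and the `QueryType` enum are assumptions made for illustration; ORCA's actual fallback rules are not documented here. Only the token costs come from the routing table. Follow-up detection needs conversation state, so it is left out of the sketch.

```python
import re
from enum import Enum


class QueryType(Enum):
    SIMPLE = "simple"
    COMPLEX = "complex"
    ACTION = "action"
    FOLLOW_UP = "follow_up"
    OFF_SCOPE = "off_scope"


# Token costs as listed in the routing table.
TOKEN_COST = {
    QueryType.SIMPLE: 1,
    QueryType.COMPLEX: 3,
    QueryType.ACTION: 5,
    QueryType.FOLLOW_UP: 1,
    QueryType.OFF_SCOPE: 0,
}

# Hypothetical keyword stems, loosely based on the example questions.
SCOPE_STEMS = ("file", "column", "score", "quality", "gdpr", "job",
               "table", "null", "outlier", "readiness", "data")
ACTION_STEMS = ("fix", "generate", "create", "apply")
COMPLEX_STEMS = ("why", "compare", "explain", "prioritize", "causing")


def _has_stem(words: set[str], stems: tuple[str, ...]) -> bool:
    """Return True if any word starts with any of the given stems."""
    return any(word.startswith(stem) for word in words for stem in stems)


def classify_by_keywords(query: str) -> QueryType:
    """Classify a query without calling the model-based classifier."""
    words = set(re.findall(r"[a-z_]+", query.lower()))
    if not _has_stem(words, SCOPE_STEMS):
        return QueryType.OFF_SCOPE  # nothing about data quality
    if _has_stem(words, ACTION_STEMS):
        return QueryType.ACTION
    if _has_stem(words, COMPLEX_STEMS):
        return QueryType.COMPLEX
    return QueryType.SIMPLE


query_type = classify_by_keywords("Generate a GDPR compliance report")
print(query_type.value, TOKEN_COST[query_type])
```

The scope check runs first so that an off-scope query is rejected at zero cost before any other rule applies. Action keywords are checked before complex ones so that a request such as "Explain and fix..." is treated as the more expensive, confirmation-gated type.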

SQL-of-Thought reasoning

For complex queries, ORCA uses structured multi-step reasoning to build accurate answers:
  1. Parse the question and identify what data is needed
  2. Assemble context from quality results, classifications, and historical scores
  3. Reason through the evidence step by step
  4. Synthesize a clear, actionable answer
This approach produces more accurate and grounded responses than single-shot prompting, especially for questions that span multiple files or require trend analysis.

Contextual questions

Throughout the web app, you’ll find “Ask About This” buttons on quality scores, dimension breakdowns, anomaly alerts, and column details. These buttons pre-fill the chat with relevant context, so the AI already knows what you’re looking at. For example, clicking “Ask About This” on a low completeness score sends a query like:
“The completeness dimension scored 62%. What columns are driving this down and how can I improve it?”
The AI receives the dimension context, the file’s quality results, and your organisation’s profile — so the answer is specific to your situation, not generic advice.

Token costs

Chat queries consume tokens from your monthly bucket:
| Plan | Monthly tokens | Simple queries | Complex queries | Action queries |
| --- | --- | --- | --- | --- |
| Free | 50 | 50 | 16 | 10 |
| Pro | 5,000 | 5,000 | 1,666 | 1,000 |
| Enterprise | 25,000 | 25,000 | 8,333 | 5,000 |
The per-type query counts are the maximum number of queries you could send in a month if you used only that query type. In practice, you’ll use a mix. If your balance is too low for a query, the chat tells you before any tokens are deducted. If an AI call fails after deduction, the tokens are refunded automatically.
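
The per-plan maximums follow from integer division of the monthly bucket by the per-query cost. The sketch below reproduces that arithmetic and illustrates the check-then-deduct-then-refund behaviour described above. The function names and the `call_model` callback are hypothetical; only the plan sizes and token costs come from this page.

```python
from typing import Callable, Optional

PLAN_TOKENS = {"Free": 50, "Pro": 5_000, "Enterprise": 25_000}
QUERY_COST = {"simple": 1, "complex": 3, "action": 5}


def max_queries(monthly_tokens: int, cost: int) -> int:
    """Most queries of a single type that fit in the monthly bucket."""
    return monthly_tokens // cost


def run_query(balance: int, query_type: str,
              call_model: Callable[[], str]) -> tuple[int, Optional[str]]:
    """Return (new_balance, answer); refund the cost if the call fails."""
    cost = QUERY_COST[query_type]
    if balance < cost:
        # Checked before deduction, so nothing is charged.
        raise ValueError("insufficient tokens")
    balance -= cost
    try:
        answer = call_model()
    except Exception:
        return balance + cost, None  # automatic refund
    return balance, answer


for plan, tokens in PLAN_TOKENS.items():
    row = {name: max_queries(tokens, cost) for name, cost in QUERY_COST.items()}
    print(plan, row)
```

As an example of a mixed month on the Free plan, 20 simple, 5 complex, and 3 action queries cost 20 + 15 + 15 = 50 tokens, which uses the whole bucket.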

Tips for better answers

  • Be specific. Instead of “How’s my data quality?”, ask “What are the top 3 quality issues in customers_march.csv?” so the AI can give you a concrete, actionable answer.
  • Name the column or file. “Why does the annual_revenue column have outliers?” gets a better response than “Why are there outliers?” because the AI can look up that column’s statistics and value distribution.
  • Ask follow-up questions. The chat keeps the last 10 messages of conversation history, so after an answer you can drill deeper: “Which specific rows are affected?” or “What would happen if I applied winsorization?”
  • Use the “Ask About This” buttons. They give the AI structured context, which produces more targeted answers than free-form questions.
  • Stay on topic. The chat is scoped to data quality. Questions outside this scope (general knowledge, coding help, unrelated topics) are politely declined and cost 0 tokens.
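
The ten-message history mentioned in the tips can be pictured as a fixed-size window over the conversation. The sketch below is one way to model it, using a bounded deque. The `ConversationHistory` class is hypothetical, and this page does not say whether ORCA counts user and assistant messages together; the sketch assumes it does.

```python
from collections import deque

HISTORY_LIMIT = 10  # "last 10 messages", as stated in the tips


class ConversationHistory:
    """Keeps only the most recent messages of a conversation."""

    def __init__(self, limit: int = HISTORY_LIMIT) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._messages: deque[dict[str, str]] = deque(maxlen=limit)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def window(self) -> list[dict[str, str]]:
        """Messages that would still be visible to the model, oldest first."""
        return list(self._messages)


history = ConversationHistory()
for i in range(12):
    history.add("user", f"message {i}")
print(len(history.window()))
```

Under this model, a follow-up question only works if the answer it refers to is still inside the window, so in a long thread it helps to restate the file or column name.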

Chat via the API

You can also use the chat programmatically:

Create a conversation

POST /api/v1/chat/conversations
Content-Type: application/json
Authorization: Bearer <token>

{
  "title": "Q1 data review"
}

Send a message

POST /api/v1/chat/conversations/{conversation_id}/messages
Content-Type: application/json
Authorization: Bearer <token>

{
  "content": "What are the top quality issues across all my files?",
  "context": null
}
The response includes the AI’s answer, the query type classification, and the token cost:
{
  "data": {
    "id": "uuid",
    "role": "assistant",
    "content": "Based on your latest scans, the top 3 issues are...",
    "query_type": "complex",
    "token_cost": 3,
    "created_at": "2026-03-30T14:22:00Z"
  }
}
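
As a rough illustration, the send-message call could be wrapped in Python using only the standard library. The host in `BASE_URL` is a placeholder, and the helper names `build_message_request` and `parse_reply` are invented for this sketch; the path, headers, and field names are taken from the example above. The shape of a non-null `context` value is not documented here, so the sketch simply passes it through.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://orca.example.com"  # placeholder host, replace with yours


def build_message_request(conversation_id: str, content: str, token: str,
                          context: Optional[dict[str, Any]] = None
                          ) -> urllib.request.Request:
    """Build the POST request for sending one chat message."""
    body = json.dumps({"content": content, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/chat/conversations/{conversation_id}/messages",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def parse_reply(raw: bytes) -> tuple[str, str, int]:
    """Extract (answer, query_type, token_cost) from a response body."""
    data = json.loads(raw)["data"]
    return data["content"], data["query_type"], data["token_cost"]


request = build_message_request(
    "your-conversation-id",
    "What are the top quality issues across all my files?",
    token="your-api-token",
)
# To actually send it:
# with urllib.request.urlopen(request) as response:
#     answer, query_type, token_cost = parse_reply(response.read())
```

Reading `token_cost` from each reply lets a client keep a running total of what a conversation has consumed.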

List conversations

GET /api/v1/chat/conversations
Authorization: Bearer <token>

Security

All chat inputs pass through a security gate that sanitizes queries before they reach the AI model. The system:
  • Scrubs any PII from user messages before logging
  • Validates AI responses for safety
  • Rejects prompt injection attempts
  • Rate-limits queries per user
Chat data is scoped to your organisation and never shared across tenants.
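
For readers building a client, it can help to picture how per-user rate limiting typically behaves. The sketch below shows a generic sliding-window limiter. It is not ORCA's implementation, and the limit of 2 calls per 60 seconds in the usage lines is an arbitrary value chosen for illustration; this page does not state the real limits.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allows at most max_calls per user within any window_seconds span."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        """Record the call and return True, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        calls = self._calls[user_id]
        # Forget calls that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


limiter = SlidingWindowLimiter(max_calls=2, window_seconds=60)
print(limiter.allow("user-1", now=0.0))
print(limiter.allow("user-1", now=1.0))
print(limiter.allow("user-1", now=2.0))  # third call inside the window
```

Because the limiter keeps a separate queue per user, one user hitting the limit does not affect anyone else.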

Next steps

  • Classification: Learn how ORCA classifies your columns
  • AI readiness: Understand the scoring methodology the chat references