AI chat

Overview

ORCA’s AI chat lets you ask questions about your data quality, classification results, and readiness scores in plain language. Instead of navigating dashboards and filtering tables, you can ask directly: “Which columns have the most null values?” or “Why did the readiness score drop?” The chat understands your organisation’s data context — it knows your files, columns, quality issues, and historical trends.

What you can ask

Simple questions (1 token)

Factual lookups and status checks answered from your data:

“How many files have been scanned?”
“What’s the quality score for customers.csv?”
“List all columns with GDPR flags”
“Show me the latest job status”

Complex questions (3 tokens)

Questions that require reasoning, comparison, or diagnosis:

“Why does the revenue column have so many outliers?”
“Compare quality scores between last month and this month”
“Which files should I prioritize for remediation?”
“Explain the relationship between these two tables”
“What’s causing the readiness score to drop?”

Action requests (5 tokens)

Requests that trigger operations:

“Generate a GDPR compliance report”
“Fix the null values in the email column”
“Create a data contract for the orders table”

Action requests include a confirmation step before executing.

Query routing

Every query is automatically classified to determine the right model and token cost:

Query type	Model	Token cost	Use case
Simple	Gemini Flash	1 token	Lookups, status checks, listing data
Complex	Claude Sonnet	3 tokens	Reasoning, diagnosis, strategy, comparison
Action	Claude Sonnet	5 tokens	Fix, generate, apply operations
Follow-up	Gemini Flash	1 token	Continuing a conversation thread
Off-scope	—	0 tokens	Rejected (not about data quality)

Routing is handled by a Gemini Flash classification call with a keyword-based fallback if the classifier is unavailable.
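
As an illustration of the routing table and the keyword fallback, here is a minimal Python sketch. The models and token costs come from the table above. The keyword lists, the `keyword_fallback` function, and type labels such as `follow_up` are assumptions made for this example, not ORCA's actual implementation. Follow-up and off-scope detection depend on conversation state and topic checks, so the sketch leaves them out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    model: Optional[str]  # None means the query is rejected
    token_cost: int


# Models and costs copied from the routing table.
ROUTES = {
    "simple": Route("Gemini Flash", 1),
    "complex": Route("Claude Sonnet", 3),
    "action": Route("Claude Sonnet", 5),
    "follow_up": Route("Gemini Flash", 1),
    "off_scope": Route(None, 0),
}

# Hypothetical keyword lists; the real fallback's keywords are not documented.
ACTION_VERBS = {"fix", "generate", "create", "apply"}
COMPLEX_WORDS = {"why", "compare", "explain", "prioritize", "causing"}


def keyword_fallback(query: str) -> str:
    """Guess a query type from keywords when the classifier is unavailable."""
    words = re.findall(r"[a-z]+", query.lower())
    # Treat a query that opens with an imperative verb as an action request.
    if words and words[0] in ACTION_VERBS:
        return "action"
    if any(word in COMPLEX_WORDS for word in words):
        return "complex"
    return "simple"


query_type = keyword_fallback("Generate a GDPR compliance report")
route = ROUTES[query_type]
print(query_type, route.model, route.token_cost)
```

Defaulting to `simple` keeps the fallback cheap when no keyword matches; a stricter design could default to `complex` to favour answer quality over token cost.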

SQL-of-Thought reasoning

For complex queries, ORCA uses structured multi-step reasoning to build accurate answers:

Parse the question and identify what data is needed
Assemble context from quality results, classifications, and historical scores
Reason through the evidence step by step
Synthesize a clear, actionable answer

This approach produces more accurate and grounded responses than single-shot prompting, especially for questions that span multiple files or require trend analysis.

Contextual questions

Throughout the web app, you’ll find “Ask About This” buttons on quality scores, dimension breakdowns, anomaly alerts, and column details. These buttons pre-fill the chat with relevant context, so the AI already knows what you’re looking at. For example, clicking “Ask About This” on a low completeness score sends a query like:

“The completeness dimension scored 62%. What columns are driving this down and how can I improve it?”

The AI receives the dimension context, the file’s quality results, and your organisation’s profile — so the answer is specific to your situation, not generic advice.

Token costs

Chat queries consume tokens from your monthly bucket:

Plan	Monthly tokens	Max simple queries	Max complex queries	Max action queries
Free	50	50	16	10
Pro	5,000	5,000	1,666	1,000
Enterprise	25,000	25,000	8,333	5,000

The per-type columns show the maximum number of queries you could run if you used only that query type: the monthly token allowance divided by the per-query cost, rounded down. In practice, you’ll use a mix. If your balance is insufficient for a query, the chat tells you before deducting any tokens. If an AI call fails after deduction, the tokens are automatically refunded.
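
The per-plan query counts follow from integer division of the monthly allowance by the per-query cost. The Python sketch below reproduces that arithmetic and models the check-then-deduct-then-refund behaviour described above. The function names, the `InsufficientTokens` exception, and the `call_model` callback are hypothetical; only the costs and plan sizes come from this page.

```python
# Per-query costs and plan allowances taken from the tables on this page.
COST = {"simple": 1, "complex": 3, "action": 5}
PLANS = {"Free": 50, "Pro": 5_000, "Enterprise": 25_000}


class InsufficientTokens(Exception):
    """Raised before any deduction when the balance cannot cover a query."""


def max_queries(monthly_tokens: int, query_type: str) -> int:
    """Largest number of queries of one type that fits in the allowance."""
    return monthly_tokens // COST[query_type]


def run_query(balance: int, query_type: str, call_model):
    """Return (new_balance, answer); the answer is None if the AI call failed."""
    cost = COST[query_type]
    if balance < cost:
        # Checked up front, so nothing is deducted for a query that cannot run.
        raise InsufficientTokens(f"need {cost} tokens, have {balance}")
    balance -= cost
    try:
        answer = call_model()
    except Exception:
        balance += cost  # refund the deduction when the AI call fails
        return balance, None
    return balance, answer


for plan, tokens in PLANS.items():
    print(plan, [max_queries(tokens, query_type) for query_type in COST])
```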

Tips for better answers

Be specific about columns and files

Instead of “How’s my data quality?”, ask “What are the top 3 quality issues in customers_march.csv?” — the AI can give you a concrete, actionable answer.

Reference column names

“Why does the annual_revenue column have outliers?” gets a better response than “Why are there outliers?” because the AI can look up the specific column’s statistics and value distribution.

Ask follow-ups

The chat maintains conversation history (last 10 messages). After getting an answer, ask follow-up questions to drill deeper: “Which specific rows are affected?” or “What would happen if I applied winsorization?”
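
One way to picture the 10-message window is a bounded queue that discards the oldest message when a new one arrives. This Python sketch is only an illustration: the message shape and the assumption that user and assistant messages both count towards the limit are not confirmed by this page.

```python
from collections import deque

HISTORY_LIMIT = 10  # "last 10 messages", as stated above

# A deque with maxlen silently drops the oldest entry once it is full.
history = deque(maxlen=HISTORY_LIMIT)

for i in range(12):
    history.append({"role": "user", "content": f"message {i}"})

# Only the 10 most recent messages remain; the first two have been dropped.
print(len(history), history[0]["content"])
```

In practice this means a follow-up works best when it refers to something said recently in the thread.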

Use contextual buttons

The “Ask About This” buttons throughout the app provide the AI with structured context that produces more targeted answers than free-form questions.

Keep it about your data

The chat is scoped to data quality topics. Questions outside this scope (general knowledge, coding help, unrelated topics) are politely declined and cost 0 tokens.

Chat via the API

You can also use the chat programmatically:

Create a conversation

POST /api/v1/chat/conversations
Content-Type: application/json
Authorization: Bearer <token>

{
  "title": "Q1 data review"
}

Send a message

POST /api/v1/chat/conversations/{conversation_id}/messages
Content-Type: application/json
Authorization: Bearer <token>

{
  "content": "What are the top quality issues across all my files?",
  "context": null
}

The response includes the AI’s answer, the query type classification, and the token cost:

{
  "data": {
    "id": "uuid",
    "role": "assistant",
    "content": "Based on your latest scans, the top 3 issues are...",
    "query_type": "complex",
    "token_cost": 3,
    "created_at": "2026-03-30T14:22:00Z"
  }
}
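
For a scripted client, the request and response above can be handled with the Python standard library. This is a minimal sketch: `BASE_URL` is a placeholder host, and the helper names are hypothetical. The endpoint path, headers, and JSON fields are the ones shown on this page.

```python
import json
import urllib.request

BASE_URL = "https://orca.example.com"  # placeholder; use your ORCA host


def build_message_request(conversation_id, content, token, context=None):
    """Build the POST request for sending a message to a conversation."""
    body = json.dumps({"content": content, "context": context}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/chat/conversations/{conversation_id}/messages",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def parse_reply(raw):
    """Extract the answer, query type, and token cost from a response body."""
    data = json.loads(raw)["data"]
    return {
        "answer": data["content"],
        "query_type": data["query_type"],
        "token_cost": data["token_cost"],
    }


def send_message(conversation_id, content, token):
    """Send a message and return the parsed reply (performs a network call)."""
    request = build_message_request(conversation_id, content, token)
    with urllib.request.urlopen(request) as response:
        return parse_reply(response.read())
```

Reading `token_cost` from each reply lets a script track how much of the monthly bucket a batch of questions has used.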

List conversations

GET /api/v1/chat/conversations
Authorization: Bearer <token>

Security

All chat inputs pass through a security gate that sanitizes queries before they reach the AI model. The system:

Scrubs any PII from user messages before logging
Validates AI responses for safety
Rejects prompt injection attempts
Rate-limits queries per user

Chat data is scoped to your organisation and never shared across tenants.

Next steps

Classification

Learn how ORCA classifies your columns

AI readiness

Understand the scoring methodology the chat references
