The Kirtonic Engine
A runtime governance plane for multi-provider AI: architecture, protocol, threat model.
Abstract
This paper describes the Kirtonic Engine, a runtime governance layer for AI systems used by regulated enterprises. The Engine sits in the request path between an application and one or more large language model (LLM) providers, classifies each input and each output against an enterprise-specific policy, routes high-risk turns to human review, and produces an immutable audit trail. It is provider-agnostic (OpenAI, Anthropic, Google Gemini, and any OpenAI-compatible endpoint including Ollama and vLLM) and supports both in-prompt and fine-tuned adaptation paths so that reviewer corrections become signal for the classifier without requiring a model training cycle to take effect.
We motivate the design with three observations: (i) policy is fundamentally organisational and cannot be expressed as a generic safety filter; (ii) confidence and severity must be treated as independent dimensions when deciding whether to require human review; (iii) every classification decision must be linkable to a tamper-evident record that names the actor, the policy applied, and the resulting action. The Engine's architecture follows from these.
Introduction
Generative AI in regulated industries faces a structural tension. Foundation model providers (OpenAI, Anthropic, Google) ship safety policies designed for the median consumer. Regulated firms (banks, insurers, healthcare operators, law firms) have policies designed for their specific regulatory perimeter (UK FCA, HIPAA, PCI DSS, MiFID II, NIS2). The two policies overlap but rarely match: a response that the foundation model considers acceptable may be a market-abuse violation in a wealth-management context, and a response the foundation model refuses may be entirely legitimate in an internal compliance copilot.
Three architectural patterns are commonly deployed to bridge this gap. Each exhibits limitations relevant to the design of the Engine:
- SDK wrappers. Application code wraps the OpenAI or Anthropic SDK with custom pre- and post-filters. The wrapper is per-application, lacks a central governance surface, and is subject to bypass when integrations are added under time pressure.
- TLS-terminating reverse proxies. A proxy is deployed in front of each provider domain. The pattern enables traffic interception but does not retain the link between message content and business context: the calling surface, the account, and the policy version in effect at the time of the call. Governance decisions cannot be reconstructed without that context.
- Asynchronous review via log scraping. Tickets are generated from provider logs after the response has been delivered to the end user. The resulting audit trail is reactive; the system cannot prevent high-severity outputs reaching the caller.
The Engine operates as a synchronous control plane at the request boundary. It retains business context per call (surface, actor, policy version, classifier inputs) and is the single source of truth for the permitted action set. Applications integrate by routing requests through the Engine's endpoint.
System Overview
The Engine is a layered pipeline. Each layer has a narrow contract and can be understood, tested, and replaced in isolation.
┌─────────────────────────────────────────────────────────┐
│ Application (chat client, agent runtime, MCP server) │
└────────────────────────┬────────────────────────────────┘
│ POST /api/v1/signals
│ (or playground / proxy mode)
▼
┌─────────────────────────────────────────────────────────┐
│ Signal Ingestion │
│ · workspace + auth resolution │
│ · schema validation │
│ · governance_signals row INSERT │
└────────────────────────┬────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Classification Layer │
│ · Kirtonic hosted LLM classifier (calibrated) │
│ · workspace policy rules (in-prompt) │
│ · past reclassifications (in-prompt feedback) │
│ · ↳ {risk_score, confidence, reason, category} │
└────────────────────────┬────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Rules Engine (pure) │
│ · severity = classify(risk_score, thresholds) │
│ · routing = decide(severity, confidence, policy) │
└────────────────────────┬────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Decision Persistence + Audit │
│ · governance_decisions INSERT │
│ · governance_audit append (signal_received, │
│ decision_created, auto_approved, …) │
└────────────────────────┬────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Effect Layer │
│ · BLOCK → refusal returned to caller │
│ · QUEUE → decision visible at /engine/decisions │
│ · ALLOW → proceed; if proxy mode, forward to provider │
│ · webhook delivery (HMAC-signed, best-effort) │
└─────────────────────────────────────────────────────────┘
Three properties are load-bearing:
- The rules engine is pure. evaluateSignal(signal, rules) → RuleEvaluation takes no external state and writes no state. It is trivially testable and deterministic, and the same function runs in the API route, the playground, and the seed endpoint. This makes "why was this decision made?" a closed question that the audit log can answer without re-running the model.
- Classification and routing are separated. The classifier produces a continuous risk score; the rules engine maps that score onto discrete actions. Operators can re-tune the rules engine without re-evaluating the classifier, and vice versa.
- Effects fire only after persistence. The decision row is written before any webhook is dispatched, before any block is enforced, before any forward to a provider. If the database write fails, the effect does not happen, and there is no orphan side effect.
Signal Ingestion
All traffic enters through a single endpoint:
POST /api/v1/signals
Authorization: Bearer cw_live_… # or cookie session
{
"source": "fraud-model-v3", // required, ≤ 200 chars
"entity_id": "txn_abc123", // required, ≤ 200 chars, the subject of the decision
"risk_score": 0.84, // required, 0..1, caller's pre-computed score, or
// null when the engine should classify
"confidence": 0.91, // required, 0..1
"context": "credit", // optional, the surface the message is on
"metadata": { ... } // optional, arbitrary structured payload
}
The handler resolves the workspace and the actor (API key or cookie session), validates the schema, and calls the shared ingestSignal() helper. This helper is the single mutating path: it is used by the external API route, by the playground, and by the sample-data seeder. There is no other code that may insert into governance_signals.
// src/lib/governance/ingest.ts
export async function ingestSignal(
  supabase: SupabaseClient,
  workspaceId: string,
  input: SignalInput,
  actor: ActorInfo,
): Promise<{ signal: Signal; decision: Decision; evaluation: RuleEvaluation }> {
  const rules = await loadRules(supabase, workspaceId)                   // 1) load rules
  const signal = await insertSignal(supabase, workspaceId, input)        // 2) signal row
  const evaluation = evaluateSignal(signal, rules)                       // 3) pure rules
  const decision = await insertDecision(supabase, signal, evaluation)    // 4) decision row
  await appendAudit(supabase, signal, decision, actor)                   // 5) audit rows
  // 6) webhooks: fire-and-forget, dispatched only after the rows above are persisted
  void deliverWebhooks(workspaceId, eventFor(evaluation), { signal, decision, evaluation })
  return { signal, decision, evaluation }
}
Authentication accepts two modes. Production traffic uses workspace API keys, which are SHA-256-hashed at rest, displayed prefix-only in the UI, and scoped (governance:write, governance:read, etc.). Browser-driven flows (the playground, dashboard, debugging) use the Supabase cookie session and inherit the workspace context from the user's membership.
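The schema-validation step performed by the handler can be sketched as follows. This is a minimal illustration, not the Engine's validator: validateSignalInput and ValidationResult are hypothetical names, and the only constraints encoded are the ones stated in the request body above.

```typescript
// Hypothetical validator; field limits follow the documented request body.
type SignalInput = {
  source: string
  entity_id: string
  risk_score: number | null // null asks the Engine to classify
  confidence: number
  context?: string
  metadata?: Record<string, unknown>
}

type ValidationResult =
  | { ok: true; value: SignalInput }
  | { ok: false; errors: string[] }

// A finite number in the closed interval 0..1.
const inUnitRange = (n: unknown): n is number =>
  typeof n === 'number' && Number.isFinite(n) && n >= 0 && n <= 1

// A non-empty string of at most 200 characters.
const isShortString = (s: unknown): s is string =>
  typeof s === 'string' && s.length > 0 && s.length <= 200

export function validateSignalInput(body: unknown): ValidationResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, errors: ['body must be a JSON object'] }
  }
  const b = body as Record<string, unknown>
  const errors: string[] = []
  if (!isShortString(b.source)) errors.push('source: required string, at most 200 chars')
  if (!isShortString(b.entity_id)) errors.push('entity_id: required string, at most 200 chars')
  // risk_score must be present: either a number in 0..1 or an explicit null.
  if (b.risk_score !== null && !inUnitRange(b.risk_score)) {
    errors.push('risk_score: number in 0..1, or null')
  }
  if (!inUnitRange(b.confidence)) errors.push('confidence: required number in 0..1')
  if (b.context !== undefined && typeof b.context !== 'string') {
    errors.push('context: string when present')
  }
  if (
    b.metadata !== undefined &&
    (typeof b.metadata !== 'object' || b.metadata === null || Array.isArray(b.metadata))
  ) {
    errors.push('metadata: object when present')
  }
  if (errors.length > 0) return { ok: false, errors }
  return { ok: true, value: b as unknown as SignalInput }
}
```

Collecting every error before returning, rather than failing on the first, lets a caller fix a malformed payload in one round trip.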
Classification Layer
For workflows where the caller already has a risk score (a fraud model, a sanctions screener), the classification layer is skipped: the score from the request body flows straight into the rules engine. For workflows where the caller is shipping raw chat content (the playground, agent runtimes, MCP servers), the Engine classifies the content itself using a calibrated general-purpose model.
5.1 The classifier
The classifier is implemented as a single call to Kirtonic's hosted LLM with a system prompt designed to produce a calibrated JSON response. The choice of underlying model is an internal implementation detail; the platform interface guarantees the same JSON envelope regardless. We selected a model with (a) strong empirical priors on its refusal calibration in safety-classification tasks, (b) reliable JSON output without function-calling overhead, and (c) a long context window so the policy-as-context pattern (Section 5.3) works without truncation.
The system prompt has three sections:
- Risk dimensions: what counts as risk (regulated content, PII, prompt injection, safety, confidentiality, operational risk).
- Calibration anchors: explicit risk-score targets for representative cases. Without these, the model clusters scores in the 0.5 to 0.7 range and the severity bands become noise.
- Workspace context: assembled per request from the workspace's policy playbook and recent human reclassifications (Sections 5.3 and 12).
The classifier returns:
type Classification = {
risk_score: number // 0..1
confidence: number // 0..1
reason: string // one sentence, surfaces in the UI + audit log
category: 'regulated_advice' | 'pii' | 'injection' |
'safety' | 'confidentiality' | 'operational' |
'clean' | 'other'
}
5.2 Calibration
Risk scores are mapped to the rules engine's severity bands by simple thresholds. The defaults, severity_high = 0.80, severity_medium = 0.60, are not arbitrary. They correspond to the calibration anchors in the system prompt, which were tuned against a labelled evaluation set of regulated-domain chat turns. The anchors are:
Representative case                                                      Risk-score target
───────────────────────────────────────────────────────────────────────  ─────────────────
Truly benign small talk on a generic surface                             0.05-0.15
Mildly off-topic but harmless                                            0.15-0.35
On the edge: regulated topic discussed informally, mild PII              0.40-0.65
Clearly risky: financial recommendation, share of PII, prompt injection  0.70-0.90
Unambiguous policy violation, jailbreak, sensitive PII leak              0.90-1.00
Severity thresholds are configurable per workspace via governance_rules and take effect on the next classification call; no classifier retraining is required. When thresholds are adjusted, a shadow period (typically two weeks) is recommended in which all decisions are routed to human review without enforcement, the resulting queue is inspected, and the new thresholds are then promoted to enforcement state.
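The threshold mapping can be sketched as below. The function name classifySeverity appears in the rules-engine excerpt in Section 6; its body here is our assumption, in particular the choice of inclusive lower bounds, and SeverityThresholds is a hypothetical type that holds only the two documented settings.

```typescript
type Severity = 'low' | 'medium' | 'high'

// Hypothetical shape: only the two threshold fields named in the text.
type SeverityThresholds = { severity_high: number; severity_medium: number }

// Defaults as stated in this section.
const DEFAULT_THRESHOLDS: SeverityThresholds = { severity_high: 0.80, severity_medium: 0.60 }

// Assumption: a score exactly on a threshold falls into the higher band.
export function classifySeverity(
  riskScore: number,
  thresholds: SeverityThresholds = DEFAULT_THRESHOLDS,
): Severity {
  if (riskScore >= thresholds.severity_high) return 'high'
  if (riskScore >= thresholds.severity_medium) return 'medium'
  return 'low'
}
```

Because the mapping is a pure function of the score and the workspace's thresholds, a shadow period can replay stored risk scores against candidate thresholds without calling the classifier again.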
5.3 In-prompt policy
The workspace's natural-language playbook (Section 6.2) is loaded on every classification call and appended to the system prompt:
WORKSPACE POLICIES, score risk in light of these. Content
that violates a policy below should score AT LEAST the stated
severity, regardless of how harmless it might look in
isolation:
### Regulated advice
- Do not give specific investment, tax, or legal advice,
always defer to a qualified professional.
[severity: high · action: block]
### PII & sensitive data
- Block any message containing full credit card numbers,
CVVs, or full bank account numbers.
[severity: high · action: block]
This is the "policy as prompt" pattern: the customer writes rules in plain English, an AI categorisation step labels each with (category, severity, action), and the rules are passed verbatim to the classifier as part of its system instructions. The classifier learns to recognise the policy violations because the policy text describes the violation in the customer's own words. No fine-tuning is required; changes take effect on the next request.
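Assembling the policy block shown above from stored rules might look like the following sketch. renderPolicyBlock, PromptPolicyRule and the CATEGORY_HEADINGS map are hypothetical; only the output layout (a heading per category, the rule text, and a severity/action tag) is taken from the example prompt.

```typescript
// Hypothetical subset of a stored policy rule (see Section 6.2 for the full row).
type PromptPolicyRule = {
  text: string
  category: string
  severity: 'high' | 'medium' | 'low'
  action: string
  enabled: boolean
}

// Assumed display names; unknown categories fall back to their raw key.
const CATEGORY_HEADINGS: Record<string, string> = {
  regulated_advice: 'Regulated advice',
  pii: 'PII & sensitive data',
}

export function renderPolicyBlock(rules: PromptPolicyRule[]): string {
  // Group enabled rules by category, preserving first-seen order.
  const byCategory = new Map<string, PromptPolicyRule[]>()
  for (const rule of rules) {
    if (!rule.enabled) continue
    const group = byCategory.get(rule.category) ?? []
    group.push(rule)
    byCategory.set(rule.category, group)
  }
  const sections: string[] = []
  byCategory.forEach((group, category) => {
    const heading = CATEGORY_HEADINGS[category] ?? category
    const lines = group.map(
      (r) => `- ${r.text}\n  [severity: ${r.severity} · action: ${r.action}]`,
    )
    sections.push(`### ${heading}\n${lines.join('\n')}`)
  })
  return sections.join('\n')
}
```

The rule text is passed through verbatim, which is the point of the pattern: the classifier sees the customer's own wording rather than a normalised paraphrase.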
Rules Engine
The rules engine is a pure function over the classification output. It does not call the model, the database, or any external service.
// src/lib/governance/rules.ts
export function evaluateSignal(
signal: Pick<Signal, 'risk_score' | 'confidence' | 'context'>,
rules: GovernanceRules,
): RuleEvaluation {
const severity = classifySeverity(signal.risk_score, rules)
const lowConf = signal.confidence < rules.require_review_below_confidence
const highRev = severity === 'high' && rules.require_review_for_high
const lowAuto = severity === 'low' && rules.auto_approve_low && !lowConf
if (highRev) return { severity, status: 'awaiting_approval', reason: '…' }
if (lowConf) return { severity, status: 'awaiting_approval', reason: '…' }
if (lowAuto) return { severity, status: 'auto_approved', reason: '…' }
return { severity, status: 'awaiting_approval', reason: '…' }
}
6.1 Dual-factor routing
A signal can land in human review for two independent reasons: severity (the classifier thinks the content is risky) or confidence (the classifier is unsure). This decoupling matters because the two failure modes have very different remediations: a high-severity-but-confident decision is correctly routed; a low-severity-but-uncertain decision is a signal that the classifier needs more in-prompt context, more training data, or a stricter policy.
The truth table:
          confident               not confident
high      → review (severity)     → review (severity OR confidence)
medium    → review (default)      → review
low       → auto_approve          → review (confidence)
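The truth table can be checked mechanically. The sketch below restates the branch order of evaluateSignal over a (severity, confident) pair and enumerates all six cells. The route helper and the RoutingRules type are hypothetical, and the two flags are set to values we assume to be the defaults.

```typescript
type Severity = 'low' | 'medium' | 'high'
type Status = 'awaiting_approval' | 'auto_approved'

// Hypothetical subset of the workspace rules that affects routing.
type RoutingRules = { require_review_for_high: boolean; auto_approve_low: boolean }

// Same branch order as evaluateSignal: severity first, then confidence,
// then the auto-approve path, with review as the fall-through.
function route(severity: Severity, confident: boolean, rules: RoutingRules): Status {
  if (severity === 'high' && rules.require_review_for_high) return 'awaiting_approval'
  if (!confident) return 'awaiting_approval'
  if (severity === 'low' && rules.auto_approve_low) return 'auto_approved'
  return 'awaiting_approval'
}

// Assumed defaults; both flags are configurable per workspace.
const rules: RoutingRules = { require_review_for_high: true, auto_approve_low: true }

const severities: Severity[] = ['high', 'medium', 'low']

// One row per severity, one column per confidence outcome.
export const table = severities.map((severity) => ({
  severity,
  confident: route(severity, true, rules),
  notConfident: route(severity, false, rules),
}))
```

Only the low-severity, confident cell auto-approves. Because review is the fall-through, switching either flag off can only send more decisions to review, never fewer.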
6.2 The natural-language playbook
The second layer of rules is the workspace policy playbook, governance_policy_rules. Each row is one human-written policy statement plus structured metadata:
type PolicyRule = {
workspace_id: string
text: string // "Block messages containing full card numbers."
category: 'pii' | 'regulated_advice' | 'safety' | 'confidentiality' |
'operational' | 'brand' | 'injection' | 'other'
severity: 'high' | 'medium' | 'low'
action: 'block' | 'route_to_review' | 'redact' | 'log_only'
surfaces: string[] // empty = all surfaces
enabled: boolean
}
When the customer adds a rule in plain English, the categorisePolicyRule() function calls the Kirtonic engine with a strict classification prompt to suggest category, severity and action. The customer can override every field; the categorisation is an ergonomic shortcut, not a contract.
Decision Lifecycle
Every signal produces exactly one decision. The decision follows a state machine with six states, three of which (executed, rejected, failed) are terminal:
┌─────────────────────┐
│  awaiting_approval  │ ── (auto-route on creation if low severity + confident)
└──────────┬──────────┘                          │
           │ POST /v1/decisions/[id]/approve     │
           ▼                                     ▼
      ┌────────┐                        ┌────────────────┐
      │approved│                        │ auto_approved  │
      └───┬────┘                        └────────┬───────┘
          │ POST /v1/decisions/[id]/execute      │
          └────────────┬─────────────────────────┘
                       ▼
                  ┌──────────┐
                  │ executed │ ── terminal
                  └──────────┘

┌─────────────────────┐                                       ┌──────────┐
│  awaiting_approval  │ ── POST /v1/decisions/[id]/reject ──→ │ rejected │ ── terminal
└─────────────────────┘                                       └──────────┘

failed ── terminal, set by downstream effects (webhook receiver rejection)

Transitions are enforced at the API layer. A call to execute on an awaiting_approval decision returns 409 Conflict; a call to approve on an executed decision returns the same. The transition graph is intentionally restricted to a single linear path per decision so the audit log holds exactly one terminal state per decision_id, with one recorded actor and timestamp at each transition.
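A transition guard consistent with the diagram could be sketched as a lookup table. applyTransition and the TRANSITIONS map are hypothetical names; the allowed source states are read off the diagram, and failed has no entry because it is set by downstream effects rather than by an API call.

```typescript
type DecisionStatus =
  | 'awaiting_approval' | 'approved' | 'auto_approved'
  | 'executed' | 'rejected' | 'failed'

type DecisionAction = 'approve' | 'reject' | 'execute'

type TransitionResult =
  | { httpStatus: 200; next: DecisionStatus }
  | { httpStatus: 409; error: string }

// Allowed source states per action, as drawn in the state diagram.
const TRANSITIONS: Record<DecisionAction, { from: DecisionStatus[]; to: DecisionStatus }> = {
  approve: { from: ['awaiting_approval'], to: 'approved' },
  reject: { from: ['awaiting_approval'], to: 'rejected' },
  execute: { from: ['approved', 'auto_approved'], to: 'executed' },
}

export function applyTransition(current: DecisionStatus, action: DecisionAction): TransitionResult {
  const rule = TRANSITIONS[action]
  if (rule.from.indexOf(current) === -1) {
    // Any action from a state not listed above is a conflict, including all terminal states.
    return { httpStatus: 409, error: `cannot ${action} a decision in state ${current}` }
  }
  return { httpStatus: 200, next: rule.to }
}
```

Keeping the graph in one table means the audit invariant (one terminal state per decision) can be verified by inspection: no entry lists executed, rejected, or failed as a source state.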
7.1 Severity reclassification
A reviewer can correct the classifier's severity at any point in the lifecycle. The reclassify endpoint preserves the classifier's original call:
POST /api/v1/decisions/[id]/reclassify
{ "severity": "medium", "reason": "..." }
→ governance_decisions
severity ← new value
original_severity ← (snapshot of previous severity, only set on first override)
→ governance_audit (append)
event_type: 'severity_overridden'
actor_type: 'user' | 'api_key'
detail: { from, to, reason }
original_severity is set only on the first override, so the audit log preserves "what did the classifier originally say" independently of how many times a human has subsequently corrected it.
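The first-override rule can be illustrated with a small pure function. The reclassify helper and the DecisionRow and OverrideAudit types are hypothetical; the behaviour shown (snapshot the classifier's severity once, then only update severity) follows the description above.

```typescript
type Severity = 'low' | 'medium' | 'high'

// Hypothetical subset of a governance_decisions row.
type DecisionRow = { id: string; severity: Severity; original_severity: Severity | null }

// Hypothetical subset of the audit event written on each override.
type OverrideAudit = {
  event_type: 'severity_overridden'
  detail: { from: Severity; to: Severity; reason: string }
}

export function reclassify(
  decision: DecisionRow,
  to: Severity,
  reason: string,
): { decision: DecisionRow; audit: OverrideAudit } {
  const updated: DecisionRow = {
    ...decision,
    severity: to,
    // Snapshot only if no earlier override has already captured the classifier's value.
    original_severity: decision.original_severity ?? decision.severity,
  }
  const audit: OverrideAudit = {
    event_type: 'severity_overridden',
    detail: { from: decision.severity, to, reason },
  }
  return { decision: updated, audit }
}
```

After two successive overrides (high to medium, then medium to low), original_severity still reads high, while each audit event records its own from/to pair.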
The Audit Log
The audit log is the regulatory artefact. It is append-only at the application layer (no DELETE or UPDATE endpoints exist for governance_audit rows) and RLS-scoped per workspace.
type AuditEvent = {
id: uuid // random (default uuid_generate_v4); ordering comes from created_at
workspace_id: uuid // RLS-isolated
decision_id: uuid | null
signal_id: uuid | null
event_type: 'signal_received'
| 'decision_created'
| 'auto_approved'
| 'approved'
| 'rejected'
| 'executed'
| 'severity_overridden'
| 'webhook_delivered'
| 'webhook_failed'
| 'rules_updated'
actor_type: 'user' | 'system' | 'api_key'
actor_id: uuid | null // user_id when actor_type='user', null otherwise
detail: jsonb // event-specific payload
created_at: timestamptz // append-only ordering
}
8.1 Tamper-evident properties
We do not claim cryptographic immutability: there is no Merkle chain in the current implementation. The properties we do guarantee are:
- No application path mutates audit rows. The only write is INSERT. Service-role writes are scoped to webhook delivery logging and signal ingestion. No UPDATE or DELETE statement against governance_audit exists in the codebase.
- Workspace isolation by RLS. The RLS policy "Audit: members read" uses the is_workspace_member SECURITY DEFINER function, which prevents the join-explosion attack where one workspace's row references another workspace's decision.
- Actor attribution is required. Every audit row has either an actor_id (user override of a decision) or an explicit actor_type = 'system' with a typed detail field.
For customers requiring cryptographic tamper-evidence today, the Engine emits each audit row as a webhook event, so the customer can chain the hashes externally; a native cryptographic chain is on the v2 roadmap (Section 17).
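One way a customer could chain audit events externally is a simple SHA-256 hash chain over the webhook bodies, in the order received. This is a sketch of that customer-side technique, not part of the Engine: ChainedEvent, appendToChain, verifyChain and the all-zero genesis value are our own illustrative choices.

```typescript
import { createHash } from 'node:crypto'

// Customer-side record: the raw event body plus the two hashes that link it.
type ChainedEvent = { payload: string; prev_hash: string; hash: string }

// Arbitrary fixed starting value for the first link (an assumption of this sketch).
const GENESIS = '0'.repeat(64)

// Each link commits to the previous hash and the current payload.
function linkHash(prevHash: string, payload: string): string {
  return createHash('sha256').update(prevHash).update('\n').update(payload).digest('hex')
}

export function appendToChain(chain: ChainedEvent[], payload: string): ChainedEvent[] {
  const prev = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS
  return [...chain, { payload, prev_hash: prev, hash: linkHash(prev, payload) }]
}

// Recomputes every link; any edited, removed, or reordered event breaks the chain.
export function verifyChain(chain: ChainedEvent[]): boolean {
  let prev = GENESIS
  for (const event of chain) {
    if (event.prev_hash !== prev) return false
    if (event.hash !== linkHash(prev, event.payload)) return false
    prev = event.hash
  }
  return true
}
```

Since webhook delivery is best-effort (Section 9.2), a chain built this way attests only to the events the receiver actually saw; gaps would need to be reconciled against GET /api/v1/audit.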
Webhook Delivery
Webhooks are the integration point for downstream systems: alerting, ticketing, account locking, secondary review queues. Each webhook subscribes to a subset of decision events and receives signed POST requests.
9.1 Signature
POST https://your-system.example.com/kirtonic-hook
Content-Type: application/json
X-Kirtonic-Event: decision.created
X-Kirtonic-Signature: sha256=8f3c...e9
X-Kirtonic-Test: true # only on test fires from /webhooks/[id]/test
{
"event": "decision.created",
"delivered_at": "2026-03-12T14:08:21.443Z",
"data": {
"signal": { ... full signal row ... },
"decision": { ... full decision row ... },
"evaluation":{ "severity": "high", "reason": "..." }
}
}
The signature is HMAC-SHA256 of the raw request body using the webhook's signing_secret. The secret is generated server-side at webhook creation, returned once to the operator (and only once), and never exposed again via the API. Receiver verification:
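import { createHmac, timingSafeEqual } from "crypto"

// SECRET is the webhook's signing_secret; the environment variable name is illustrative.
const SECRET = process.env.KIRTONIC_WEBHOOK_SECRET ?? ""

// rawBody must be the exact bytes received, before any JSON parsing or re-serialisation.
function verify(rawBody: Buffer, headerSig: string): boolean {
  const expected = createHmac("sha256", SECRET).update(rawBody).digest("hex")
  const a = Buffer.from(headerSig.replace("sha256=", ""), "hex")
  const b = Buffer.from(expected, "hex")
  // Compare lengths first: timingSafeEqual throws if the buffers differ in length.
  return a.length === b.length && timingSafeEqual(a, b)
}
9.2 Delivery semantics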
Delivery is fire-and-forget with a 10-second timeout. The v1 webhook does not retry on transient failure. This is by design: (a) decision events have latency-sensitive consumers and a queued retry window of hours is not useful to the downstream system; (b) at-least-once semantics would shift idempotency-handling cost to every receiver implementation. Failed deliveries write a webhook_failed audit event and update the webhook row's last_delivery_status. Operators can re-fire any decision's webhook from the UI.
We are evaluating exponential-backoff retry behind a per-webhook opt-in for v2 (see Section 17).
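The single-attempt delivery described in this section can be sketched as below. deliverOnce, DeliveryResult and FetchLike are hypothetical names; the fetch implementation is injected so the sketch can be exercised without a network, and the header names are the ones shown in Section 9.1.

```typescript
type DeliveryResult =
  | { status: 'delivered'; httpStatus: number }
  | { status: 'failed'; error: string }

// Minimal subset of the fetch signature that this sketch relies on.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number }>

export async function deliverOnce(
  url: string,
  event: string,
  body: string,
  signature: string,
  fetchImpl: FetchLike,
  timeoutMs: number = 10000, // the documented 10-second timeout
): Promise<DeliveryResult> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    const res = await fetchImpl(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Kirtonic-Event': event,
        'X-Kirtonic-Signature': signature,
      },
      body,
      signal: controller.signal,
    })
    return res.ok
      ? { status: 'delivered', httpStatus: res.status }
      : { status: 'failed', error: `HTTP ${res.status}` }
  } catch (err) {
    // Timeouts and network errors both end here; there is deliberately no retry.
    return { status: 'failed', error: err instanceof Error ? err.message : String(err) }
  } finally {
    clearTimeout(timer)
  }
}
```

The function never throws, so a caller can map the result directly onto a webhook_delivered or webhook_failed audit event.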
Multi-provider Proxy
For workspaces using proxy mode (the chat playground; agent runtimes in production), the Engine fronts three foundation-model providers and any OpenAI-compatible endpoint. The proxy translates the request shape native to each provider.
Provider       Endpoint                                        Auth
─────────      ──────────────────────────────────────────────  ─────────────────
OpenAI         https://api.openai.com/v1/chat/completions      Bearer (sk-…)
Anthropic      https://api.anthropic.com/v1/messages           x-api-key (sk-ant-…)
Gemini         …generativelanguage.googleapis.com/v1beta/…     ?key=AIza…
Custom         <user-supplied>/v1/chat/completions             Bearer (optional)
  e.g. Ollama  http://localhost:11434/v1/chat/completions      (none, local)
       vLLM    http://gpu-host:8000/v1/chat/completions        Bearer (optional)
10.1 Native API translation
Each provider expects a slightly different request shape. The proxy normalises: messages with role: system become Anthropic's top-level system field; assistant messages become role: model in Gemini's contents array. The normalisation is implemented in callProviderChat() (src/lib/governance/provider-chat.ts) so each provider can be tested in isolation.
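The two normalisations named above can be sketched as pure functions over an OpenAI-style message list. toAnthropic and toGemini are hypothetical helpers, not the internals of callProviderChat(), and they cover only the message-shape portion of each request (model selection, token limits and other required fields are omitted). How the Engine forwards system messages to Gemini is not described in the text, so this sketch simply leaves them out.

```typescript
// OpenAI-style chat message, used here as the neutral input shape.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

type AnthropicBody = {
  system?: string
  messages: { role: 'user' | 'assistant'; content: string }[]
}

type GeminiBody = {
  contents: { role: 'user' | 'model'; parts: { text: string }[] }[]
}

// System messages move to Anthropic's top-level system field.
export function toAnthropic(messages: ChatMessage[]): AnthropicBody {
  const system = messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n\n')
  const rest = messages
    .filter((m) => m.role !== 'system')
    .map((m) => ({ role: m.role as 'user' | 'assistant', content: m.content }))
  return system ? { system, messages: rest } : { messages: rest }
}

// Assistant turns become role "model" inside Gemini's contents array.
export function toGemini(messages: ChatMessage[]): GeminiBody {
  const contents = messages
    .filter((m) => m.role !== 'system') // assumption: system handling is out of scope here
    .map((m) => ({
      role: m.role === 'assistant' ? ('model' as const) : ('user' as const),
      parts: [{ text: m.content }],
    }))
  return { contents }
}
```

Keeping each translation a pure function of the message list is what allows per-provider unit tests without live API keys.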
10.2 Key resolution order
For any given chat call the proxy resolves the API key in this strict order:
- Active custom model's encrypted key (workspace_models.encrypted_api_key), for imported endpoints.
- Inline key from the request body, useful for one-off testing without touching stored keys.
- Workspace integration key for the selected provider (workspace_integrations.encrypted_key where provider = <chosen>).
- Reject with HTTP 400 and { missing_integration: true } so the UI can prompt for setup.
For OpenAI-compatible local endpoints with no auth (the Ollama default) an empty key is accepted and no Authorization header is sent.
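The resolution order can be written as a short fall-through. resolveApiKey, KeySources and authHeaders are hypothetical names, the inputs are assumed to be already decrypted, and where the empty-key allowance sits in the order (after all three key sources, before the 400) is our reading of the text rather than a documented rule.

```typescript
// Candidate keys, already decrypted; null or undefined means "not configured".
type KeySources = {
  activeCustomModelKey?: string | null
  inlineKey?: string | null
  integrationKey?: string | null
  allowsEmptyKey?: boolean // OpenAI-compatible local endpoint with no auth (e.g. Ollama)
}

type KeyResolution =
  | { ok: true; key: string; source: 'custom_model' | 'inline' | 'integration' | 'none_required' }
  | { ok: false; status: 400; body: { missing_integration: true } }

export function resolveApiKey(s: KeySources): KeyResolution {
  if (s.activeCustomModelKey) return { ok: true, key: s.activeCustomModelKey, source: 'custom_model' }
  if (s.inlineKey) return { ok: true, key: s.inlineKey, source: 'inline' }
  if (s.integrationKey) return { ok: true, key: s.integrationKey, source: 'integration' }
  if (s.allowsEmptyKey) return { ok: true, key: '', source: 'none_required' }
  return { ok: false, status: 400, body: { missing_integration: true } }
}

// With an empty key, no Authorization header is sent at all.
export function authHeaders(key: string): Record<string, string> {
  const headers: Record<string, string> = {}
  if (key) headers.Authorization = `Bearer ${key}`
  return headers
}
```

Returning the source alongside the key makes it straightforward to record which credential served a given call.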
10.3 Host trace
Every proxy response includes a host_trace object describing the data's journey:
host_trace: {
kirtonic: { hostname: 'kirtonic.io', region: 'eu-west-2', country: 'United Kingdom' },
classifier: { hostname: 'api.anthropic.com', region: 'US (global edge)', country: 'United States' },
provider: { hostname: 'api.openai.com', region: 'US East', country: 'United States' },
}
The trace is used by the playground's data-flow diagram to visualise where the message went on this specific turn. Region labels are derived from each provider's public infrastructure documentation and are not authoritative: individual requests may route to other edge locations within the same legal jurisdiction.
Custom Classifier Training
The Engine's default classifier is claude-sonnet-4-6. Customers with enough decision history can train a domain-specific replacement that inherits the policy without re-reading the entire system prompt on every call. Trained models live in the customer's own provider account.
11.1 Dataset construction
buildDataset(supabase, workspaceId) reads every settled decision in the workspace, joins the originating signal, and emits OpenAI chat fine-tuning JSONL:
{"messages":[
{"role":"system","content":"You are a safety classifier tuned for one customer..."},
{"role":"user","content":"{\"source\":\"playground/user\",\"context\":\"customer-support\",\"risk_score\":0.18,\"confidence\":0.82,\"metadata\":{...}}"},
{"role":"assistant","content":"{\"label\":\"allow\",\"severity\":\"low\"}"}
]}
Labelling:
- approved / auto_approved / executed → label "allow"
- rejected → label "block"
- awaiting_approval → excluded (no ground truth)
- severity uses the current value, so reclassifications are honoured.
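The labelling rules translate into a small mapping plus a serialiser for one JSONL line. labelFor and toJsonlLine are hypothetical helpers, not the body of buildDataset(). The text does not say how failed decisions are labelled, so this sketch excludes them, which is an assumption.

```typescript
type Severity = 'low' | 'medium' | 'high'
type DecisionStatus =
  | 'awaiting_approval' | 'approved' | 'auto_approved'
  | 'executed' | 'rejected' | 'failed'

type TrainingLabel = { label: 'allow' | 'block'; severity: Severity }

// Returns null when the decision carries no usable ground truth.
export function labelFor(status: DecisionStatus, severity: Severity): TrainingLabel | null {
  switch (status) {
    case 'approved':
    case 'auto_approved':
    case 'executed':
      return { label: 'allow', severity }
    case 'rejected':
      return { label: 'block', severity }
    default:
      // awaiting_approval has no ground truth; 'failed' is excluded here by assumption.
      return null
  }
}

// One training example per line, in the chat fine-tuning shape shown above.
export function toJsonlLine(
  systemPrompt: string,
  signal: Record<string, unknown>,
  label: TrainingLabel,
): string {
  return JSON.stringify({
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: JSON.stringify(signal) },
      { role: 'assistant', content: JSON.stringify(label) },
    ],
  })
}
```

Passing the decision's current severity into labelFor is what makes reviewer reclassifications flow into the training set.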
11.2 Training flow
POST /api/governance/models
{ name, base_model: "gpt-4o-mini-2024-07-18" }
→ build dataset JSONL # workspace_models.dataset_size
→ POST https://api.openai.com/v1/files # external_file_id
→ POST .../v1/fine_tuning/jobs # external_job_id, status: 'queued'
→ poll job status via /[id]/status # → 'training' → 'succeeded'
→ activate the trained model id # is_active: true (partial unique index
# enforces one-active-per-workspace)
11.3 Imported and OpenAI-compatible models
Workspaces can also import a model trained elsewhere: an existing OpenAI fine-tune (ft:gpt-4o-mini:org:tag:abc) or any OpenAI-compatible endpoint (Ollama, LM Studio, vLLM). Imported models go straight to status "succeeded" and can be activated immediately. The endpoint URL and optional per-model encrypted API key live in the workspace_models row.
Reclassification Feedback Loop
Fine-tuning a custom model is an hours-to-days operation. In the meantime, reviewer corrections need to take effect immediately. The Engine implements a second adaptation path: in-prompt feedback.
On every classification call, loadRecentReclassifications() pulls the workspace's most recent N (default 20) decisions where original_severity is not null, i.e. ones a reviewer corrected. The list is formatted into the classifier's system prompt as:
PAST HUMAN CORRECTIONS in this workspace, these are real
messages where a reviewer explicitly disagreed with the
classifier and changed the severity. Apply the SAME reasoning
to similar new messages: if you would have scored a message
like one of these at the classifier's original severity,
score it at the corrected severity instead. Look for semantic
similarity, not exact wording.
- On the "playground/user" source [customer-support]:
"I'm going to import company accounts" → reviewer
reclassified from low to medium.
- ...
This is the in-prompt version of few-shot learning. The classifier reads the corrections, recognises the semantic pattern, and re-scores similar incoming messages without any training cycle. The two adaptation paths combine: corrections are visible to the classifier on the next request, and they become labelled training examples that are baked into a fine-tuned model on the next training run.
Bounded retrieval (N=20) keeps the prompt tractable. For workspaces with hundreds of reclassifications, a planned iteration will use embedding-based retrieval to fetch only the corrections most semantically similar to the incoming message (Section 17).
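Formatting the bounded list of corrections into prompt lines might look like the sketch below. renderCorrections and the Correction type, including its excerpt and corrected_at fields, are hypothetical; the line layout follows the example prompt above.

```typescript
type Severity = 'low' | 'medium' | 'high'

// Hypothetical projection of a corrected decision joined with its signal.
type Correction = {
  source: string
  context: string | null
  excerpt: string            // message preview shown to the classifier
  original_severity: Severity
  severity: Severity         // the reviewer's corrected value
  corrected_at: string       // ISO 8601 timestamp; sorts lexicographically
}

export function renderCorrections(all: Correction[], limit: number = 20): string {
  // Most recent first, then cut to the bounded window.
  const recent = [...all]
    .sort((a, b) => (a.corrected_at < b.corrected_at ? 1 : a.corrected_at > b.corrected_at ? -1 : 0))
    .slice(0, limit)
  return recent
    .map((c) => {
      const surface = c.context ? ` [${c.context}]` : ''
      return (
        `- On the "${c.source}" source${surface}:\n` +
        `  "${c.excerpt}" → reviewer\n` +
        `  reclassified from ${c.original_severity} to ${c.severity}.`
      )
    })
    .join('\n')
}
```

Swapping the recency sort for a similarity ranking would leave the rendering step unchanged, which keeps the move to embedding-based retrieval local to the selection logic.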
Security Model
13.1 Tenant isolation
Every governance table (governance_signals, governance_decisions, governance_audit, governance_rules, governance_policy_rules, governance_webhooks, governance_playground_sessions, governance_playground_messages, workspace_integrations, and workspace_models) has Row-Level Security enabled with a policy of the form:
create policy "<...>: members read" on public.<table>
for select using (public.is_workspace_member(workspace_id));
create policy "<...>: admins write" on public.<table>
for all using (public.workspace_role(workspace_id) in ('owner','admin'));
is_workspace_member() and workspace_role() are SECURITY DEFINER functions that read workspace_members with elevated privilege, bypassing the RLS that would otherwise cause infinite recursion. This is the canonical Supabase pattern for multi-tenant RLS.
13.2 Credential encryption
Customer-supplied provider keys (OpenAI, Anthropic, Gemini) and per-model API keys are encrypted at rest using AES-256-GCM. The encryption key is read from process.env.INTEGRATION_ENCRYPTION_KEY (a 32-byte key, hex-encoded), generated once at platform setup and stored only in the hosting provider's secret manager.
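// src/lib/crypto/encrypt.ts (excerpted)
import { createCipheriv, randomBytes } from 'crypto'

const ALGO = 'aes-256-gcm'

// key() is not shown in this excerpt; it returns the 32-byte key from INTEGRATION_ENCRYPTION_KEY.
export function encryptToken(plaintext: string): string {
  const iv = randomBytes(12) // fresh 96-bit nonce per encryption
  const cipher = createCipheriv(ALGO, key(), iv)
  const ct = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  const tag = cipher.getAuthTag() // GCM authentication tag, checked on decrypt
  // Stored format: iv:tag:ciphertext, each part hex-encoded
  return `${iv.toString('hex')}:${tag.toString('hex')}:${ct.toString('hex')}`
}
The threat model for credential storage: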
Threat                                Mitigation
────────────────────────────────────  ──────────────────────────────────────
Database dump (read-only attacker)    Ciphertext only; needs ENCRYPTION_KEY
ENCRYPTION_KEY leak via logs          Helper imports never log the key;
                                      error messages reference the env var
                                      name only
Encrypted blob mismatch (corruption)  GCM auth tag rejects on decrypt
Replay across workspaces              Workspace ID is part of the row's PK
                                      chain; RLS enforces ownership
Plaintext exposure via API response   POST returns only last_four; GET never
                                      returns ciphertext or plaintext
13.3 Workspace API keys
Workspace API keys are themselves credentials. They are issued as random 32-byte tokens prefixed with cw_live_, displayed plaintext exactly once at creation, and stored as a SHA-256 hash plus a 12-character prefix. The hash is used for verification; the prefix is used for display. Revocation is a soft delete (revoked_at timestamp) so the audit log retains attribution.
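Issuing and verifying a key under this scheme can be sketched as follows. issueApiKey, verifyApiKey and the StoredKey shape are hypothetical; the hex encoding of the random bytes and the choice to count the 12-character display prefix from the start of the full token (including cw_live_) are assumptions not stated in the text.

```typescript
import { createHash, randomBytes, timingSafeEqual } from 'node:crypto'

const KEY_PREFIX = 'cw_live_'
const DISPLAY_PREFIX_LENGTH = 12

// What is persisted: never the plaintext, only its hash and a display prefix.
type StoredKey = { key_hash: string; key_prefix: string; revoked_at: string | null }

const sha256Hex = (value: string): string => createHash('sha256').update(value).digest('hex')

// The plaintext is returned to the caller exactly once and not kept.
export function issueApiKey(): { plaintext: string; stored: StoredKey } {
  const plaintext = KEY_PREFIX + randomBytes(32).toString('hex') // encoding is an assumption
  return {
    plaintext,
    stored: {
      key_hash: sha256Hex(plaintext),
      key_prefix: plaintext.slice(0, DISPLAY_PREFIX_LENGTH),
      revoked_at: null,
    },
  }
}

export function verifyApiKey(presented: string, stored: StoredKey): boolean {
  // Soft-deleted keys stay in the table for attribution but never authenticate.
  if (stored.revoked_at !== null) return false
  const a = Buffer.from(sha256Hex(presented), 'hex')
  const b = Buffer.from(stored.key_hash, 'hex')
  return a.length === b.length && timingSafeEqual(a, b)
}
```

An unsalted SHA-256 is adequate here only because the token carries 32 bytes of randomness; the same construction would not be appropriate for human-chosen passwords.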
Threat Model
We enumerate the realistic adversaries and how the Engine responds. This is not exhaustive; it is the set of threats we have explicitly considered.
Adversary                       Capability                          Engine response
──────────────────────────────  ──────────────────────────────────  ─────────────────────────────────
End user of customer app        Crafts adversarial prompts          Classifier scores; rules engine
                                                                    blocks or queues
Compromised application key     Submits arbitrary signals           Workspace-scoped API key; RLS;
                                                                    audit trail captures everything
Insider with workspace access   Approves a high-risk decision       Audit row names them with
                                                                    timestamp; reclassification
                                                                    also audited
Insider with DB read-only       Reads workspace tables              Sees ciphertext for keys;
                                                                    plaintext content is in
                                                                    governance_signals.metadata
                                                                    (this IS sensitive, see below)
Network attacker                MITM between Kirtonic and provider  TLS everywhere; signed webhooks
                                                                    for downstream
Provider compromise             OpenAI/Anthropic key abuse          Keys are encrypted per workspace;
                                                                    revocable from Integrations
Rogue webhook receiver          Forges decision events              HMAC-SHA256 signature on every
                                                                    event; receiver MUST verify
Audit log tampering             Direct SQL UPDATE on audit table    RLS limits attack surface;
                                                                    application has no UPDATE path;
                                                                    cryptographic chain in v2
The insider-DB-read row is the most material residual risk: signal metadata may contain plaintext PII or message previews. Customers requiring this to be mitigated should: (a) configure the Trust Centre to redact PII at ingestion, (b) use a self-hosted deployment so the database lives in their own VPC, or (c) limit the signal payload they POST to non-sensitive identifiers and keep the message content in their own systems.
Performance, Scaling, Compliance
15.1 Latency
Typical end-to-end latency for a classifier-only turn:
signal validation + workspace resolve      5-15 ms
load policy rules + reclassifications      20-40 ms      (single Postgres round-trip)
classifier call (Kirtonic hosted LLM)      300-700 ms    (network-bound)
rules engine + decision INSERT             10-30 ms
audit row INSERT (batched 2 rows)          5-15 ms
─────────────────────────────────────────  ──────────────
Total                                      ≈ 350-800 ms
Proxy mode adds a second classifier call on the response (a similar 350-800 ms) plus the foundation-model latency (provider-dependent, 500-3000 ms typical for short responses). Webhook delivery is fired asynchronously with void deliverWebhooks(...) and does not block the response.
15.2 Throughput
The hot path is dominated by the classifier call. Anthropic's default tier allows ~50 requests/sec/account. Workspaces approaching that limit should either (a) train and activate a custom classifier so the call goes to a dedicated fine-tune, (b) use classifier-only mode for low-risk surfaces and reserve full governance for regulated paths, or (c) request a dedicated rate-limit pool.
15.3 Compliance posture
Framework      Engine support                                        Customer responsibility
─────────────  ───────────────────────────────────────────────────   ───────────────────────────
UK GDPR        UK-region data residency option · audit log ·         DPIA · data-controller
               configurable retention                                obligations
HIPAA          BAA available · PII / PHI redaction in pipeline ·     PHI classification on the
               encrypted at rest                                     customer's own data
PCI DSS        Card-number detection rule shipped in starter pack ·  Cardholder-data environment
               classifier scores card patterns 0.9+                  scope decisions
SOC 2          Tamper-evident audit · RLS · API key rotation         Vendor management ·
                                                                     employee access reviews
ISO 27001      Aligns with A.5 (policies), A.8 (asset mgmt),         ISMS · risk register
               A.12 (operations), A.16 (incident mgmt)
API Surface (v1)
The full public API. All endpoints accept either a workspace API key (Authorization: Bearer cw_live_…) with the appropriate scope, or a Supabase cookie session.
Endpoint                                    Method           Scope              Notes
──────────────────────────────────────────  ───────────────  ─────────────────  ──────────────────
/api/v1/signals                             POST             governance:write   Single entry point
/api/v1/decisions                           GET              governance:read    Paginated list
/api/v1/decisions/[id]/approve              POST             governance:write
/api/v1/decisions/[id]/reject               POST             governance:write   Reason required
/api/v1/decisions/[id]/execute              POST             governance:write   Fires webhook
/api/v1/decisions/[id]/reclassify           POST             governance:write   Snapshots original
/api/v1/audit                               GET              governance:read    Filterable
/api/governance/rules                       GET,PUT          cookie (admin)
/api/governance/policy-rules                GET,POST         cookie (admin)     AI auto-categorise
/api/governance/policy-rules/[id]           PATCH,DELETE     cookie (admin)
/api/governance/webhooks                    GET,POST         cookie (admin)
/api/governance/webhooks/[id]               PATCH,DELETE     cookie (admin)
/api/governance/webhooks/[id]/test          POST             cookie (admin)
/api/governance/stats                       GET              cookie
/api/governance/models                      GET,POST         cookie (admin)     Train
/api/governance/models/import               POST             cookie (admin)     Import existing
/api/governance/models/import/test          POST             cookie (admin)     Probe endpoint
/api/governance/models/dataset              GET              cookie (admin)     Download JSONL
/api/governance/models/[id]/status          POST             cookie (admin)     Poll OpenAI job
/api/governance/models/[id]/activate        POST             cookie (admin)     One per workspace
/api/governance/models/[id]/bundle          GET              cookie (admin)     Portable JSON
/api/integrations                           GET,POST,DELETE  cookie (admin)     3 providers
/api/integrations/test                      POST             cookie (admin)
/api/governance/playground/chat             POST             cookie             In-platform proxy
/api/governance/playground/sessions         GET,POST         cookie
/api/governance/playground/sessions/[id]    GET,DELETE       cookie
Future Work
- Embedding-based reclassification retrieval. Move from "most recent N corrections" to "top-k semantically similar corrections." Requires a per-workspace embedding index (pgvector candidate).
- Cryptographic audit chain. Each audit row signs the previous row's hash; signed root published periodically. Customer can detect tampering externally.
- Webhook retry with exponential backoff. Opt-in per webhook to avoid breaking customers who've built around at-most-once delivery.
- Self-hosted deployment. Engine + Postgres + Anthropic key inside the customer VPC. Removes the Anthropic-as-classifier residual risk for customers who need it.
- Streaming proxy. Forward provider responses to the client as a stream while running output classification on the assembled response. The current implementation buffers the full response before classifying.
- Multi-region active-active. Currently UK-primary. EU and US regions are next, with workspace-pinned region selection.