whitepaper · v1 · 2026

The Kirtonic Engine

A runtime governance plane for multi-provider AI: architecture, protocol, threat model.

17 sections · ~6,000 words · payload schemas · type definitions · threat model

section 01

Abstract

This paper describes the Kirtonic Engine, a runtime governance layer for AI systems used by regulated enterprises. The Engine sits in the request path between an application and one or more large language model (LLM) providers, classifies each input and each output against an enterprise-specific policy, routes high-risk turns to human review, and produces an immutable audit trail. It is provider-agnostic (OpenAI, Anthropic, Google Gemini, and any OpenAI-compatible endpoint including Ollama and vLLM) and supports both in-prompt and fine-tuned adaptation paths so that reviewer corrections become signal for the classifier without requiring a model training cycle to take effect.

We motivate the design with three observations: (i) policy is fundamentally organisational and cannot be expressed as a generic safety filter; (ii) confidence and severity must be treated as independent dimensions when deciding whether to require human review; (iii) every classification decision must be linkable to a tamper-evident record that names the actor, the policy applied, and the resulting action. The Engine's architecture follows from these.

section 02

Introduction

Generative AI in regulated industries faces a structural tension. Foundation model providers (OpenAI, Anthropic, Google) ship safety policies designed for the median consumer. Regulated firms (banks, insurers, healthcare operators, law firms) have policies designed for their specific regulatory perimeter (UK FCA, HIPAA, PCI DSS, MiFID II, NIS2). The two policies overlap but rarely match: a response that the foundation model considers acceptable may be a market-abuse violation in a wealth-management context, and a response the foundation model refuses may be entirely legitimate in an internal compliance copilot.

Three architectural patterns are commonly deployed to bridge this gap. Each exhibits limitations relevant to the design of the Engine:

SDK wrappers. Application code wraps the OpenAI or Anthropic SDK with custom pre- and post-filters. The wrapper is per-application, lacks a central governance surface, and is subject to bypass when integrations are added under time pressure.
TLS-terminating reverse proxies. A proxy is deployed in front of each provider domain. The pattern enables traffic interception but does not retain the link between message content and business context: the calling surface, the account, and the policy version in effect at the time of the call. Governance decisions cannot be reconstructed without that context.
Asynchronous review via log scraping. Tickets are generated from provider logs after the response has been delivered to the end user. The resulting audit trail is reactive; the system cannot prevent high-severity outputs reaching the caller.

The Engine operates as a synchronous control plane at the request boundary. It retains business context per call (surface, actor, policy version, classifier inputs) and is the single source of truth for the permitted action set. Applications integrate by routing requests through the Engine's endpoint.

section 03

System Overview

The Engine is a layered pipeline. Each layer has a narrow contract and can be understood, tested, and replaced in isolation.

┌─────────────────────────────────────────────────────────┐
│  Application (chat client, agent runtime, MCP server)   │
└────────────────────────┬────────────────────────────────┘
                         │ POST /api/v1/signals
                         │   (or playground / proxy mode)
                         ▼
┌─────────────────────────────────────────────────────────┐
│  Signal Ingestion                                        │
│  · workspace + auth resolution                           │
│  · schema validation                                     │
│  · governance_signals row INSERT                         │
└────────────────────────┬────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────┐
│  Classification Layer                                    │
│  · Kirtonic hosted LLM classifier (calibrated)           │
│  · workspace policy rules (in-prompt)                    │
│  · past reclassifications (in-prompt feedback)           │
│  · ↳ {risk_score, confidence, reason, category}          │
└────────────────────────┬────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────┐
│  Rules Engine (pure)                                     │
│  · severity = classify(risk_score, thresholds)           │
│  · routing  = decide(severity, confidence, policy)       │
└────────────────────────┬────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────┐
│  Decision Persistence + Audit                            │
│  · governance_decisions INSERT                           │
│  · governance_audit append (signal_received,             │
│    decision_created, auto_approved, …)                   │
└────────────────────────┬────────────────────────────────┘
                         ▼
┌─────────────────────────────────────────────────────────┐
│  Effect Layer                                            │
│  · BLOCK  → refusal returned to caller                   │
│  · QUEUE  → decision visible at /engine/decisions        │
│  · ALLOW  → proceed; if proxy mode, forward to provider  │
│  · webhook delivery (HMAC-signed, best-effort)           │
└─────────────────────────────────────────────────────────┘

Three properties are load-bearing:

The rules engine is pure. evaluateSignal(signal, rules) → RuleEvaluation takes no external state and writes no state. It is trivially testable, deterministic, and the same function runs in the API route, the playground, and the seed endpoint. This makes "why was this decision made?" a closed question that the audit log can answer without re-running the model.
Classification and routing are separated. The classifier produces a continuous risk score; the rules engine maps that score onto discrete actions. Operators can re-tune the rules engine without re-evaluating the classifier, and vice-versa.
Effects fire only after persistence. The decision row is written before any webhook is dispatched, before any block is enforced, before any forward to a provider. If the database write fails, the effect does not happen, and there is no orphan side-effect.

section 04

Signal Ingestion

All traffic enters through a single endpoint:

POST /api/v1/signals
Authorization: Bearer cw_live_…    # or cookie session

{
  "source":     "fraud-model-v3",  // required, ≤ 200 chars
  "entity_id":  "txn_abc123",      // required, ≤ 200 chars, the subject of the decision
  "risk_score": 0.84,              // required, 0..1, caller's pre-computed score, or
                                   //                  null when the engine should classify
  "confidence": 0.91,              // required, 0..1
  "context":    "credit",          // optional, the surface the message is on
  "metadata":   { ... }            // optional, arbitrary structured payload
}

The handler resolves the workspace and the actor (API key or cookie session), validates the schema, and calls the shared ingestSignal() helper. This helper is the single mutating path: it is used by the external API route, by the playground, and by the sample-data seeder. There is no other code that may insert into governance_signals.

// src/lib/governance/ingest.ts
export async function ingestSignal(
  supabase: SupabaseClient,
  workspaceId: string,
  input: SignalInput,
  actor: ActorInfo,
): Promise<{ signal: Signal; decision: Decision; evaluation: RuleEvaluation }> {
  const rules      = await loadRules(supabase, workspaceId)             // 1) load rules
  const signal     = await insertSignal(supabase, workspaceId, input)   // 2) signal row
  const evaluation = evaluateSignal(signal, rules)                      // 3) pure rules
  const decision   = await insertDecision(supabase, signal, evaluation) // 4) decision row
  await appendAudit(supabase, signal, decision, actor)                  // 5) audit rows
  // 6) webhooks: not awaited, dispatched only after the rows above are written
  void deliverWebhooks(workspaceId, eventFor(evaluation), { signal, decision, evaluation })
  return { signal, decision, evaluation }
}

Authentication accepts two modes. Production traffic uses workspace API keys, which are stored as SHA-256 hashes, displayed in the UI by prefix only, and scoped (governance:write, governance:read, etc.). Browser-driven flows (the playground, dashboard, debugging) use the Supabase cookie session and inherit the workspace context from the user's membership.
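The schema constraints listed in the payload above can be sketched as a small validator. This is a minimal illustration, not the Engine's handler: validateSignalInput, the ValidationResult shape, and the error strings are hypothetical, and only the field names and limits are taken from the payload listing.

```typescript
// Hypothetical validator for the POST /api/v1/signals body.
// Field names and limits follow the payload listing; everything else is assumed.
type SignalInput = {
  source: string
  entity_id: string
  risk_score: number | null
  confidence: number
  context?: string
  metadata?: Record<string, unknown>
}

type ValidationResult =
  | { ok: true; value: SignalInput }
  | { ok: false; errors: string[] }

const inUnitRange = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v) && v >= 0 && v <= 1

const isShortString = (v: unknown): v is string =>
  typeof v === "string" && v.length > 0 && v.length <= 200

const isPlainObject = (v: unknown): boolean =>
  typeof v === "object" && v !== null && !Array.isArray(v)

function validateSignalInput(body: unknown): ValidationResult {
  if (!isPlainObject(body)) {
    return { ok: false, errors: ["body must be a JSON object"] }
  }
  const b = body as Record<string, unknown>
  const errors: string[] = []

  if (!isShortString(b.source)) errors.push("source: required string, at most 200 chars")
  if (!isShortString(b.entity_id)) errors.push("entity_id: required string, at most 200 chars")
  // risk_score may be null, which asks the Engine to classify the content itself.
  if (b.risk_score !== null && !inUnitRange(b.risk_score)) {
    errors.push("risk_score: required, a number in 0..1 or null")
  }
  if (!inUnitRange(b.confidence)) errors.push("confidence: required, a number in 0..1")
  if (b.context !== undefined && typeof b.context !== "string") {
    errors.push("context: optional string")
  }
  if (b.metadata !== undefined && !isPlainObject(b.metadata)) {
    errors.push("metadata: optional object")
  }

  if (errors.length > 0) return { ok: false, errors }
  return { ok: true, value: b as unknown as SignalInput }
}
```

Collecting every error rather than stopping at the first lets a caller fix a malformed payload in one round trip; whether the real endpoint does this is not stated in this paper.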

section 05

Classification Layer

For workflows where the caller already has a risk score (a fraud model, a sanctions screener), the classification layer is skipped: the score from the request body flows straight into the rules engine. For workflows where the caller is shipping raw chat content (the playground, agent runtimes, MCP servers), the Engine classifies the content itself using a calibrated general-purpose model.

5.1 The classifier

The classifier is implemented as a single call to Kirtonic's hosted LLM with a system prompt designed to produce a calibrated JSON response. The choice of underlying model is an internal implementation detail; the platform interface guarantees the same JSON envelope regardless. We selected a model with (a) strong empirical priors on its refusal calibration in safety-classification tasks; (b) reliable JSON output without function-calling overhead; (c) a long context window so the policy-as-context pattern (Section 5.3) works without truncation.

The system prompt has three sections:

Risk dimensions: what counts as risk (regulated content, PII, prompt injection, safety, confidentiality, operational risk).
Calibration anchors: explicit risk-score targets for representative cases. Without these, the model clusters scores in the 0.5 to 0.7 range and the severity bands become noise.
Workspace context: assembled per request from the workspace's policy playbook and recent human reclassifications (Sections 5.3 and 12).

The classifier returns:

type Classification = {
  risk_score: number   // 0..1
  confidence: number   // 0..1
  reason:     string   // one sentence, surfaces in the UI + audit log
  category:   'regulated_advice' | 'pii' | 'injection' |
              'safety' | 'confidentiality' | 'operational' |
              'clean' | 'other'
}

5.2 Calibration

Risk scores are mapped to the rules engine's severity bands by simple thresholds. The defaults (severity_high = 0.80, severity_medium = 0.60) are not arbitrary: they correspond to the calibration anchors in the system prompt, which were tuned against a labelled evaluation set of regulated-domain chat turns. The anchors are:

Case                                                  Target risk_score
───────────────────────────────────────────────────   ─────────────────
Truly benign small talk on a generic surface          0.05–0.15
Mildly off-topic but harmless                         0.15–0.35
On the edge: regulated topic discussed informally,    0.40–0.65
  mild PII
Clearly risky: financial recommendation, share of     0.70–0.90
  PII, prompt injection
Unambiguous policy violation, jailbreak, sensitive    0.90–1.00
  PII leak

Severity thresholds are configurable per workspace via governance_rules and take effect on the next classification call; no classifier retraining is required. When thresholds are adjusted, a shadow period (typically two weeks) is recommended in which all decisions are routed to human review without enforcement, the resulting queue is inspected, and the new thresholds are then promoted to enforcement state.
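The threshold mapping can be sketched as follows. This is an illustration under stated assumptions: the SeverityThresholds type is a hypothetical subset of the governance_rules row, and treating each threshold as an inclusive lower bound is an assumption, since the paper does not say which side of the boundary a score of exactly 0.80 falls on.

```typescript
type Severity = "high" | "medium" | "low"

// Threshold names mirror the defaults quoted in the text; the full
// governance_rules row is not shown in the paper, so this type is assumed.
type SeverityThresholds = { severity_high: number; severity_medium: number }

const DEFAULT_THRESHOLDS: SeverityThresholds = { severity_high: 0.8, severity_medium: 0.6 }

// Assumption: thresholds are inclusive lower bounds.
function classifySeverity(
  riskScore: number,
  t: SeverityThresholds = DEFAULT_THRESHOLDS,
): Severity {
  if (riskScore >= t.severity_high) return "high"
  if (riskScore >= t.severity_medium) return "medium"
  return "low"
}

// A workspace that lowers severity_high re-bands the same score without
// touching the classifier.
const stricter: SeverityThresholds = { severity_high: 0.7, severity_medium: 0.5 }
const sameScoreDefault = classifySeverity(0.75)            // "medium" under the defaults
const sameScoreStricter = classifySeverity(0.75, stricter) // "high" under the stricter rules
```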

5.3 In-prompt policy

The workspace's natural-language playbook (Section 6.2) is loaded on every classification call and appended to the system prompt:

WORKSPACE POLICIES, score risk in light of these. Content
that violates a policy below should score AT LEAST the stated
severity, regardless of how harmless it might look in
isolation:

### Regulated advice
  - Do not give specific investment, tax, or legal advice,
    always defer to a qualified professional.
    [severity: high · action: block]

### PII & sensitive data
  - Block any message containing full credit card numbers,
    CVVs, or full bank account numbers.
    [severity: high · action: block]

This is the "policy as prompt" pattern: the customer writes rules in plain English, an AI categorisation step labels each with (category, severity, action), and the rules are passed verbatim to the classifier as part of its system instructions. The classifier recognises policy violations because the policy text describes the violation in the customer's own words. No fine-tuning is required; changes take effect on the next request.
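One way to assemble the policy block shown above from stored rules is sketched here. renderPolicyBlock and the HEADINGS map are hypothetical: the paper shows two rendered headings and the bracketed severity/action suffix, but not the code that produces them. The rule shape is a reduced form of the PolicyRule type in Section 6.2.

```typescript
type PolicyRule = {
  text: string
  category: string
  severity: "high" | "medium" | "low"
  action: "block" | "route_to_review" | "redact" | "log_only"
  surfaces: string[] // empty = all surfaces
  enabled: boolean
}

// Hypothetical heading labels; unknown categories fall back to the raw name.
const HEADINGS: Record<string, string> = {
  regulated_advice: "Regulated advice",
  pii: "PII & sensitive data",
}

function renderPolicyBlock(rules: PolicyRule[], surface: string): string {
  // Keep enabled rules that apply to this surface (or to all surfaces).
  const active = rules.filter(
    (r) => r.enabled && (r.surfaces.length === 0 || r.surfaces.some((s) => s === surface)),
  )

  // Group by category, preserving first-seen order.
  const byCategory = new Map<string, PolicyRule[]>()
  active.forEach((r) => {
    const group = byCategory.get(r.category) ?? []
    group.push(r)
    byCategory.set(r.category, group)
  })

  const sections: string[] = []
  byCategory.forEach((group, category) => {
    const lines = group.map(
      (r) => `  - ${r.text}\n    [severity: ${r.severity} · action: ${r.action}]`,
    )
    sections.push(`### ${HEADINGS[category] ?? category}\n${lines.join("\n")}`)
  })
  return sections.join("\n\n")
}
```

Because the block is rebuilt per request from the current rows, disabling a rule removes it from the very next classification call, which matches the "takes effect on the next request" property.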

section 06

Rules Engine

The rules engine is a pure function over the classification output. It does not call the model, the database, or any external service.

// src/lib/governance/rules.ts
export function evaluateSignal(
  signal: Pick<Signal, 'risk_score' | 'confidence' | 'context'>,
  rules: GovernanceRules,
): RuleEvaluation {
  const severity = classifySeverity(signal.risk_score, rules)
  const lowConf  = signal.confidence < rules.require_review_below_confidence
  const highRev  = severity === 'high' && rules.require_review_for_high
  const lowAuto  = severity === 'low'  && rules.auto_approve_low && !lowConf

  if (highRev) return { severity, status: 'awaiting_approval', reason: '…' }
  if (lowConf) return { severity, status: 'awaiting_approval', reason: '…' }
  if (lowAuto) return { severity, status: 'auto_approved',    reason: '…' }
  return       { severity, status: 'awaiting_approval', reason: '…' }
}

6.1 Dual-factor routing

A signal can land in human review for two independent reasons: severity (the classifier thinks the content is risky) or confidence (the classifier is unsure). This decoupling matters because the two failure modes have very different remediations: a high-severity-but-confident decision is correctly routed; a low-severity-but-uncertain decision is a signal that the classifier needs more in-prompt context, more training data, or a stricter policy.

The truth table:

            confident                 not confident
  high      → review (severity)        → review (severity OR confidence)
  medium    → review (default)         → review
  low       → auto_approve             → review (confidence)
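The truth table can be reproduced by enumerating the routing branches of evaluateSignal. The sketch below takes severity as an input rather than recomputing it, mirroring the routing = decide(severity, confidence, policy) step in the system overview. The rule field names come from the evaluateSignal listing; the values, in particular the 0.7 confidence floor, are assumed for illustration.

```typescript
type Severity = "high" | "medium" | "low"
type Status = "awaiting_approval" | "auto_approved"

type RoutingRules = {
  require_review_below_confidence: number
  require_review_for_high: boolean
  auto_approve_low: boolean
}

const exampleRules: RoutingRules = {
  require_review_below_confidence: 0.7, // hypothetical value
  require_review_for_high: true,
  auto_approve_low: true,
}

// Same branch order as evaluateSignal: severity first, then confidence,
// then the low-severity auto-approve path, then the review default.
function decide(severity: Severity, confidence: number, r: RoutingRules): Status {
  const lowConf = confidence < r.require_review_below_confidence
  if (severity === "high" && r.require_review_for_high) return "awaiting_approval"
  if (lowConf) return "awaiting_approval"
  if (severity === "low" && r.auto_approve_low) return "auto_approved"
  return "awaiting_approval" // medium, or auto-approve disabled
}

const severities: Severity[] = ["high", "medium", "low"]
const truthTable = severities.map((s) => ({
  severity: s,
  confident: decide(s, 0.95, exampleRules),
  notConfident: decide(s, 0.4, exampleRules),
}))
console.table(truthTable)
```

Only the low-severity, confident cell resolves to auto_approved; every other cell lands in review, which is the fail-closed behaviour the table describes.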

6.2 The natural-language playbook

The second layer of rules is the workspace policy playbook, stored in governance_policy_rules. Each row is one human-written policy statement plus structured metadata:

type PolicyRule = {
  workspace_id: string
  text:         string             // "Block messages containing full card numbers."
  category:     'pii' | 'regulated_advice' | 'safety' | 'confidentiality' |
                'operational' | 'brand' | 'injection' | 'other'
  severity:     'high' | 'medium' | 'low'
  action:       'block' | 'route_to_review' | 'redact' | 'log_only'
  surfaces:     string[]           // empty = all surfaces
  enabled:      boolean
}

When the customer adds a rule in plain English, the categorisePolicyRule() function calls the Kirtonic engine with a strict classification prompt to suggest category, severity and action. The customer can override every field; the categorisation is an ergonomic shortcut, not a contract.

section 07

Decision Lifecycle

Every signal produces exactly one decision. The decision has a state machine with six terminal and non-terminal states:

     ┌─────────────────────┐
     │  awaiting_approval  │ ── (auto-route on creation if low severity + confident)
     └──────────┬──────────┘                              │
                │ POST /v1/decisions/[id]/approve         │
                ▼                                          ▼
            ┌────────┐                          ┌────────────────┐
            │approved│                          │ auto_approved  │
            └───┬────┘                          └────────┬───────┘
                │ POST /v1/decisions/[id]/execute        │
                └────────────┬─────────────────────────────┘
                             ▼
                       ┌──────────┐
                       │ executed │ ── terminal
                       └──────────┘

     ┌─────────────────────┐
     │  awaiting_approval  │ ── POST /v1/decisions/[id]/reject ──→ ┌─────────┐
     └─────────────────────┘                                       │rejected │ terminal
                                                                   └─────────┘

       failed ── terminal, set by downstream effects (webhook receiver rejection)

Transitions are enforced at the API layer. A call to execute on an awaiting_approval decision returns 409 Conflict; a call to approve on an executed decision returns the same. The transition graph is intentionally restricted to a single linear path per decision so the audit log holds exactly one terminal state per decision_id, with one recorded actor and timestamp at each transition.
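A transition table is one straightforward way to enforce these rules. The sketch below is an assumption about structure, not the Engine's code: applyAction and TRANSITIONS are hypothetical names, and the failed state is left out of the table because the paper says it is set by downstream effects rather than by an API call.

```typescript
type DecisionStatus =
  | "awaiting_approval" | "approved" | "auto_approved"
  | "executed" | "rejected" | "failed"

type Action = "approve" | "reject" | "execute"

// Allowed source states and the single target state for each API action.
const TRANSITIONS: Record<Action, { from: DecisionStatus[]; to: DecisionStatus }> = {
  approve: { from: ["awaiting_approval"], to: "approved" },
  reject:  { from: ["awaiting_approval"], to: "rejected" },
  execute: { from: ["approved", "auto_approved"], to: "executed" },
}

type TransitionResult =
  | { httpStatus: 200; status: DecisionStatus }
  | { httpStatus: 409; error: string }

function applyAction(current: DecisionStatus, action: Action): TransitionResult {
  const t = TRANSITIONS[action]
  if (t.from.indexOf(current) === -1) {
    // Mirrors the 409 Conflict described in the text.
    return { httpStatus: 409, error: `cannot ${action} a decision in state ${current}` }
  }
  return { httpStatus: 200, status: t.to }
}
```

Since executed and rejected never appear in any from list, they are terminal by construction, and no sequence of calls can record two terminal states for one decision.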

7.1 Severity reclassification

A reviewer can correct the classifier's severity at any point in the lifecycle. The reclassify endpoint preserves the classifier's original call:

POST /api/v1/decisions/[id]/reclassify
{ "severity": "medium", "reason": "..." }

→ governance_decisions
    severity          ← new value
    original_severity ← (snapshot of previous severity, only set on first override)

→ governance_audit (append)
    event_type:  'severity_overridden'
    actor_type:  'user' | 'api_key'
    detail:      { from, to, reason }

original_severity is set only on the first override, so the audit log preserves "what did the classifier originally say" independently of how many times a human has subsequently corrected it.
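The first-override snapshot can be expressed as a pure update. This is a minimal sketch: the reclassify function and the reduced row types are hypothetical, while the column names and the audit event shape follow the listing above.

```typescript
type Severity = "high" | "medium" | "low"

type DecisionRow = { severity: Severity; original_severity: Severity | null }

type OverrideAudit = {
  event_type: "severity_overridden"
  detail: { from: Severity; to: Severity; reason: string }
}

function reclassify(
  d: DecisionRow,
  to: Severity,
  reason: string,
): { decision: DecisionRow; audit: OverrideAudit } {
  return {
    decision: {
      severity: to,
      // Snapshot only on the first override, so the classifier's call
      // survives any number of later corrections.
      original_severity: d.original_severity ?? d.severity,
    },
    audit: { event_type: "severity_overridden", detail: { from: d.severity, to, reason } },
  }
}

// Two successive corrections: low → medium → high.
const classified: DecisionRow = { severity: "low", original_severity: null }
const first = reclassify(classified, "medium", "mentions account import")
const second = reclassify(first.decision, "high", "escalated by compliance")
```

After the second call the row reads severity high with original_severity low, while the two audit events record each individual step (low to medium, then medium to high).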

section 08

The Audit Log

The audit log is the regulatory artefact. It is append-only at the application layer (no DELETE or UPDATE endpoints exist for governance_audit rows) and RLS-scoped per workspace.

type AuditEvent = {
  id:            uuid                    // random (uuid_generate_v4 default); not ordered, see created_at
  workspace_id:  uuid                    // RLS-isolated
  decision_id:   uuid | null
  signal_id:     uuid | null
  event_type:    'signal_received'
               | 'decision_created'
               | 'auto_approved'
               | 'approved'
               | 'rejected'
               | 'executed'
               | 'severity_overridden'
               | 'webhook_delivered'
               | 'webhook_failed'
               | 'rules_updated'
  actor_type:    'user' | 'system' | 'api_key'
  actor_id:      uuid | null             // user_id when actor_type='user', null otherwise
  detail:        jsonb                   // event-specific payload
  created_at:    timestamptz             // append-only ordering
}

8.1 Tamper-evident properties

We do not claim cryptographic immutability; there is no Merkle chain in the current implementation. The properties we do guarantee are:

No application path mutates audit rows. The only write is INSERT. Service-role writes are scoped to webhook delivery logging and signal ingestion. No UPDATE or DELETE statement against governance_audit exists in the codebase.
Workspace isolation by RLS. The RLS policy “Audit: members read” uses the is_workspace_member SECURITY DEFINER function, which prevents the join-explosion attack where one workspace's row references another workspace's decision.
Actor attribution is required. Every audit row has either an actor_id (user override of a decision) or an explicit actor_type = 'system' with a typed detail field.

For customers requiring cryptographic tamper-evidence, the Engine emits the audit row as a webhook event and the customer can chain the hashes externally. This is on the v2 roadmap (Section 14).

section 09

Webhook Delivery

Webhooks are the integration point for downstream systems: alerting, ticketing, account locking, secondary review queues. Each webhook subscribes to a subset of decision events and receives signed POST requests.

9.1 Signature

POST https://your-system.example.com/kirtonic-hook
Content-Type:           application/json
X-Kirtonic-Event:       decision.created
X-Kirtonic-Signature:   sha256=8f3c...e9
X-Kirtonic-Test:        true   # only on test fires from /webhooks/[id]/test

{
  "event":        "decision.created",
  "delivered_at": "2026-03-12T14:08:21.443Z",
  "data": {
    "signal":    { ... full signal row ... },
    "decision":  { ... full decision row ... },
    "evaluation":{ "severity": "high", "reason": "..." }
  }
}

The signature is HMAC-SHA256 of the raw request body using the webhook's signing_secret. The secret is generated server-side at webhook creation, returned once to the operator (and only once), and never exposed again via the API. Receiver verification:

import { createHmac, timingSafeEqual } from "crypto"

function verify(rawBody: Buffer, headerSig: string): boolean {
  const expected = createHmac("sha256", SECRET).update(rawBody).digest("hex")
  const a = Buffer.from(headerSig.replace("sha256=", ""), "hex")
  const b = Buffer.from(expected, "hex")
  return a.length === b.length && timingSafeEqual(a, b)
}
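For local testing of a receiver, the sender side can be mimicked. The sketch below is an assumption about how a signature matching the documented format could be produced: sign is a hypothetical helper, verifyWith repeats the receiver check above with the secret passed explicitly, and the secret string is a placeholder rather than a real secret format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto"

// Hypothetical sender-side helper: HMAC-SHA256 over the raw body,
// rendered in the documented "sha256=<hex>" header format.
function sign(rawBody: Buffer, secret: string): string {
  return "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex")
}

// Same check as the receiver snippet, with the secret as a parameter.
function verifyWith(secret: string, rawBody: Buffer, headerSig: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest()
  const given = Buffer.from(headerSig.replace("sha256=", ""), "hex")
  return given.length === expected.length && timingSafeEqual(given, expected)
}

const demoSecret = "example-signing-secret" // placeholder only
const demoBody = Buffer.from(JSON.stringify({ event: "decision.created" }))
const demoHeader = sign(demoBody, demoSecret)

const acceptsOriginal = verifyWith(demoSecret, demoBody, demoHeader)
// A single added byte changes the HMAC, so the same header no longer matches.
const acceptsTampered = verifyWith(demoSecret, Buffer.from(demoBody.toString() + " "), demoHeader)
```

The tampered case is why verification has to run over the raw request bytes: parsing the JSON and re-serialising it can change whitespace or key order and invalidate an otherwise genuine signature.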

9.2 Delivery semantics

Delivery is fire-and-forget with a 10-second timeout. The v1 webhook does not retry on transient failure. This is by design: (a) decision events have latency-sensitive consumers and a queued retry window of hours is not useful to the downstream system; (b) at-least-once semantics would shift idempotency-handling cost to every receiver implementation. Failed deliveries write a webhook_failed audit event and update the webhook row's last_delivery_status. Operators can re-fire any decision's webhook from the UI.

We are evaluating exponential-backoff retry behind a per-webhook opt-in for v2 (see Section 14).

section 10

Multi-provider Proxy

For workspaces using proxy mode (the chat playground; agent runtimes in production), the Engine fronts three foundation-model providers and any OpenAI-compatible endpoint. The proxy translates the request shape native to each provider.

Provider    Endpoint                                         Auth
─────────   ──────────────────────────────────────────────   ─────────────────
OpenAI      https://api.openai.com/v1/chat/completions       Bearer (sk-…)
Anthropic   https://api.anthropic.com/v1/messages            x-api-key (sk-ant-…)
Gemini      …generativelanguage.googleapis.com/v1beta/…      ?key=AIza…
Custom      <user-supplied>/v1/chat/completions              Bearer (optional)
  e.g. Ollama   http://localhost:11434/v1/chat/completions   (none, local)
       vLLM     http://gpu-host:8000/v1/chat/completions     Bearer (optional)

10.1 Native API translation

Each provider expects a slightly different request shape. The proxy normalises: messages with role: system become Anthropic's top-level system field; assistant messages become role: model in Gemini's contents array. The normalisation is implemented in callProviderChat() (src/lib/governance/provider-chat.ts) so each provider can be tested in isolation.
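The two normalisations named above can be sketched as pure functions over an OpenAI-style message list. toAnthropic and toGemini are hypothetical names, not the contents of callProviderChat(); required provider parameters such as model and max_tokens are omitted, and placing system text in Gemini's systemInstruction field is an assumption, since the paper only states how assistant messages are mapped for Gemini.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string }

type AnthropicBody = {
  system?: string
  messages: { role: "user" | "assistant"; content: string }[]
}

type GeminiBody = {
  systemInstruction?: { parts: { text: string }[] }
  contents: { role: "user" | "model"; parts: { text: string }[] }[]
}

// Concatenate all system messages into one string.
const systemText = (messages: ChatMessage[]): string =>
  messages.filter((m) => m.role === "system").map((m) => m.content).join("\n\n")

function toAnthropic(messages: ChatMessage[]): AnthropicBody {
  const system = systemText(messages)
  const rest = messages
    .filter((m) => m.role !== "system")
    .map((m) => ({ role: m.role as "user" | "assistant", content: m.content }))
  // System text moves to the top-level field; it never appears in messages.
  return system ? { system, messages: rest } : { messages: rest }
}

function toGemini(messages: ChatMessage[]): GeminiBody {
  const system = systemText(messages)
  const contents = messages
    .filter((m) => m.role !== "system")
    .map((m) => ({
      // Gemini names the assistant role "model".
      role: m.role === "assistant" ? ("model" as const) : ("user" as const),
      parts: [{ text: m.content }],
    }))
  return system ? { systemInstruction: { parts: [{ text: system }] }, contents } : { contents }
}
```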

10.2 Key resolution order

For any given chat call the proxy resolves the API key in this strict order:

Active custom model's encrypted key (workspace_models.encrypted_api_key), for imported endpoints.
Inline key from the request body, useful for one-off testing without touching stored keys.
Workspace integration key for the selected provider (workspace_integrations.encrypted_key where provider = <chosen>).
Reject with HTTP 400 and { missing_integration: true } so the UI can prompt for setup.

For OpenAI-compatible local endpoints with no auth (the Ollama default) an empty key is accepted and no Authorization header is sent.
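The resolution order can be sketched as a chain of early returns. resolveApiKey and the KeySources shape are hypothetical, keys are assumed to be already decrypted, and checking the no-auth case last (just before the 400) is an assumption about where the empty-key allowance sits in the order.

```typescript
type KeySources = {
  customModelKey?: string | null  // from workspace_models.encrypted_api_key
  inlineKey?: string | null       // from the request body
  integrationKey?: string | null  // from workspace_integrations.encrypted_key
  allowsNoAuth?: boolean          // e.g. a local Ollama endpoint
}

type KeyResolution =
  | { ok: true; key: string; source: "custom_model" | "inline" | "integration" | "none" }
  | { ok: false; httpStatus: 400; body: { missing_integration: true } }

function resolveApiKey(s: KeySources): KeyResolution {
  if (s.customModelKey) return { ok: true, key: s.customModelKey, source: "custom_model" }
  if (s.inlineKey) return { ok: true, key: s.inlineKey, source: "inline" }
  if (s.integrationKey) return { ok: true, key: s.integrationKey, source: "integration" }
  // Empty key accepted for unauthenticated local endpoints; the caller
  // should then send no Authorization header.
  if (s.allowsNoAuth) return { ok: true, key: "", source: "none" }
  return { ok: false, httpStatus: 400, body: { missing_integration: true } }
}
```

Returning the source alongside the key makes it possible to record which credential path served a given call without ever logging the key itself.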

10.3 Host trace

Every proxy response includes a host_trace object describing the data's journey:

host_trace: {
  kirtonic:   { hostname: 'kirtonic.io',       region: 'eu-west-2',         country: 'United Kingdom' },
  classifier: { hostname: 'api.anthropic.com', region: 'US (global edge)',  country: 'United States' },
  provider:   { hostname: 'api.openai.com',    region: 'US East',           country: 'United States' },
}

The trace is used by the playground's data-flow diagram to visualise where the message went on this specific turn. Region labels are derived from each provider's public infrastructure documentation and are not authoritative; individual requests may route to other edge locations within the same legal jurisdiction.

section 11

Custom Classifier Training

The Engine's default classifier is claude-sonnet-4-6. Customers with enough decision history can train a domain-specific replacement that inherits the policy without re-reading the entire system prompt on every call. Trained models live in the customer's own provider account.

11.1 Dataset construction

buildDataset(supabase, workspaceId) reads every settled decision in the workspace, joins the originating signal, and emits OpenAI chat fine-tuning JSONL:

{"messages":[
  {"role":"system","content":"You are a safety classifier tuned for one customer..."},
  {"role":"user","content":"{\"source\":\"playground/user\",\"context\":\"customer-support\",\"risk_score\":0.18,\"confidence\":0.82,\"metadata\":{...}}"},
  {"role":"assistant","content":"{\"label\":\"allow\",\"severity\":\"low\"}"}
]}

Labelling:

approved / auto_approved / executed → label "allow"
rejected → label "block"
awaiting_approval → excluded (no ground truth)
severity uses the current value, so reclassifications are honoured.
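The labelling rules reduce to a small mapping. labelFor and assistantTurn are hypothetical names; the status-to-label pairs and the assistant JSON shape come from the rules and the JSONL example above. The failed status is not covered by the stated rules, so this sketch excludes it, which is an assumption.

```typescript
type DecisionStatus =
  | "awaiting_approval" | "approved" | "auto_approved"
  | "executed" | "rejected" | "failed"
type Severity = "high" | "medium" | "low"
type Label = "allow" | "block"

function labelFor(status: DecisionStatus): Label | null {
  switch (status) {
    case "approved":
    case "auto_approved":
    case "executed":
      return "allow"
    case "rejected":
      return "block"
    default:
      // awaiting_approval has no ground truth; "failed" is excluded by assumption.
      return null
  }
}

// Builds the assistant turn of one training example, or null when the
// decision must be left out of the dataset. `severity` is the current
// (possibly reclassified) value.
function assistantTurn(status: DecisionStatus, severity: Severity): string | null {
  const label = labelFor(status)
  return label === null ? null : JSON.stringify({ label, severity })
}
```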

11.2 Training flow

POST /api/governance/models
  { name, base_model: "gpt-4o-mini-2024-07-18" }

→ build dataset JSONL                           # workspace_models.dataset_size
→ POST https://api.openai.com/v1/files          # external_file_id
→ POST .../v1/fine_tuning/jobs                  # external_job_id, status: 'queued'
→ poll job status via /[id]/status              # → 'training' → 'succeeded'
→ activate the trained model id                 # is_active: true (partial unique index
                                                #   enforces one-active-per-workspace)

11.3 Imported and OpenAI-compatible models

Workspaces can also import a model trained elsewhere: an existing OpenAI fine-tune (ft:gpt-4o-mini:org:tag:abc) or any OpenAI-compatible endpoint (Ollama, LM Studio, vLLM). Imported models go straight to status "succeeded" and can be activated immediately. The endpoint URL and optional per-model encrypted API key live in the workspace_models row.

section 12

Reclassification Feedback Loop

Fine-tuning a custom model is an hours-to-days operation. In the meantime, reviewer corrections need to take effect immediately. The Engine implements a second adaptation path: in-prompt feedback.

On every classification call, loadRecentReclassifications() pulls the workspace's most recent N (default 20) decisions where original_severity is not null, i.e. ones a reviewer corrected. The list is formatted into the classifier's system prompt as:

PAST HUMAN CORRECTIONS in this workspace, these are real
messages where a reviewer explicitly disagreed with the
classifier and changed the severity. Apply the SAME reasoning
to similar new messages: if you would have scored a message
like one of these at the classifier's original severity,
score it at the corrected severity instead. Look for semantic
similarity, not exact wording.

  - On the "playground/user" source [customer-support]:
    "I'm going to import company accounts" → reviewer
    reclassified from low to medium.
  - ...

This is the in-prompt version of few-shot learning. The classifier reads the corrections, recognises the semantic pattern, and re-scores similar incoming messages without any training cycle. The two adaptation paths combine: corrections are visible to the classifier on the next request, AND they become labelled training examples that get baked into a fine-tuned model on the next training run.

Bounded retrieval (N=20) keeps the prompt tractable. For workspaces with hundreds of reclassifications, the next iteration uses embedding-based retrieval to fetch only the corrections most semantically similar to the incoming message (Section 14).
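The bounded selection can be sketched in memory as follows. This is not loadRecentReclassifications() itself, which queries the database: the row shape is hypothetical (message and corrected_at are illustrative names), and only severity, original_severity, and the default of 20 come from the text.

```typescript
type Severity = "high" | "medium" | "low"

// Hypothetical row shape for illustration.
type CorrectableRow = {
  message: string
  source: string
  context: string
  severity: Severity
  original_severity: Severity | null
  corrected_at: string // ISO 8601 UTC timestamp, so string order is time order
}

function recentCorrections(rows: CorrectableRow[], n = 20): CorrectableRow[] {
  return rows
    .filter((r) => r.original_severity !== null) // only reviewer-corrected decisions
    .sort((a, b) =>
      a.corrected_at < b.corrected_at ? 1 : a.corrected_at > b.corrected_at ? -1 : 0,
    ) // newest first
    .slice(0, n)
}

// One prompt entry, following the layout of the example above.
function formatCorrection(r: CorrectableRow): string {
  return (
    `  - On the "${r.source}" source [${r.context}]:\n` +
    `    "${r.message}" → reviewer\n` +
    `    reclassified from ${r.original_severity} to ${r.severity}.`
  )
}
```

A fixed N bounds prompt growth at the cost of recall: a relevant correction older than the twenty most recent ones is simply not shown to the classifier.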

section 13

Security Model

13.1 Tenant isolation

Every governance table (governance_signals, governance_decisions, governance_audit, governance_rules, governance_policy_rules, governance_webhooks, governance_playground_sessions, governance_playground_messages, workspace_integrations, and workspace_models) has Row-Level Security enabled with a policy of the form:

create policy "<...>: members read" on public.<table>
  for select using (public.is_workspace_member(workspace_id));

create policy "<...>: admins write" on public.<table>
  for all using (public.workspace_role(workspace_id) in ('owner','admin'));

is_workspace_member() and workspace_role() are SECURITY DEFINER functions that read workspace_members with elevated privilege, bypassing the RLS that would otherwise cause infinite recursion. This is the canonical Supabase pattern for multi-tenant RLS.

13.2 Credential encryption

Customer-supplied provider keys (OpenAI, Anthropic, Gemini) and per-model API keys are encrypted at rest using AES-256-GCM. The encryption key is read from process.env.INTEGRATION_ENCRYPTION_KEY (32 bytes, hex-encoded), generated once at platform setup and stored only in the hosting provider's secret manager.

// src/lib/crypto/encrypt.ts (excerpted)
const ALGO = 'aes-256-gcm'

export function encryptToken(plaintext: string): string {
  const iv = randomBytes(12)
  const cipher = createCipheriv(ALGO, key(), iv)
  const ct = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  const tag = cipher.getAuthTag()
  return `${iv.toString('hex')}:${tag.toString('hex')}:${ct.toString('hex')}`
}
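The decrypt side is not shown in the excerpt. A matching sketch, assuming the iv:tag:ciphertext hex layout produced by encryptToken, is given below. decryptToken is a hypothetical name, the key is passed as a parameter instead of coming from key(), and the throwaway random key stands in for INTEGRATION_ENCRYPTION_KEY.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto"

const ALGO = "aes-256-gcm"

// Same construction as the excerpt, with the key passed explicitly.
function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12)
  const cipher = createCipheriv(ALGO, key, iv)
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()])
  const tag = cipher.getAuthTag()
  return `${iv.toString("hex")}:${tag.toString("hex")}:${ct.toString("hex")}`
}

// Hypothetical counterpart: parse the three hex fields, set the auth tag,
// and let final() throw if the ciphertext or tag has been altered.
function decryptToken(blob: string, key: Buffer): string {
  const parts = blob.split(":")
  if (parts.length !== 3) throw new Error("malformed ciphertext blob")
  const [ivHex = "", tagHex = "", ctHex = ""] = parts
  const decipher = createDecipheriv(ALGO, key, Buffer.from(ivHex, "hex"))
  decipher.setAuthTag(Buffer.from(tagHex, "hex"))
  const pt = Buffer.concat([decipher.update(Buffer.from(ctHex, "hex")), decipher.final()])
  return pt.toString("utf8")
}

const demoKey = randomBytes(32) // throwaway key for the example only
const blob = encryptToken("sk-example-provider-key", demoKey)
const roundTrip = decryptToken(blob, demoKey)
```

The call to final() is what enforces the "GCM auth tag rejects on decrypt" row of the threat table: a corrupted blob raises an error instead of yielding garbage plaintext.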

The threat model for credential storage:

Threat                                Mitigation
────────────────────────────────────  ──────────────────────────────────────
Database dump (read-only attacker)    Ciphertext only; needs ENCRYPTION_KEY
ENCRYPTION_KEY leak via logs          Helper imports never log the key;
                                       error messages reference the env var
                                       name only
Encrypted blob mismatch (corruption)  GCM auth tag rejects on decrypt
Replay across workspaces              Workspace ID is part of the row's PK
                                       chain; RLS enforces ownership
Plaintext exposure via API response   POST returns only last_four; GET never
                                       returns ciphertext or plaintext

13.3 Workspace API keys

Workspace API keys are themselves credentials. They are issued as random 32-byte tokens prefixed with cw_live_, displayed plaintext exactly once at creation, and stored as a SHA-256 hash plus a 12-character prefix. The hash is used for verification; the prefix is used for display. Revocation is a soft delete (revoked_at timestamp) so the audit log retains attribution.
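Issuance and verification under this scheme might look like the sketch below. The function names, the record field names, and the hex encoding of the random bytes are assumptions; so is the choice to count the cw_live_ marker as part of the 12-character display prefix.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto"

// Hypothetical stored record: hash for verification, prefix for display.
type ApiKeyRecord = { key_hash: string; key_prefix: string; revoked_at: string | null }

const sha256Hex = (s: string): string => createHash("sha256").update(s).digest("hex")

function issueApiKey(): { plaintext: string; record: ApiKeyRecord } {
  // 32 random bytes; hex encoding is an assumption.
  const plaintext = "cw_live_" + randomBytes(32).toString("hex")
  return {
    plaintext, // shown to the operator once, never stored
    record: {
      key_hash: sha256Hex(plaintext),
      key_prefix: plaintext.slice(0, 12),
      revoked_at: null,
    },
  }
}

function verifyApiKey(presented: string, record: ApiKeyRecord): boolean {
  if (record.revoked_at !== null) return false // soft-deleted keys never verify
  const a = Buffer.from(sha256Hex(presented), "hex")
  const b = Buffer.from(record.key_hash, "hex")
  return a.length === b.length && timingSafeEqual(a, b)
}

function revoke(record: ApiKeyRecord, at: Date): ApiKeyRecord {
  // The row is kept so audit entries can still be attributed to the key.
  return { ...record, revoked_at: at.toISOString() }
}
```

An unsalted SHA-256 is reasonable here because the token carries 256 bits of randomness, so there is no low-entropy input for an attacker with the hash to guess, unlike a user-chosen password.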

section 14

Threat Model

We enumerate the realistic adversaries and how the Engine responds. This is not exhaustive; it is the set of threats we have explicitly considered.

Adversary                       Capability                          Engine response
──────────────────────────────  ──────────────────────────────────  ─────────────────────────────────
End user of customer app        Crafts adversarial prompts          Classifier scores; rules engine
                                                                    blocks or queues
Compromised application key     Submits arbitrary signals           Workspace-scoped API key; RLS;
                                                                    audit trail captures everything
Insider with workspace access   Approves a high-risk decision       Audit row names them with
                                                                    timestamp; reclassification
                                                                    also audited
Insider with DB read-only       Reads workspace tables              Sees ciphertext for keys;
                                                                    plaintext content is in
                                                                    governance_signals.metadata
                                                                    (this IS sensitive, see below)
Network attacker                MITM between Kirtonic and provider  TLS everywhere; signed webhooks
                                                                    for downstream
Provider compromise             OpenAI/Anthropic key abuse          Keys are encrypted per workspace;
                                                                    revocable from Integrations
Rogue webhook receiver          Forges decision events              HMAC-SHA256 signature on every
                                                                    event; receiver MUST verify
Audit log tampering             Direct SQL UPDATE on audit table    RLS limits attack surface;
                                                                    application has no UPDATE path;
                                                                    cryptographic chain in v2

The insider-DB-read row is the most material residual risk: signal metadata may contain plaintext PII or message previews. Customers who need to mitigate it should (a) configure the Trust Centre to redact PII at ingestion, (b) use a self-hosted deployment so that the database lives in their own VPC, or (c) limit the signal payload they POST to non-sensitive identifiers and keep the message content in their own systems.
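For the forged-decision-event row in the table above, the receiver-side check might look like the sketch below. It assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body under a per-webhook shared secret; the encoding, the function names, and how the signature is transported (for example, which header carries it) are assumptions, not details confirmed by this paper.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the hex HMAC-SHA256 of the raw request body.
// (Assumed scheme: the sender signs the exact bytes it transmits.)
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Return true only if the received signature matches the expected one.
// The comparison is constant-time to avoid leaking how many bytes matched.
function verifyWebhook(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  const received = Buffer.from(signatureHex, "hex");
  return (
    received.length === expected.length && timingSafeEqual(received, expected)
  );
}
```

The receiver should verify against the raw body as received, before any JSON parsing or re-serialisation, since either can change the bytes and invalidate the signature.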

section 15

Performance, Scaling, Compliance

15.1 Latency

Typical end-to-end latency for a classifier-only turn:

signal validation + workspace resolve      5–15 ms
load policy rules + reclassifications      20–40 ms (single Postgres round-trip)
classifier call (Kirtonic hosted LLM)      300–700 ms (network-bound)
rules engine + decision INSERT             10–30 ms
audit row INSERT (batched 2 rows)          5–15 ms
─────────────────────────────────────────  ──────────────
Total                                      ≈ 350–800 ms

Proxy mode adds a second classifier call on the response (a similar 350–800 ms) plus the foundation-model latency (provider-dependent; 500–3000 ms is typical for short responses). Webhook delivery is fired asynchronously with void deliverWebhooks(...) and does not block the response.
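The fire-and-forget pattern behind void deliverWebhooks(...) can be sketched as follows. Only the function name comes from the text; the event shape, the in-memory delivered log, and the handler are hypothetical stand-ins, and the outbound HTTP call is replaced by a resolved promise so the example stays self-contained.

```typescript
// Hypothetical event shape and delivery log, for illustration only.
type DecisionEvent = { decisionId: string };
const delivered: string[] = [];

// Stand-in for the real delivery routine; its signature is an assumption.
async function deliverWebhooks(event: DecisionEvent): Promise<void> {
  try {
    await Promise.resolve(); // placeholder for the outbound HTTP POST
    delivered.push(event.decisionId);
  } catch (err) {
    // Errors are handled here because nobody awaits the returned promise.
    console.error("webhook delivery failed", err);
  }
}

// The handler returns as soon as the decision is ready; delivery
// continues in the background on a later tick of the event loop.
function handleTurn(event: DecisionEvent): { status: string } {
  void deliverWebhooks(event); // not awaited: does not add to response latency
  return { status: "ok" };
}
```

The void operator makes the dropped promise explicit. The trade-off is that a delivery failure can never surface in the response, so it has to be caught and logged inside the delivery routine.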

15.2 Throughput

The hot path is dominated by the classifier call. Anthropic's default tier allows ~50 requests/sec/account. Workspaces approaching that limit should either (a) train and activate a custom classifier so the call goes to a dedicated fine-tune, (b) use classifier-only mode for low-risk surfaces and reserve full governance for regulated paths, or (c) request a dedicated rate-limit pool.
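A workspace that wants to stay under a provider limit before it is hit can shed or queue load client-side. The token bucket below is one common way to do that; it is an illustrative sketch rather than part of the Engine, and the rate of 50 per second simply mirrors the figure quoted above.

```typescript
// Token bucket: refills continuously at ratePerSec, holds at most
// `capacity` tokens. Each permitted call consumes one token.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly ratePerSec: number;
  private readonly capacity: number;
  private readonly now: () => number;

  // `now` is an injectable millisecond clock, which keeps tests deterministic.
  constructor(
    ratePerSec: number,
    capacity: number,
    now: () => number = () => Date.now(),
  ) {
    this.ratePerSec = ratePerSec;
    this.capacity = capacity;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Returns true and consumes a token if a call may proceed right now.
  tryAcquire(): boolean {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.ratePerSec,
    );
    this.last = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

A caller would construct new TokenBucket(50, 50) and, when tryAcquire() returns false, either queue the turn or fall back to one of the options (a) to (c) above.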

15.3 Compliance posture

Framework      Engine support                                       Customer responsibility
─────────────  ───────────────────────────────────────────────────  ───────────────────────────
UK GDPR        UK-region data residency option · audit log ·         DPIA · data-controller
               configurable retention                                 obligations
HIPAA          BAA available · PII / PHI redaction in pipeline ·     PHI classification on the
               encrypted at rest                                      customer's own data
PCI DSS        Card-number detection rule shipped in starter pack ·  Cardholder-data environment
               classifier scores card patterns 0.9+                   scope decisions
SOC 2          Tamper-evident audit · RLS · API key rotation         Vendor management ·
                                                                      employee access reviews
ISO 27001      Aligns with A.5 (policies), A.8 (asset mgmt),         ISMS · risk register
               A.12 (operations), A.16 (incident mgmt)
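The starter-pack card-number rule itself is not reproduced in this paper. As an illustration of what such a rule typically involves, the sketch below finds 13 to 19 digit runs and keeps only those that pass the Luhn checksum; the function names and the regular expression are assumptions, and a production rule would likely add issuer-prefix checks.

```typescript
// Luhn checksum over a string of ASCII digits.
function luhnValid(digits: string): boolean {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit; double every second one.
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Find 13-19 digit runs (single spaces or hyphens allowed between digits)
// and return the ones that pass the Luhn check, with separators removed.
function findCardNumbers(text: string): string[] {
  const candidates = text.match(/\b\d(?:[ -]?\d){12,18}\b/g) ?? [];
  return candidates.map((c) => c.replace(/[ -]/g, "")).filter(luhnValid);
}
```

The checksum step matters for precision: most arbitrary 16-digit reference numbers fail Luhn, which keeps the rule from flagging order IDs and similar identifiers.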

section 16

API Surface (v1)

This section lists the full public API. All endpoints accept either a workspace API key (Authorization: Bearer cw_live_…) with the appropriate scope, or a Supabase cookie session.

Endpoint                                              Method            Scope                   Notes
────────────────────────────────────────────────────  ───────────────   ─────────────────────   ────────────────
/api/v1/signals                                       POST              governance:write        Single entry point
/api/v1/decisions                                     GET               governance:read         Paginated list
/api/v1/decisions/[id]/approve                        POST              governance:write
/api/v1/decisions/[id]/reject                         POST              governance:write        Reason required
/api/v1/decisions/[id]/execute                        POST              governance:write        Fires webhook
/api/v1/decisions/[id]/reclassify                     POST              governance:write        Snapshots original
/api/v1/audit                                         GET               governance:read         Filterable

/api/governance/rules                                 GET,PUT           cookie (admin)
/api/governance/policy-rules                          GET,POST          cookie (admin)          AI auto-categorise
/api/governance/policy-rules/[id]                     PATCH,DELETE      cookie (admin)
/api/governance/webhooks                              GET,POST          cookie (admin)
/api/governance/webhooks/[id]                         PATCH,DELETE      cookie (admin)
/api/governance/webhooks/[id]/test                    POST              cookie (admin)
/api/governance/stats                                 GET               cookie

/api/governance/models                                GET,POST          cookie (admin)          Train
/api/governance/models/import                         POST              cookie (admin)          Import existing
/api/governance/models/import/test                    POST              cookie (admin)          Probe endpoint
/api/governance/models/dataset                        GET               cookie (admin)          Download JSONL
/api/governance/models/[id]/status                    POST              cookie (admin)          Poll OpenAI job
/api/governance/models/[id]/activate                  POST              cookie (admin)          One per workspace
/api/governance/models/[id]/bundle                    GET               cookie (admin)          Portable JSON

/api/integrations                                     GET,POST,DELETE   cookie (admin)          3 providers
/api/integrations/test                                POST              cookie (admin)

/api/governance/playground/chat                       POST              cookie                  In-platform proxy
/api/governance/playground/sessions                   GET,POST          cookie
/api/governance/playground/sessions/[id]              GET,DELETE        cookie
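Calling the key-authenticated /api/v1 endpoints from a client might be sketched as follows. The helper name, the base URL, the placeholder key, the decision ID, and the "reason" body field are all hypothetical; only the paths, methods, and the Bearer scheme come from the table and text above.

```typescript
// Arguments in the shape expected by fetch(url, init).
type FetchInit = {
  method: string;
  headers: Record<string, string>;
  body?: string;
};

// Hypothetical helper: builds fetch arguments for a /api/v1 call.
// baseUrl is a placeholder; the real host is not given in this paper.
function v1Request(
  baseUrl: string,
  apiKey: string,
  method: "GET" | "POST",
  path: string,
  body?: unknown,
): { url: string; init: FetchInit } {
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiKey}`,
  };
  const init: FetchInit = { method, headers };
  if (body !== undefined) {
    headers["Content-Type"] = "application/json";
    init.body = JSON.stringify(body);
  }
  return { url: new URL(path, baseUrl).toString(), init };
}

const base = "https://kirtonic.example"; // placeholder host
const key = "cw_live_placeholder"; // placeholder key, not a real token

// List decisions (needs the governance:read scope).
const listReq = v1Request(base, key, "GET", "/api/v1/decisions");

// Reject a decision (needs governance:write). The table notes that a
// reason is required; the "reason" field name is an assumption.
const rejectReq = v1Request(
  base,
  key,
  "POST",
  "/api/v1/decisions/example-id/reject",
  { reason: "Outside policy" },
);

// To send: const res = await fetch(listReq.url, listReq.init);
```

Keeping request construction separate from the network call makes the authentication and path logic easy to unit-test without a live workspace.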

section 17

Future Work

Embedding-based reclassification retrieval. Move from "most recent N corrections" to "top-k semantically similar corrections." Requires a per-workspace embedding index (pgvector is a candidate).
Cryptographic audit chain. Each audit row signs the previous row's hash; a signed root is published periodically, so customers can detect tampering externally.
Webhook retry with exponential backoff. Opt-in per webhook to avoid breaking customers who have built around at-most-once delivery.
Self-hosted deployment. Engine + Postgres + Anthropic key inside the customer VPC. Removes the Anthropic-as-classifier residual risk for customers who need it.
Streaming proxy. Forward provider responses to the client as a stream while running output classification on the assembled response; the current implementation buffers the full response before classifying.
Multi-region active-active. Currently UK-primary. EU and US regions are next, with workspace-pinned region selection.
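The cryptographic audit chain item above describes work that is not yet built. As a rough sketch of the underlying idea only, the code below links each row to its predecessor with a SHA-256 hash and shows how an external verifier could locate the first altered row. The row shape and function names are hypothetical, and signing and root publication are deliberately left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical chained row; a real audit row would carry more fields.
interface ChainedRow {
  payload: string; // serialised audit content
  prevHash: string; // hash of the previous row (GENESIS for the first)
  hash: string; // SHA-256 over prevHash and payload
}

const GENESIS = "0".repeat(64);

const rowHash = (prevHash: string, payload: string): string =>
  createHash("sha256").update(prevHash).update("\n").update(payload).digest("hex");

// Append a row whose hash commits to everything before it.
function appendRow(chain: ChainedRow[], payload: string): ChainedRow[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { payload, prevHash, hash: rowHash(prevHash, payload) }];
}

// Return the index of the first row that fails verification, or -1 if intact.
function firstTamperedIndex(chain: ChainedRow[]): number {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const row = chain[i];
    if (row.prevHash !== prev || row.hash !== rowHash(prev, row.payload)) {
      return i;
    }
    prev = row.hash;
  }
  return -1;
}
```

Editing any row changes its hash and breaks the link to every later row, so an attacker with SQL UPDATE access would have to rewrite the whole tail of the chain. The periodically published signed root is what would make that rewrite detectable from outside.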