whattowhy — Independent Agent Assurance

what → why

The gap between these two words is a regulatory liability.

If you are a Head of Compliance, Head of Risk, CCO or CRO at a PRA-regulated firm with AI agents in production, you already carry personal accountability for their decisions under SM&CR. Right now, you cannot independently evidence why those decisions were made. whattowhy is the only platform that produces that evidence continuously: traced to source data, structured for regulators, and produced by a party with no stake in the outcome.


Reasoning trace · Live capture

Credit application · Agent: CreditUnderwriter-v2

Application declined

Ref #CU-2024-88421 · 14:32:07 GMT

Why, captured at decision moment

"Debt-to-income ratio of 4.8× exceeded SS1/23 §3.4 policy ceiling of 4.2×. Challenger model concurs. Escalation not required. Human review threshold not met."

Tamper-evident · FCA-structured · Independent · SS1/23 aligned

The decision journey

Watch a decision move through whattowhy.

A credit agent declines an application. Without whattowhy, a log entry. With whattowhy, a complete evidential record — in under a second.

AI Agent → whattowhy → Trace → Score → Evidence

Step 01 — Decision made

Credit application declined by AI agent

agent: CreditUnderwriter-v2

action: DECLINE

timestamp: 14:32:07.441 GMT

ref: #CU-2024-88421

Without whattowhy, this is where the record ends. A log entry. No reasoning. No policy reference. Nothing a regulator can verify.

Step 02 — Captured independently

whattowhy intercepts at the integration layer

captured_by: whattowhy v1 — independent

agent_vendor_access: none

capture_mode: live — not reconstructed

tamper_evident: true

Captured at the integration layer — not inside the agent, not from the vendor's logs. Structurally independent from the system being assessed.
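One common way to make a record tamper-evident is to chain each entry to the previous one with a cryptographic hash. The sketch below illustrates that general technique only; it is not a description of whattowhy's internals. The `seal` and `verify` helpers, the `GENESIS` value and the second reference number are hypothetical names chosen for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry in a chain


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same content always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def seal(record: dict, prev_hash: str) -> dict:
    """Return a copy of the record linked to the previous entry by its hash."""
    body = {**record, "prev_hash": prev_hash}
    return {**body, "hash": _digest(body)}


def verify(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev_hash") != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


first = seal({"ref": "#CU-2024-88421", "agent": "CreditUnderwriter-v2", "action": "DECLINE"}, GENESIS)
second = seal({"ref": "#EXAMPLE-0002", "agent": "CreditUnderwriter-v2", "action": "DECLINE"}, first["hash"])

chain_ok = verify([first, second])
tampered_ok = verify([{**first, "action": "APPROVE"}, second])
```

Changing the recorded action after the fact makes `verify` fail, which is the property "tamper-evident" refers to: the record can still be altered, but not without detection.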

Step 03 — Decision traced to source

Full reasoning chain extracted

data_inputs: income £38,200 · DTI 4.8×

policy_applied: SS1/23 §3.4 — DTI ceiling 4.2×

interpretation: threshold exceeded · mandatory decline

source_data_ref: Open Banking feed · 14:31:52 GMT

The full chain: what data the agent saw, what policy it applied, how it interpreted that policy in this specific context — traced all the way to source.
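As a rough illustration, the chain above could be held as a small structured record that carries the inputs, the policy applied and the resulting interpretation together. This is a sketch using the figures from the example decision; the `ReasoningTrace` class, its field names and the "within threshold" wording are assumptions, not whattowhy's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningTrace:
    income_gbp: int        # data input seen by the agent
    dti_multiple: float    # debt-to-income ratio seen by the agent
    policy_ref: str        # policy the agent applied
    dti_ceiling: float     # ceiling set by that policy
    source_data_ref: str   # where the inputs came from

    def interpretation(self) -> str:
        # The decline follows directly from comparing the input to the ceiling.
        if self.dti_multiple > self.dti_ceiling:
            return "threshold exceeded · mandatory decline"
        return "within threshold"  # hypothetical wording for the other branch


trace = ReasoningTrace(
    income_gbp=38_200,
    dti_multiple=4.8,
    policy_ref="SS1/23 §3.4",
    dti_ceiling=4.2,
    source_data_ref="Open Banking feed · 14:31:52 GMT",
)
outcome = trace.interpretation()
```

Keeping the inputs, the policy reference and the interpretation in one immutable record means a reviewer can re-derive the outcome from the record alone.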

Step 04 — Scored by risk materiality

Materiality tier assigned, challenger model run

materiality_tier: HIGH — credit decision

compliance_score: 94 / 100

challenger_model: CONCURS — same outcome

human_escalation: not required

High-risk decisions get full trace. The challenger model scores the alternative paths the agent could have taken, confirming this was the best available decision.
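The scoring step can be pictured as a small function that assigns a tier, compares the agent's action with the challenger's, and decides whether a human needs to look. The sketch below is illustrative only: the tier lookup, the `REVIEW_THRESHOLD` value of 80 and the escalation rule are assumptions made for the example, not documented product behaviour.

```python
from dataclasses import dataclass

# Only "credit decision" -> HIGH comes from the example above; the default is assumed.
TIER_BY_DECISION_TYPE = {"credit decision": "HIGH"}
REVIEW_THRESHOLD = 80  # assumed compliance score below which a human reviews


@dataclass(frozen=True)
class Assessment:
    materiality_tier: str
    challenger_model: str
    human_escalation: str


def assess(decision_type: str, agent_action: str,
           challenger_action: str, compliance_score: int) -> Assessment:
    tier = TIER_BY_DECISION_TYPE.get(decision_type, "LOW")
    concurs = agent_action == challenger_action
    # Assumed rule: escalate a high-tier decision if the challenger disagrees
    # or the compliance score falls below the review threshold.
    needs_review = tier == "HIGH" and (not concurs or compliance_score < REVIEW_THRESHOLD)
    return Assessment(
        materiality_tier=tier,
        challenger_model="CONCURS" if concurs else "DISAGREES",
        human_escalation="required" if needs_review else "not required",
    )


result = assess("credit decision", "DECLINE", "DECLINE", 94)
```

With the example decision (score 94, challenger in agreement) the result is a HIGH tier with no escalation; a disagreeing challenger on the same decision would flip escalation to required under this assumed rule.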

Step 05 — Evidence pack produced

Regulator-legible evidence, ready immediately

format: FCA Consumer Duty · SS1/23 aligned

legible_to: compliance officer · not just engineer

produced_by: whattowhy — no vendor stake

status: ready to present to FCA ✓

Structured for regulators. Queryable. Tamper-evident. When the FCA asks, this is what you hand them — produced by a party with no stake in the outcome.


vs. Alternatives

Not governance. Not security. Continuous Assurance.

Tools give you telemetry. Advisors give you frameworks. Neither produces the independent, regulator-legible evidence that proves the right decision was made, under the right policy, at the right moment.

Capability | Market today | whattowhy
Produces evidence, not just telemetry | ✗ Telemetry only | ✓ Regulator-legible
Traces decisions to source data | ✗ Output logs | ✓ Full chain
Structurally independent of agent vendor | ✗ Conflict of interest | ✓ Permanently
Continuous in-life monitoring | ✗ Point-in-time | ✓ Real time
Risk scoring by materiality tier | ✗ Binary alerts | ✓ SS1/23 aligned
Independent of the firm advising you | ✗ Built by advisors | ✓ No advisory stake

How it works

An independent manager agent for your AI agents.

Five steps. Each valuable alone. Each making the next step more powerful over time.

Sits outside your agent stack

Integrated at the layer between your agents and the systems they access, not inside the agent itself. No vendor partnership. No internal access. Structurally independent from the system being assessed. The same vendor cannot provide the agent and the assurance of it, ever.

Independent
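To make "the layer between your agents and the systems they access" concrete, here is a minimal sketch of capture by wrapping a system call rather than modifying the agent. Everything named here (`with_capture`, `fetch_income`, the in-memory `captured` list) is hypothetical and stands in for whatever integration mechanism a real deployment uses.

```python
from typing import Any, Callable


def with_capture(tool: Callable[..., Any], sink: list) -> Callable[..., Any]:
    """Wrap a system call so every request and response is recorded as it happens."""
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        result = tool(*args, **kwargs)
        # The record is written outside the agent; the agent never sees the sink.
        sink.append({"tool": tool.__name__, "args": args, "kwargs": kwargs, "result": result})
        return result
    return wrapped


def fetch_income(applicant_ref: str) -> int:
    # Stand-in for a real downstream system; returns the figure from the example.
    return 38_200


captured: list = []
agent_facing_fetch = with_capture(fetch_income, captured)

income = agent_facing_fetch("#CU-2024-88421")
```

The agent calls `agent_facing_fetch` exactly as it would call the original system, so capture needs no changes inside the agent and no access to the vendor's logs.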

Capture the why at the moment of decision

What information the agent had. What policy it applied. How it interpreted that policy in context. Why it acted as it did. Reasoning traced all the way back to source data. Captured live, not reconstructed after the fact.

Capture

Score and produce evidence that holds up

Each decision is scored by risk materiality, in line with the risk-based tiering approach of SS1/23. High-risk decisions get full trace, with human escalation where the review threshold is met. The output is tamper-evident, queryable, and structured for FCA, PRA and Consumer Duty. It is legible to a compliance officer, not just an engineer. Every decision added to the corpus makes the next phase more powerful.

Prove

Surface risk before it materialises

As the corpus grows across your agent fleet, whattowhy identifies which agents are drifting toward bad outcomes before they appear in a log. Which decision patterns are accumulating tail risk across hundreds of low-materiality calls that each look fine individually. Which use cases pose a greater risk or have a lower level of control. Not reactive. Predictive.

Predict
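One simple way to picture drift detection is to compare an agent's recent outcome rate with an agreed baseline and flag the gap once it exceeds a tolerance. The sketch below shows that idea only; the `DriftMonitor` class, the window size and the tolerance are assumptions, and a production approach would likely be considerably more sophisticated.

```python
from collections import deque


class DriftMonitor:
    """Flags when the recent decline rate moves away from a baseline rate."""

    def __init__(self, baseline_rate: float, window: int = 100, tolerance: float = 0.10):
        self.baseline_rate = baseline_rate
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)  # keeps only the latest `window` outcomes

    def observe(self, declined: bool) -> bool:
        """Record one decision; return True if the agent now looks to be drifting."""
        self.recent.append(declined)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough history yet
        rate = sum(self.recent) / len(self.recent)
        return abs(rate - self.baseline_rate) > self.tolerance


monitor = DriftMonitor(baseline_rate=0.2, window=10, tolerance=0.1)

# Ten decisions at the baseline rate: 2 declines out of 10.
steady = [monitor.observe(d) for d in [True, True] + [False] * 8]

# Five declines in a row push the rolling rate to 5 out of 10.
shifted = [monitor.observe(True) for _ in range(5)]
```

No single decision in the second batch is unusual on its own; the flag comes from the pattern across the window, which is the point made above about calls that each look fine individually.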

Turn evidence into better policy

The corpus identifies where policy is ambiguous, where agents consistently misinterpret intent, and where outcomes are diverging from what the policy was designed to produce. These surface as structured policy recommendations — turning operational evidence into continuous governance improvement, not a static rulebook reviewed once a year.

Improve

Why whattowhy

The only platform that moves from evidence to prediction.

Every phase is valuable on its own. Together they compound into something no point-in-time audit or telemetry dashboard can replicate.

01 — Foundation

We trace every decision to source data

Every other tool captures telemetry — what your agent did. whattowhy captures the full decision chain: what information the agent had, what policy it applied, how it interpreted that policy in context, and why it acted as it did, traced all the way back to source data. Captured live at the moment of decision. We also run challenger models against each decision, scoring the alternative paths the agent could have taken — giving you a measurable view of not just whether agents followed policy, but whether they made the best available decision. This is the only form of evidence a regulator will accept. And it is the corpus that powers everything that comes next.

Full audit trail

We surface risk before it materialises

As the decision corpus grows, whattowhy identifies which agents are drifting toward bad outcomes before they appear in a log. Which decision patterns are accumulating tail risk across hundreds of calls that each look fine individually. Which use cases have a lower level of control than your risk appetite allows. Not reactive monitoring. Predictive risk intelligence, answering the question regulators are starting to ask: do you know where your risk is concentrating before something goes wrong?

Predictive risk

We are structurally independent

whattowhy sits outside the agent stack, with no vendor partnerships and no internal access. The same entity cannot produce the AI agent and the independent assurance of it. Architecturally identical to external audit. Any firm that also provides the agents, the infrastructure, or the advisory wrapper has a conflict of interest that no policy can resolve.

Structural independence

The corpus turns evidence into better policy

Over time the decision corpus identifies where policy is ambiguous, where agents consistently misinterpret intent, and where outcomes are diverging from what the policy was designed to produce. These surface as structured recommendations, turning operational evidence into continuous governance improvement rather than a static rulebook reviewed once a year.

Policy intelligence