The gap between these two words is a regulatory liability.
Your AI agents are making consequential decisions in credit, fraud, KYC, you name it. Engineering logs tell you what happened, not why. When the FCA asks, you need independent evidence produced by a party with no stake in the outcome.
of the world's largest banks are already piloting AI agents in live production workflows.
IIF-EY, 2025
enterprises has a mature governance model for the autonomous AI agents they are deploying.
Deloitte, 2025
vendors today offer truly independent evidence of why an AI agent made a consequential decision.
Market analysis, 2026
Every existing solution answers a different question for a different buyer. None answers the one the regulator actually asks.
"What is my AI system doing? Is it behaving safely? What did it output?"
"Why did my agent make that decision, and can I prove it was the right one to a regulator?"
whattowhy is modelled on how HR manages, develops, and holds the human workforce accountable. It does the same for your AI agents, from outside the stack, with no stake in the outcome.
Integrated at the layer between your agents and the systems they access, not inside the agent itself. No vendor partnership. No internal access. Structurally independent from the system being assessed. The same vendor cannot provide the agent and the assurance of it, ever.
What information the agent had. What policy it applied. How it interpreted that policy in context. Why it acted as it did. Reasoning traced to the data source. Captured live, not reconstructed after the fact.
Each agent receives a continuous compliance score. Risk is tiered by materiality, exactly as SS1/23 requires. High-risk agents in credit and KYC get a full trace and human escalation. Medium-risk agents get a score and a summary. Low-risk agents get aggregated scoring.
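As a rough illustration of the tiering above, here is a minimal sketch of how a use case could map to a risk tier and then to an evidence level. Every name in it (`RiskTier`, `EVIDENCE_BY_TIER`, `TIER_BY_USE_CASE`, `evidence_for`) is hypothetical and not a whattowhy API. Only credit and KYC are classified, because those are the only use cases the text assigns to a tier.

```python
from enum import Enum
from typing import Dict, Tuple


class RiskTier(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Evidence produced per tier, mirroring the description above.
# The string labels are illustrative assumptions.
EVIDENCE_BY_TIER: Dict[RiskTier, Tuple[str, ...]] = {
    RiskTier.HIGH: ("full_trace", "human_escalation"),
    RiskTier.MEDIUM: ("score", "summary"),
    RiskTier.LOW: ("aggregated_score",),
}

# Only credit and KYC are named as high-risk in the text; any other
# classification would be a firm-specific materiality decision.
TIER_BY_USE_CASE: Dict[str, RiskTier] = {
    "credit": RiskTier.HIGH,
    "kyc": RiskTier.HIGH,
}


def evidence_for(use_case: str) -> Tuple[str, ...]:
    """Return the evidence level for a classified use case.

    Raises KeyError for an unclassified use case instead of guessing a tier.
    """
    tier = TIER_BY_USE_CASE[use_case]
    return EVIDENCE_BY_TIER[tier]
```

Failing on an unclassified use case, rather than defaulting to a low tier, is a deliberate choice in this sketch: an agent with no materiality assessment should not silently receive the lightest treatment.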
Tamper-evident, queryable, structured for FCA, PRA and Consumer Duty. Legible to a compliance officer. Produced by a party with no stake in the outcome. Embeds into SS1/23 self-assessments and board model risk reports.
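One common way to make a decision log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below illustrates that general technique only. It is an assumption about what "tamper-evident" could mean here, not a description of how whattowhy is built, and the function names and record fields are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, record: Dict[str, Any]) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: List[Dict[str, Any]], record: Dict[str, Any]) -> Dict[str, Any]:
    """Append a decision record, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, record),
    }
    log.append(entry)
    return entry


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


# Usage: the record fields follow the four questions listed earlier
# (information, policy, interpretation, rationale); values are made up.
log: List[Dict[str, Any]] = []
append(log, {"information": "bureau file", "policy": "affordability",
             "interpretation": "income verified", "rationale": "within limit"})
append(log, {"information": "ID document", "policy": "KYC",
             "interpretation": "match confirmed", "rationale": "checks passed"})
```

A chain like this detects edits to past entries, but it does not by itself stop someone from rebuilding the whole chain. In practice the latest hash would need to be anchored somewhere the log's operator cannot rewrite.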
"Firms must assess, test, understand and evidence the outcomes their AI systems deliver to customers."
FCA Consumer Duty, active enforcement, 2024-2026
Not incrementally better than what exists. Structurally different in ways that cannot be replicated by the existing market.
Every other tool captures outputs: what your agent did. whattowhy captures reasoning: what information the agent had, what policy it applied, how it interpreted that policy in context, and why it acted as it did. Captured live at the moment of decision, not reconstructed after the fact. This is the only form of evidence a regulator will accept.
The manager agent runs challenger models against each decision, scoring the alternative paths the agent could have taken. Over time, you get a measurable view of not just whether agents followed policy, but whether they made the best available decision. Automated challenger modelling, running continuously.
whattowhy sits outside the agent stack, with no vendor partnerships and no internal access. The same entity cannot produce the AI agent and the independent assurance of it. Architecturally identical to external audit. Any firm that also provides the agents or the infrastructure has a conflict of interest that no policy can resolve.
As the manager agent accumulates decision data across your agent fleet, it identifies where policy is ambiguous, where agents consistently drift, and where outcomes diverge from intent. It surfaces these as structured policy recommendations, turning operational evidence into continuous governance improvement rather than a static rulebook.
Working with a small number of design partners at PRA-regulated firms. No pitch, just a conversation about what good evidence needs to look like.
Pre-revenue · Pre-product · Active discovery · Built through Antler London 2026