Applying SR 11-7 to AI Agents: A Practical Framework
Twelve pages on how the Federal Reserve's SR 11-7 model risk management guidance applies to the LLM-based AI agents your business units are now deploying. Written for Chief Model Risk Officers, Heads of Model Validation, and AI Governance leads at banks with $10B+ in assets.
Written by Ashish K. Saxena · Founder, Caventia
The principles hold up. The artifacts don't.
SR 11-7 was issued in 2011, eight years before GPT-2 and more than a decade before “AI agent” entered common usage. Today, banks are deploying AI agents in fraud detection, KYC adjudication, credit underwriting and customer service while still wrestling with whether traditional model risk frameworks apply.
The honest answer: SR 11-7's principles hold up better than you'd expect. The artifacts and workflows it implies break down quickly when applied to LLM-based agents. An LLM-based fraud detector that produces a fraud_score from a transaction prompt fits squarely inside SR 11-7's model definition. So does an AI agent that synthesizes a KYC verdict from multiple data sources. Supervisors have reiterated this position in Fed supervisory letters and OCC examiner training materials throughout 2024 and 2025.
Yet many banks are still treating AI agents as “automation” rather than as models. This creates two risks. First, examiner findings: the OCC and Fed increasingly ask explicitly about AI/ML model governance, and banks without an answer face Matters Requiring Attention. Second, disparate impact exposure: AI agents making credit-adjacent decisions without ECOA-compliant validation are creating CFPB enforcement risk.
The whitepaper organizes the response in three layers. The three pillars of SR 11-7 (robust development, independent validation, ongoing monitoring) translate to AI agents without modification. The breakdowns happen in five specific places: non-determinism, prompt-as-feature, tool use, model provider opacity and continuous capability evolution. The fix is a five-step practical framework: inventory and classify; document each agent; validate before deployment; capture production decisions; monitor and re-validate.
The opportunity: banks that build AI agent governance correctly in 2026 have a twelve- to twenty-four-month head start on competitors who will be forced to retrofit it under examiner pressure. [Download to continue reading.]
Eleven sections and two appendices.
About 4,800 words; roughly twelve pages once typeset.
I.
Why SR 11-7 Still Matters in 2026
The Fed has confirmed in supervisory letters throughout 2024 and 2025 that LLM-based decision systems fall squarely inside SR 11-7's model definition. Banks treating AI agents as automation face MRA findings.
II.
Where the framework maps cleanly
SR 11-7's three pillars (development, validation, ongoing monitoring) translate to AI agents without modification. The principles are durable. The artifacts are not.
III.
Five places it breaks down
Non-determinism. Prompt-as-feature. Tool use and emergent behavior. Model provider opacity. Continuous capability evolution. Each gets a specific fix.
IV.
A five-step practical framework
Inventory and classify. Document each agent. Validate before deployment. Capture production decisions. Monitor and re-validate. Pressure-tested against examiner conversations.
V.
Documentation artifacts you need
Per-agent (Model Identity Document, validation reports, monitoring history, exception log). Program-level (inventory, policy, independence policy, provider risk assessment).
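A per-agent Model Identity Document lends itself to structured capture. As a minimal sketch (field names and values here are this sketch's assumptions, not fields prescribed by SR 11-7 or the whitepaper), it might look like:

```python
import hashlib
from dataclasses import dataclass, field, asdict

@dataclass
class ModelIdentityDocument:
    """Illustrative per-agent record. All field names are hypothetical."""
    agent_id: str
    business_purpose: str
    base_model: str            # provider model pinned to an exact version
    system_prompt_hash: str    # the prompt is a feature; hash it like one
    tools: list = field(default_factory=list)
    validation_reports: list = field(default_factory=list)
    risk_tier: str = "unclassified"

# Hypothetical example agent and system prompt.
prompt = "You are a KYC adjudicator. Return a verdict with citations."
mid = ModelIdentityDocument(
    agent_id="kyc-adjudicator-v3",
    business_purpose="KYC verdict synthesis",
    base_model="provider-x/model-2025-06",
    system_prompt_hash=hashlib.sha256(prompt.encode()).hexdigest(),
    tools=["sanctions_lookup", "doc_ocr"],
)
record = asdict(mid)  # serializable form for the model inventory
```

Hashing the system prompt rather than merely versioning it makes prompt drift detectable: any edit, however small, changes the recorded identity.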
VI.
Architectural requirements for capture
Reproducibility. Tamper evidence. Independence from agent operator. Retention. Demographic capture for ECOA. Replay queries. Minimum bars for examiner-credible evidence.
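The tamper-evidence requirement above can be met with a hash-chained decision log, in the spirit of the hash chain the glossary defines. A minimal sketch (entry field names are hypothetical, not from the whitepaper):

```python
import hashlib
import json

def record_decision(log, decision):
    """Append a decision to a hash-chained log. Each entry embeds the
    SHA-256 of the previous entry, so any later edit breaks the chain
    and is detectable on replay."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {
        "ts": decision["ts"],
        "agent_id": decision["agent_id"],
        "inputs": decision["inputs"],
        "verdict": decision["verdict"],
        "prev_hash": prev_hash,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log):
    """Re-derive every hash; returns False if any entry was altered."""
    prev = "0" * 64
    for e in log:
        body = {k: v for k, v in e.items() if k != "hash"}
        if body["prev_hash"] != prev:
            return False
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if digest != e["hash"]:
            return False
        prev = e["hash"]
    return True
```

Independence from the agent operator then reduces to where the chain head is anchored: if the latest hash is periodically escrowed outside the engineering team's control, engineers who can write to the log still cannot rewrite history undetected.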
VII.
Validation for non-deterministic systems
Behavioral envelope testing. Adversarial test suites. Disparate impact analysis on balanced corpora. Shift from accuracy-on-test-set to envelope stability.
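Behavioral envelope testing can be sketched as repeated sampling against a band recorded at validation time. A minimal illustration, assuming a hypothetical interface in which the agent is any callable returning a verdict string and the envelope maps each verdict to an acceptable rate range:

```python
import random
from collections import Counter

def envelope_test(agent, prompt, envelope, n_runs=100):
    """Sample a non-deterministic agent repeatedly on one input and
    check that the verdict distribution stays inside the validated
    envelope. Returns a list of (verdict, observed_rate) breaches;
    an empty list means the agent is inside its envelope."""
    counts = Counter(agent(prompt) for _ in range(n_runs))
    rates = {v: c / n_runs for v, c in counts.items()}
    breaches = []
    for verdict, (lo, hi) in envelope.items():
        r = rates.get(verdict, 0.0)
        if not (lo <= r <= hi):
            breaches.append((verdict, r))
    return breaches

# Toy stand-in for an agent that approves roughly 90% of the time.
def toy_agent(prompt):
    return "approve" if random.random() < 0.9 else "escalate"

breaches = envelope_test(
    toy_agent, "KYC case",
    envelope={"approve": (0.80, 0.98), "escalate": (0.02, 0.20)},
    n_runs=1000,
)
```

This is the shift the section describes: the pass/fail criterion is no longer a point accuracy on a test set but the stability of a distribution under repeated runs.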
VIII.
Seven mistakes banks are making in 2026
Treating AI as automation. Documenting the LLM as the model. Validating once, never again. No demographic capture. Capture in logs engineers can modify. Among others.
IX.
Implementation roadmap
Days 1-90 (inventory, classify, pattern build). Days 91-180 (capture layer rollout, monitoring). Days 181-365 (full coverage, re-validation cycle).
X.
Independence: the quiet advantage
Counterintuitive: AI agents make independence requirements easier to satisfy than traditional models. Validators no longer need scarce quant PhDs.
XI.
Provider risk: a specific concern
LLM providers ship updates that meaningfully change agent behavior. Contractual, operational, strategic and documentary controls for managing the dependency.
App.
Appendices
SR 11-7 to AI agent mapping table. Glossary covering agent, capture layer, decision drift, feature snapshot, hash chain, Merkle root, MRA, prompt drift and tamper evidence.