Technical Architecture

Designing a Compliant LLM Stack for Regulated Environments (Banking & Insurance)

A practical reference architecture for retrieval, redaction, oversight and audit logging—mapped to recognised frameworks so you can ship useful features without creating compliance drag.

By Lewis Cross·
Designing a Compliant LLM Stack for Regulated Environments (Banking & Insurance)

What "compliant" really means in practice

In regulated firms, "compliant" is not a buzzword—it is being able to show a supervisor how an answer was produced, by whom, with what data, and what controls would stop it going wrong. That has two parts: a sensible architecture (ingest → index → retrieve → generate → evaluate) and an operating model that turns evidence into habit (ownership, logging, incident handling, change control). The language may vary across frameworks, but the through-line is consistent: govern the system across its lifecycle, measure risks, and manage them with documented controls. The NIST AI Risk Management Framework remains the most widely referenced way to structure this (govern → map → measure → manage), and its generative-AI companion profile adds concrete eval ideas for modern LLMs. (NIST)

The shape of a good LLM stack (without the hype)

A workable stack tends to look like this:

  • You ingest content through a secure pipeline, applying data minimisation at the door—mask or tokenise sensitive fields as early as you can.
  • You index content with access controls baked in (row/attribute filters, not just "private" vs "public"), and you keep lineage from source document to chunk so you can trace citations later.
  • You retrieve with policy-aware filters; hybrid search (text + vectors) usually wins over vectors alone for regulated text.
  • You generate with stable prompt templates and, where possible, tool use for structured tasks (e.g., lookups), so answers are less free-form and easier to justify.
  • You evaluate continuously: grounding, refusal on off-limits requests, privacy "no-leak" tests, and basic red-teaming.

None of this is exotic—Microsoft's public RAG guidance, for example, spells out similar preparation, chunking, retrieval and evaluation choices; the regulated twist is that each choice must leave an audit trail. (Microsoft Learn)

Privacy and governance are design inputs, not afterthoughts

If your use case touches personal data, the UK ICO's long-standing line still applies: you are accountable for both compliance and being able to demonstrate it. That pushes you towards a few sensible defaults: document "who is controller/processor" for each component; run a DPIA before you ship; and retain just enough log data to reconstruct a decision without hoarding unnecessary personal data. If you align this to an AI management system such as ISO/IEC 42001, you get an organisational backbone for policies, roles, training and continual improvement that sits above any one product team. (Information Commissioner's Office)

Security: the unglamorous work that saves you later

Threats around LLMs are new in flavour but not in principle: prompt injection, retrieval poisoning, data exfiltration through chat, compromised plugins or tools. ENISA and ANSSI both caution that you should treat LLM components like any Internet-facing service: constrain inputs, sanitise what reaches the model, limit outbound calls, and log guardrail hits. Add adversarial prompts to your test suite and rehearse what you will switch off first during an incident (for example, disable free-text tool use; fall back to read-only retrieval). (ENISA)

A bank example: collections support that stands up to audit

Picture a retail bank assistant that drafts hardship responses. The compliant design is boring by intention:

  • The assistant can only read from a curated retrieval corpus of policy manuals, fee tables and template letters.
  • Every sentence in a draft is backed by a citation to that corpus; agents see and can click the sources.
  • Prompts, model version, and the retrieval corpus ID are logged with the draft, alongside the agent's approval or edits.
  • Privacy controls sit at ingest (masking names, IBANs) and at log time (hash or drop obvious identifiers).

Under the hood, this maps cleanly to NIST AI RMF's lifecycle—governance and mapping are your documented scope and data notes; measurement is your evaluation harness; manage is your human-in-the-loop, incident playbook and change control. When audit asks "why did it say X?", you can replay the prompt, show the citations, and show the human approval. (NIST)

An insurer example: underwriting summaries you can defend

For commercial lines, underwriters want a clean summary of a broker submission plus relevant exclusions from policy wordings. The compliant path is to avoid "open-ended chat": use a task prompt that extracts fields (industry, turnover, locations) and pulls clauses from approved wordings, then renders a structured summary. Because every clause comes from an internal library, your retrieval layer can record the exact paragraph IDs used. If quality dips, you can roll back to a previous prompt template because prompts are versioned like code. That single habit—version prompts and retrieval filters—solves many change-management headaches when committees get involved. (Microsoft Learn)

Logging that helps compliance without over-collecting

Good logs answer three questions: what input the model saw, which context was retrieved, and what controls fired. You do not need to store raw chat for ever; store a minimal reconstruction packet instead: prompt template version, anonymised input hash, retrieval corpus and chunk IDs, model and guardrail versions, citations returned, and the final answer. That is usually enough for internal audit and supervisors, and it keeps you on the right side of data minimisation principles. The ICO's accountability guidance explicitly favours this "show your working" approach. (Information Commissioner's Office)

How to ship this without freezing delivery

A lot of teams get moving with a two-track plan. Track A hardens the platform: centralise redaction, retrieval, evaluation and logging so product teams do not reinvent controls. Track B focuses on one pilot that matters to the business and makes the end-to-end story legible (system card, data notes, evaluation results, oversight roles, incident playbook). You can layer organisational scaffolding—policy, roles, training—using ISO/IEC 42001 or an equivalent AIMS, but the credibility still comes from a real system in production with evidence you can point to. (ISO)

Need a compliant LLM blueprint?

We run a one-day architecture workshop to leave you with: a documented reference design for your stack, a thin evaluation harness you can plug into CI, a privacy-aware logging pattern, and a change-control flow for prompts and retrieval. From there, we can help your team ship the first production use case with the evidence risk and audit expect to see.

Book a free consultation


Sources

  • NIST AI Risk Management Framework (AI RMF 1.0) and Generative-AI profile: lifecycle functions and evaluation considerations for modern LLM systems. (NIST)
  • Microsoft Azure Architecture Center (RAG): practical guidance on preparation, chunking, retrieval and evaluation choices. (Microsoft Learn)
  • ISO/IEC 42001: requirements for an AI management system to anchor policies, roles and continual improvement. (ISO)
  • UK ICO – AI and Data Protection (Accountability & Governance): controller/processor clarity, DPIAs, demonstrating compliance. (Information Commissioner's Office)
  • ENISA & ANSSI guidance: security threats and hardening recommendations for AI/LLM systems. (ENISA)

Ready to implement this in your organization?

We help financial services companies build compliant AI systems with governance built in.