
Case study · banking

Self-hosted commercial credit AI platform for a $1M-$100M property lender

How Thoughtwave built a 100% self-hosted commercial property loan platform with a 3-model AI ensemble, zero external API calls, and MBE/GSA-ready procurement.

  • -50% underwriting cycle time (indicative, v2.0 rollout)
  • 0 external API calls (ongoing)
  • 3 AI ensemble models running locally (Qwen 2.5, Gemma 27B, Llama 3.3 70B)
  • MBE/GSA procurement-ready from day one

Context

A commercial property lender operating in the $1M-$100M deal range needed to modernize an underwriting pipeline that depended on a mix of spreadsheets, email attachments, and manual stage handoffs. The firm's credit committee, underwriters, closers, and servicing team each worked in different tools, and the loan file accumulated inconsistencies stage-to-stage. The firm wanted AI across the pipeline — but not the vendor AI on the market. Two constraints drove the engagement scope:

  1. Zero external data leakage. The loan file contains borrower PII, property collateral detail, and proprietary risk scoring. None of it could be sent to an external LLM API.
  2. Procurement posture. The firm operates in jurisdictions where MBE supplier-diversity credit and GSA procurement paths materially improve deal economics. Vendor AI platforms that do not qualify are effectively off the table.

Challenge

Building an agentic AI platform that operates entirely inside the client's infrastructure is not a drop-in exercise. It requires a model layer (which local LLMs, at what scale), a platform layer (orchestration, memory, retrieval), a workflow layer (the 9 stages of a commercial credit lifecycle), a role and access layer (5 user roles with stage-gated permissions), and the operational layer (observability, audit, evaluation) — all running on GPUs the client owns.

The specific technical constraints:

  • No external model APIs. Every model call executes locally. No OpenAI, no Anthropic, no Google — only models running under the client's control.
  • Regulated deal scope. Loans in this range require audit trails that withstand examiner and auditor review. Every AI decision, input, retrieval, and output must be captured with immutable logging.
  • Credit committee workflow. A loan cannot move forward without explicit role sign-offs. The platform had to enforce that — not merely advise.
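The immutable-logging constraint above can be sketched as a hash-chained, append-only log: each record carries the hash of the previous one, so any retroactive edit breaks the chain. This is an illustrative Python sketch under our own assumptions; the class, field names, and event shapes are ours, not the platform's.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only log where each record embeds the previous record's
    hash, so any later tampering is detectable on replay."""

    GENESIS = "0" * 64

    def __init__(self):
        self._records = []
        self._last_hash = self.GENESIS

    def append(self, event: dict) -> dict:
        """Record one AI decision (prompt, model, output, etc.)."""
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self._last_hash,
            "event": event,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self._records.append(record)
        self._last_hash = record["hash"]
        return record

    def verify(self) -> bool:
        """Recompute every hash in order; False if anything was altered."""
        prev = self.GENESIS
        for rec in self._records:
            if rec["prev_hash"] != prev:
                return False
            body = {k: v for k, v in rec.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

In a production system the records would land in append-only object storage rather than memory, but the chaining idea is the same: the examiner replays the chain and any break localizes the tampering.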

Approach

Thoughtwave deployed TWSS Commercial Credit AI — our production commercial lending AI platform, running on a 14-service Docker Compose stack: Postgres, Redis, MinIO/S3, Ollama for local model serving, FastAPI services for orchestration, and a Next.js web UI. The AI layer runs three locally hosted models (Qwen 2.5, Gemma 27B, Llama 3.3 70B) as an ensemble, routed per underwriting sub-task.
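The per-sub-task routing described above can be sketched as a small routing table over local Ollama model tags. This is a minimal sketch, not the platform's actual code: the task names and the injected `call` client are our own assumptions, and in production `call` would POST to the local Ollama server's `/api/generate` endpoint.

```python
# Illustrative routing table: underwriting sub-task -> local Ollama model tag.
MODEL_ROUTES = {
    "document_extraction": "qwen2.5",    # structured field extraction
    "narrative_analysis": "gemma2:27b",  # borrower/property narrative work
    "risk_scoring": "llama3.3:70b",      # multi-step risk reasoning
}

FALLBACK_MODEL = "llama3.3:70b"


def route_model(sub_task: str) -> str:
    """Pick the local model for a sub-task; unknown tasks fall back
    to the largest reasoning model."""
    return MODEL_ROUTES.get(sub_task, FALLBACK_MODEL)


def run_sub_task(sub_task: str, prompt: str, call) -> str:
    """`call(model, prompt)` is the local model client (in production,
    an HTTP POST to Ollama at http://localhost:11434/api/generate).
    Injecting it keeps the routing logic testable offline."""
    return call(route_model(sub_task), prompt)
```

Keeping the route table as data (rather than branching logic) also makes the ensemble auditable: the audit log can record which model tag served which sub-task.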

The engagement followed a three-stage arc:

  • Discovery (4 weeks). Mapped the 9-stage workflow, the 5 user roles, and the decision points that required human sign-off versus AI automation. Documented the audit artifacts needed per stage.
  • Platform build (8 weeks). Stood up the Docker stack, the three-model ensemble via Ollama, the retrieval and memory layer, and the audit log pipeline. Built the first two workflow stages end-to-end as the platform proof.
  • Workflow rollout (ongoing). Shipped the remaining stages in sequence: intake, underwriting, risk scoring, committee review, close, funding, servicing. Each stage has its own evaluation suite, its own role gates, and its own audit artifacts.

What we built

The production system comprises six subsystems:

  1. Intake and document capture. Borrower documents arrive via email or secure upload; OCR and LLM extraction populate structured fields; exceptions route to a human.
  2. Underwriting AI ensemble. The three-model router calls Qwen 2.5 for structured extraction, Gemma 27B for narrative analysis, and Llama 3.3 70B for the more complex risk-scoring reasoning.
  3. Stage-gated workflow engine. The 9-stage pipeline enforces role permissions — an underwriter cannot close, a closer cannot approve, a servicer cannot re-underwrite.
  4. Immutable audit log. Every AI call (prompt, context, tool, response, confidence) is captured to append-only storage with retention matched to regulatory obligations.
  5. Credit committee UI. Reviewers see the AI's recommendation, the evidence behind it, and the confidence score — and approve, deny, or send back for rework with a documented reason.
  6. Evaluation and drift monitoring. Production traces feed a continuous evaluation pipeline; degradation on any sub-task triggers an alert and a human review.
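The stage-gating in subsystem 3 — enforce, not advise — reduces to a permission matrix checked before any transition. The sketch below is illustrative: the role and stage names are ours (the real platform defines 9 stages and 5 roles), and the real engine persists transitions rather than mutating a dict.

```python
# Illustrative stage-gate matrix: which role may sign off which stage.
STAGE_PERMISSIONS = {
    "underwriting": {"underwriter"},
    "committee_review": {"credit_committee"},
    "close": {"closer"},
    "servicing": {"servicer"},
}


def can_advance(role: str, stage: str) -> bool:
    """A stage advances only if the acting role holds that stage's gate."""
    return role in STAGE_PERMISSIONS.get(stage, set())


def advance(loan: dict, role: str) -> dict:
    """Enforce the gate: refuse the transition without the right sign-off."""
    stage = loan["stage"]
    if not can_advance(role, stage):
        raise PermissionError(f"{role} cannot sign off {stage}")
    order = list(STAGE_PERMISSIONS)
    idx = order.index(stage)
    loan["stage"] = order[idx + 1] if idx + 1 < len(order) else "complete"
    return loan
```

Because the check happens inside the transition, an underwriter attempting to close (or a servicer attempting to re-underwrite) gets a hard refusal rather than a warning.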

Outcomes

Indicative outcomes from comparable self-hosted lending deployments:

  • ~50% faster underwriting cycle time. Document extraction and narrative drafting that previously took hours per file now complete in minutes of AI work plus minutes of human review.
  • Zero external API calls. The entire AI stack runs on the client's GPU infrastructure. No borrower data leaves the environment.
  • MBE/GSA procurement vehicle. The platform's architecture and Thoughtwave's supplier-diversity credentials make it eligible for procurement paths for which many AI vendors cannot qualify.
  • 3-model ensemble running locally. The client owns the model weights, the orchestration code, and the audit log — no vendor lock-in.

What's next

The next phase extends the platform to adjacent lending products (construction, bridge, portfolio) using the same infrastructure. The client is also evaluating adding a borrower-portal copilot layered on the TWSS CS Agent pattern — using the same self-hosted model stack — so borrowers can check deal status and get compliance-aware answers without the firm exposing data to an external AI service.

For deeper context on how we approach self-hosted AI in regulated environments, see our AI & Generative AI service and our work with Banking & Finance clients.

Frequently asked questions

Why was self-hosting a requirement?
The client's loans range from $1M to $100M per deal and include borrower PII, financials, and property collateral documentation. Sending any of that to an external LLM API creates regulatory, contractual, and competitive exposure the client was not willing to accept. Self-hosting — running the full model stack on the client's infrastructure — was the precondition for the engagement.
What is the 3-model AI ensemble and why not a single model?
The platform runs Qwen 2.5, Gemma 27B, and Llama 3.3 70B as an ensemble. Different models perform better on different underwriting sub-tasks (document extraction, risk scoring, narrative generation). Routing each step to the best-fit model improves quality and caps the cost of any single model misfire. All three run on the client's own GPU infrastructure via Ollama.
How is this MBE/GSA-ready?
Thoughtwave is a certified Minority-Owned Business Enterprise and GSA Schedule holder. The platform's self-hosted architecture and supplier-diversity posture make it eligible for procurement vehicles for which many AI vendors cannot qualify — a material advantage for federal and supplier-diversity-driven deployments.
How long did the initial deployment take?
v1.0 shipped in approximately 16 weeks, covering discovery, platform build, model selection, and one full loan stage live. v2.0 added the full 9-stage workflow, 5 user roles with workflow gating, and the 14-service Docker Compose stack that operates the production system.

Ramesh Thumu

Founder & President, Thoughtwave Software

Reviewed by Thoughtwave Editorial

Last updated April 22, 2026