Context
A commercial property lender operating in the $1M-$100M deal range needed to modernize an underwriting pipeline that depended on a mix of spreadsheets, email attachments, and manual stage handoffs. The firm's credit committee, underwriters, closers, and servicing team each worked in different tools, and the loan file accumulated inconsistencies at every stage handoff. The firm wanted AI across the pipeline, but no vendor AI platform on the market fit its constraints. Two constraints drove the engagement scope:
- Zero external data leakage. The loan file contains borrower PII, property collateral detail, and proprietary risk scoring. None of it could be sent to an external LLM API.
- Procurement posture. The firm operates in jurisdictions where MBE supplier-diversity credit and GSA procurement paths materially improve deal economics. Vendor AI platforms that do not qualify are effectively off the table.
Challenge
Building an agentic AI platform that operates entirely inside the client's infrastructure is not a drop-in exercise. It requires a model layer (which local LLMs, at what scale), a platform layer (orchestration, memory, retrieval), a workflow layer (the 9 stages of a commercial credit lifecycle), a role and access layer (5 user roles with stage-gated permissions), and an operational layer (observability, audit, evaluation) — all running on GPUs the client owns.
The specific technical constraints:
- No external model APIs. Every model call executes locally. No OpenAI, no Anthropic, no Google — only models running under the client's control.
- Regulated deal scope. Loans in this range require audit trails that withstand examiner and auditor review. Every AI decision, input, retrieval, and output must be captured with immutable logging.
- Credit committee workflow. A loan cannot move forward without explicit role sign-offs. The platform had to enforce that — not merely advise.
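The sign-off requirement above has to be enforced in code rather than policy. A minimal sketch of such a gate, with role and stage names assumed for illustration (not the client's actual taxonomy):

```python
from dataclasses import dataclass, field

# Hypothetical sign-off gate: a loan cannot advance past a stage until
# every required role has explicitly approved. Role and stage names
# below are illustrative.
REQUIRED_SIGNOFFS = {
    "committee_review": {"underwriter", "credit_committee"},
    "close": {"closer"},
}

@dataclass
class LoanFile:
    stage: str
    signoffs: set = field(default_factory=set)

def record_signoff(loan: LoanFile, role: str) -> None:
    loan.signoffs.add(role)

def can_advance(loan: LoanFile) -> bool:
    """True only when all roles required at the current stage have signed off."""
    return REQUIRED_SIGNOFFS.get(loan.stage, set()) <= loan.signoffs
```

The key property is that `can_advance` is a hard gate in the workflow engine, not advisory output: the AI can recommend, but the state machine refuses to transition without the sign-off set.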
Approach
Thoughtwave deployed TWSS Commercial Credit AI — our production commercial lending AI platform, running on a 14-service Docker Compose stack: Postgres, Redis, MinIO/S3, Ollama for local model serving, FastAPI services for orchestration, and a Next.js web UI. The AI layer runs three locally hosted models (Qwen 2.5, Gemma 27B, Llama 3.3 70B) as an ensemble, routed per underwriting sub-task.
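The per-sub-task routing can be sketched as a dispatch table sitting in front of the local model server. The sub-task names and Ollama model tags below are assumptions for illustration, not the engagement's actual taxonomy:

```python
# Hypothetical routing table for the three-model ensemble. The router
# picks a local model tag per underwriting sub-task; the chosen tag is
# then passed to the local Ollama endpoint for inference.
ROUTES = {
    "structured_extraction": "qwen2.5",      # forms, rent rolls, schedules
    "narrative_analysis":    "gemma:27b",    # memo drafting, summaries
    "risk_scoring":          "llama3.3:70b", # multi-step risk reasoning
}
DEFAULT_MODEL = "llama3.3:70b"  # fall back to the largest model

def pick_model(sub_task: str) -> str:
    """Map an underwriting sub-task to the local model that handles it."""
    return ROUTES.get(sub_task, DEFAULT_MODEL)
```

Keeping the routing as plain data makes it auditable: the audit log can record which model handled each call by name, and the table can change without touching orchestration code.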
The engagement followed a three-stage arc:
- Discovery (4 weeks). Mapped the 9-stage workflow, the 5 user roles, and the decision points that required human sign-off versus AI automation. Documented the audit artifacts needed per stage.
- Platform build (8 weeks). Stood up the Docker stack, the three-model ensemble via Ollama, the retrieval and memory layer, and the audit log pipeline. Built the first two workflow stages end-to-end as the platform proof.
- Workflow rollout (ongoing). Shipped the remaining stages in sequence: intake, underwriting, risk scoring, committee review, close, funding, and servicing. Each stage has its own evaluation suite, its own role gates, and its own audit artifacts.
What we built
The production system comprises six subsystems:
- Intake and document capture. Borrower documents arrive via email or secure upload; OCR and LLM extraction populate structured fields; exceptions route to a human.
- Underwriting AI ensemble. The three-model router calls Qwen 2.5 for structured extraction, Gemma 27B for narrative analysis, and Llama 3.3 70B for the more complex risk-scoring reasoning.
- Stage-gated workflow engine. The 9-stage pipeline enforces role permissions — an underwriter cannot close, a closer cannot approve, a servicer cannot re-underwrite.
- Immutable audit log. Every AI call (prompt, context, tool, response, confidence) is captured to append-only storage with retention matched to regulatory obligations.
- Credit committee UI. Reviewers see the AI's recommendation, the evidence behind it, and the confidence score — and approve, deny, or send back for rework with a documented reason.
- Evaluation and drift monitoring. Production traces feed a continuous evaluation pipeline; degradation on any sub-task triggers an alert and a human review.
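The append-only audit property described above is commonly implemented as a hash chain, where each record commits to the hash of the previous one. A sketch under that assumption — not the client's actual storage format:

```python
import hashlib
import json
import time

# Hypothetical hash-chained audit log: every entry embeds the previous
# entry's hash, so any retroactive edit breaks verification downstream.
def append_entry(log: list, event: dict) -> dict:
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"ts": time.time(), "prev": prev_hash, "event": event}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    entry = {**body, "hash": digest}
    log.append(entry)
    return entry

def verify_chain(log: list) -> bool:
    """Recompute every hash; a single tampered entry invalidates the chain."""
    prev = "0" * 64
    for e in log:
        body = {"ts": e["ts"], "prev": e["prev"], "event": e["event"]}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if e["prev"] != prev or recomputed != e["hash"]:
            return False
        prev = e["hash"]
    return True
```

In production the entries would carry the full prompt, retrieved context, tool calls, response, and confidence score per the subsystem description above, and land in append-only object storage with regulatory retention.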
Outcomes
Indicative outcomes from comparable self-hosted lending deployments:
- ~50% faster underwriting cycle time. Document extraction and narrative drafting that previously took hours per file now complete in minutes of AI work plus minutes of human review.
- Zero external API calls. The entire AI stack runs on the client's GPU infrastructure. No borrower data leaves the environment.
- MBE/GSA procurement vehicle. The platform's architecture and Thoughtwave's supplier-diversity credentials make it eligible for procurement paths that many AI vendors cannot access.
- 3-model ensemble running locally. The client owns the model weights, the orchestration code, and the audit log — no vendor lock-in.
What's next
The next phase extends the platform to adjacent lending products (construction, bridge, portfolio) using the same infrastructure. The client is also evaluating adding a borrower-portal copilot layered on the TWSS CS Agent pattern — using the same self-hosted model stack — so borrowers can check deal status and get compliance-aware answers without the firm exposing data to an external AI service.
For deeper context on how we approach self-hosted AI in regulated environments, see our AI & Generative AI service and our work with Banking & Finance clients.