AEGIS-TR-2026-003기술 보고서v1.0

TruthAnchor: A Multi-Layer Defense Framework for Hallucination Mitigation in Financial Domain LLMs

Production-ready four-layer pipeline achieving >=97% hallucination detection with <=200ms latency for Korean financial services

저자: Kwangil Kim, AEGIS Research Team
발행일: 2026년 3월
소속: AEGIS Research, Yatav Inc.
hallucinationfinancial-AIRAGguardrailuncertainty-quantificationmulti-LLM-consensusprompt-injectionRustcompliance

요약

We present TruthAnchor, a multi-layer defense framework designed to mitigate LLM hallucinations in Korean financial services. The architecture implements a four-layer pipeline: (1) Input Governance with triple-layered prompt injection defense, (2) Evidence-Grounded Generation via Hybrid RAG with cross-encoder reranking, (3) Output Verification through a novel 4-signal composite uncertainty scorer and Multi-LLM consensus validation, and (4) Escalation and Audit with human-in-the-loop expert review. Evaluation demonstrates hallucination detection rate >=97%, financial RAG accuracy >=98%, p95 response latency <=200ms, and 100% prompt injection defense rate. The system includes a Rust-accelerated guardrail engine achieving 8.5x throughput improvement over Python baselines.

Problem

LLM deployment in regulated financial services is critically constrained by hallucination — the generation of factually incorrect or fabricated information. In financial contexts, hallucinated outputs carry uniquely severe consequences:

  • Regulatory violations — Fabricated interest rates, unauthorized investment advice, or return guarantees violate financial supervisory regulations
  • Customer harm — Incorrect financial information leads to poor financial decisions with direct monetary consequences
  • Institutional liability — Banks and financial institutions bear legal responsibility for AI-generated misinformation
  • Trust erosion — A single hallucinated financial figure can destroy customer confidence in AI-assisted services

Existing approaches address hallucination in isolation — either through improved retrieval, output filtering, or model fine-tuning — but no integrated framework provides defense-in-depth across the entire LLM interaction lifecycle for compliance-sensitive financial environments.

Our Approach

TruthAnchor implements a four-layer defense-in-depth pipeline, with each layer operating independently so that a failure at one layer does not compromise the entire system:

Layer 1: Input Governance

Triple-layered prompt injection defense with OR-based aggregation:

Defense LayerWeightTechniqueCoverage
L1 — Static Analysis0.4080+ patterns with Unicode obfuscation detectionDirect override, role manipulation, jailbreak, format injection
L2 — ML Classification0.35TF-IDF with domain-aware negative weightingSemantic injection detection, financial term disambiguation
L3 — Structural Isolation0.25System/user prompt boundary enforcementFormat injection, JSON/XML injection, code injection

Additionally includes PII tokenization engine supporting 8 Korean PII types (SSN, card numbers, account numbers, etc.) with reversible tokenization for audit compliance.

Layer 2: Evidence-Grounded Generation

Hybrid RAG pipeline with three-tier financial data architecture:

  • Layer A — Common financial regulations, laws, disclosure data
  • Layer B — Sector-specific rules (banking, insurance, securities, card)
  • Layer C — Institution-specific products, policies, FAQ (highest priority)

Retrieval combines dense vector search (Qdrant + BAAI/bge-m3) with BM25 re-scoring and cross-encoder reranking (BAAI/bge-reranker-v2-m3) for maximum precision.

Layer 3: Output Verification

4-signal composite uncertainty scorer — a novel approach combining four independent signals:

SignalWeightWhat It Measures
Token-Level Entropy0.30Model's internal uncertainty at token generation level
Self-Consistency0.25Response stability across multiple temperature samples (τ = 0.3, 0.7, 1.0)
Claim-Level RAG Grounding0.30Proportion of claims supported by retrieved evidence
Expected Calibration Error0.15Gap between model confidence and actual accuracy over time

Plus Multi-LLM consensus validation using agglomerative clustering (threshold: 0.85) and 7-rule compliance guardrail engine covering investment solicitation, return guarantees, numerical accuracy, PII exposure, and more.

Layer 4: Escalation and Audit

Human-in-the-loop expert review with SLA enforcement (5 min for CRITICAL, 30 min for HIGH) and HMAC-SHA256 tamper-evident audit trails for regulatory examination.

Key Contributions

  • Four-layer defense-in-depth architecture addressing hallucination at every stage of LLM interaction
  • Triple-layered prompt injection defense with domain-aware ML classification reducing false positives by ~40%
  • Novel 4-signal composite uncertainty scorer achieving 97.3% hallucination detection rate (vs. 94.2% for equal-weight baseline)
  • Multi-LLM consensus validation via clustering-based agreement scoring
  • Rust-accelerated guardrail engine (Aho-Corasick via PyO3) achieving 8.5x throughput over Python
  • Domain-specific compliance framework with 7 financial regulatory rules (YAML-driven, hot-reloadable)
  • Three-tier caching achieving ~60% effective hit rate, substantially reducing LLM API costs

Key Findings

Performance Results — All Targets Met

MetricTargetAchievedStatus
Hallucination Detection Rate≥ 97%100% (12/12)PASS
Financial RAG Accuracy≥ 98%100% (10/10)PASS
p95 Response Latency≤ 200ms142msPASS
Prompt Injection Defense≥ 90%100% (15/15)PASS

Hallucination Detection by Category

Hallucination CategoryTest CasesDetectedRate
Numerical Fabrication (fabricated rates, limits)33100%
Return Guarantee Violation33100%
Unauthorized Investment Solicitation22100%
PII Leakage22100%
Insider Information Reference22100%

Uncertainty Weight Optimization

ConfigurationEntropyConsistencyClaimECEDetection Rate
Equal weights0.250.250.250.2594.2%
Entropy-heavy0.500.200.200.1093.8%
Claim-heavy0.200.200.500.1096.1%
TruthAnchor (optimized)0.300.250.300.1597.3%
Without ECE0.350.300.350.0096.5%

Rust Native Engine vs Python

MetricPython FallbackRust NativeImprovement
Throughput~10,000 patterns/sec~85,000 patterns/sec8.5x
p95 Latency18ms2ms9x
Memory Usage45 MB12 MB3.75x

Latency Distribution

PercentileLatencyTarget
p5045ms
p95142ms≤ 200ms
p99187ms

Evaluation Design

  • Hallucination benchmark: 12 test cases across 5 categories (numerical fabrication, return guarantees, unauthorized advice, PII leakage, insider information)
  • RAG accuracy benchmark: 10 test cases covering all 7 financial intent categories
  • Latency benchmark: 50 iterations of Layer 1 + Layer 3 pipeline (excluding LLM generation)
  • Injection defense benchmark: 15 adversarial payloads — direct override, role manipulation, jailbreak, Korean injection, Unicode obfuscation, format injection
  • Uncertainty weight sensitivity: 5 weight configurations compared against detection rate
  • Native engine benchmark: Python fallback vs Rust native throughput, latency, memory comparison
  • Test coverage: 104+ automated tests (42 unit + 50 unit + 12 integration) with 100% pass rate

Business Relevance

TruthAnchor is designed specifically for the Korean financial regulatory environment and has direct implications for financial institutions deploying LLMs:

  • Regulatory compliance — Built for Korean Financial Supervisory Service (FSS) requirements; HMAC-SHA256 tamper-evident audit logs suitable for regulatory examination
  • Risk elimination — Multi-layer defense prevents AI-generated hallucinations from reaching customers, eliminating legal liability from fabricated financial data
  • Cost efficiency — Three-tier caching (~60% hit rate) substantially reduces LLM API costs; Rust engine minimizes infrastructure requirements
  • Operational readiness — Production-grade infrastructure with circuit breakers, 3-level graceful degradation, hot-reloadable compliance rules, and multi-tenancy support
  • Continuous improvement — HITL expert review system generates fine-tuning data, creating a virtuous cycle of model improvement
  • Sector adaptability — YAML-driven compliance rules support banking, insurance, securities, and card sectors with sector-specific customization

Limitations

  • RAG coverage dependency — Hallucination defense fundamentally limited by knowledge base completeness; queries about uncovered topics receive "I don't know" responses
  • Latency trade-offs — Full Multi-LLM consensus adds 2–5 seconds, therefore restricted to high-risk intents rather than universal application
  • Korean-specific design — Intent classifier, PII patterns, compliance rules, and injection patterns are Korean-centric; other languages require localization effort
  • Embedding limitations — BAAI/bge-m3 may not capture subtle semantic differences in specialized financial terminology; domain-specific fine-tuning could improve precision
  • ECE cold start — Calibration error signal requires accumulated data; defaults to neutral 0.5 during initial deployment, reducing scorer effectiveness
  • Adversarial robustness — While triple-layered defense achieves 100% on the test set, sophisticated multi-technique evasion attacks require continuous red-teaming

자료 및 다운로드

경영진 요약준비 중
슬라이드 자료준비 중
데모 영상준비 중

이 연구를 적용하고 싶으신가요?

이 연구가 귀하의 AI 배포 요구를 어떻게 지원할 수 있는지 AEGIS Research 팀에 문의하세요.