Publications

Research papers, technical reports, benchmark studies, whitepapers, and case studies on safe, secure, and trustworthy AI deployment.

6 publications

AEGIS-BR-2026-001 · Benchmark Report

Benchmarking Guardrail Effectiveness in High-Risk LLM Use Cases

Comparative Evaluation of Layered Guardrail Strategies across Policy-Sensitive Enterprise Scenarios

AEGIS Research Team, Yatav Inc.

Enterprise deployment of Large Language Models in policy-sensitive domains — healthcare, finance, legal, telecommunications, and defense — introduces risks that extend far beyond conventional content safety. Regulatory mandates (EU AI Act, K-AI Act, NIST AI RMF), sector-specific compliance requirements, and the potential for catastrophic harm in high-stakes decision contexts demand guardrail systems whose effectiveness is empirically validated rather than assumed. We present a comprehensive benchmarking study of layered guardrail strategies using the AEGIS (AI Engine for Guardrail & Inspection System) framework, evaluating 8 commercial LLM models across 7 adversarial attack algorithms (112 total evaluations) in two independent sessions. Our results reveal three critical findings: (1) all tested models — including GPT-5, Claude Opus 4.6, and Gemini 3.1 Pro — are classified as VULNERABLE without external guardrails, with a mean baseline defense rate of only 38.1%; (2) a 3-Tier layered guardrail architecture (rule-based filters at <0.5ms, ML classifiers at <5ms, LLM judges at <200ms) improves defense rates from 38.1% to an estimated 75–85% while maintaining 50,000+ RPS throughput; and (3) domain-specific policy enforcement — including telecom-specific threat taxonomies, military ROE compliance, and financial boundary detection — is essential for high-risk use cases where generic safety mechanisms are insufficient. We propose the Enterprise Guardrail Effectiveness Index (EGEI), a composite metric incorporating attack resilience, regulatory compliance coverage, latency overhead, and domain-specific policy adherence, and use it to evaluate guardrail configurations across 5 enterprise deployment scenarios.
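The escalation logic of the 3-Tier architecture described above can be sketched as follows. This is a minimal illustration, not AEGIS code: the tier names and latency classes come from the abstract, but the rules, the toy ML score, and all thresholds are invented for the example.

```python
import re

BLOCK, ALLOW, ESCALATE = "block", "allow", "escalate"

def tier1_rules(prompt: str) -> str:
    """Rule-based filter (<0.5 ms class): hard regex blocklist."""
    if re.search(r"ignore (all )?previous instructions", prompt, re.I):
        return BLOCK
    return ESCALATE  # inconclusive: hand off to the ML tier

def tier2_ml(prompt: str) -> str:
    """ML classifier (<5 ms class): a toy keyword score stands in for the model."""
    score = min(1.0, prompt.lower().count("bypass") * 0.4)
    if score >= 0.8:
        return BLOCK
    if score <= 0.2:
        return ALLOW
    return ESCALATE  # the uncertain band goes to the LLM judge

def tier3_llm_judge(prompt: str) -> str:
    """LLM judge (<200 ms class): placeholder for a model call."""
    return ALLOW

def guard(prompt: str) -> str:
    # Each tier is an order of magnitude cheaper than the next, so most
    # traffic never reaches the expensive judge: this is what preserves
    # high throughput while the judge handles only the ambiguous tail.
    for tier in (tier1_rules, tier2_ml, tier3_llm_judge):
        verdict = tier(prompt)
        if verdict != ESCALATE:
            return verdict
    return BLOCK  # fail closed if every tier is inconclusive
```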

guardrail · benchmark · red teaming · enterprise AI
AEGIS-RP-2026-001 · Research Paper

AEGIS: A Multi-Layered Framework for Automated LLM Safety Diagnosis through Adversarial Red-Teaming and Statistical Risk Analysis

Integrated Offensive Red-Teaming and Defensive Guardrail System with SABER Statistical Risk Prediction

AEGIS Research Team, Yatav Inc.

Existing approaches to LLM safety evaluation treat offensive testing and defensive deployment as independent concerns: red-team researchers measure attack success rates while defense engineers deploy guardrails, with no formal framework connecting attacker effort to defender resilience. We present AEGIS, an integrated system that closes this loop through three tightly coupled subsystems: (1) an offensive red-team engine comprising 8 attack algorithms, a Meta-Attack genetic recombinator over 30 atomic primitives, and a reinforcement learning attack agent with PPO-trained policy networks; (2) a defensive guardrail pipeline combining the PALADIN 6-layer deep inspection network with a 3-Tier hierarchical defense (rule-based at <0.5ms, ML classifier at <5ms, LLM Judge at <200ms) and 4 specialized detection algorithms (GuardNet, JBShield, CCFC, MULI); and (3) SABER (Statistical Adversarial risk with Beta Extrapolation and Regression), a statistical risk framework that models per-query vulnerability as θ ~ Beta(α, β), derives the ASR@N scaling law to predict attack success rates under Best-of-N scenarios, and introduces the Budget@τ metric to quantify defender resilience as the minimum attack budget required to achieve success probability τ. SABER further implements a closed-loop deterministic defense promotion system that automatically converts high-risk queries (ASR@1000 ≥ 0.8) into θ=0 deterministic blocks. Empirical evaluation across 8 LLM models (GPT-5, Claude Opus 4.6, Gemini 3.1 Pro, Grok 4.1/3, DeepSeek) in 112 evaluations reveals a baseline defense rate of only 38.1%, with PAIR and Crescendo achieving 100% ASR universally. SABER analysis classifies 6 of 8 models at Critical risk (ASR@1000 ≥ 0.8), while the integrated AEGIS defense improves effective defense rates to 75–90% with the deterministic promotion mechanism providing an additional 5–12% improvement on recurring attack patterns.
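The SABER quantities named in the abstract admit a closed form under the stated Beta model: if per-query success probability is θ ~ Beta(α, β), then ASR@N = 1 − E[(1−θ)^N] = 1 − B(α, β+N)/B(α, β), and Budget@τ is the smallest N with ASR@N ≥ τ. The sketch below is our reconstruction from those definitions, not code from the paper; the parameter values are illustrative.

```python
from math import lgamma, exp

def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, to stay numerically stable for large N."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def asr_at_n(n: int, alpha: float, beta: float) -> float:
    """ASR@N: probability at least one of N i.i.d. attempts succeeds."""
    return 1.0 - exp(log_beta(alpha, beta + n) - log_beta(alpha, beta))

def budget_at_tau(tau: float, alpha: float, beta: float,
                  n_max: int = 10**6) -> int:
    """Budget@τ: minimum attack budget N reaching success probability τ."""
    for n in range(1, n_max + 1):
        if asr_at_n(n, alpha, beta) >= tau:
            return n
    return n_max  # τ not reachable within the search budget

# Under a uniform prior θ ~ Beta(1, 1), ASR@N = N / (N + 1), so reaching
# τ = 0.8 takes a budget of N = 4 attempts.
```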

red teaming · LLM safety · SABER · PALADIN
AEGIS-WP-2026-001 · Whitepaper

Building Safe and Compliant Enterprise LLM Deployments

A Whitepaper on Governance, Verification, Security, and Operational Readiness for Applied AI

AEGIS Research Team, Yatav Inc.

Deploying large language models (LLMs) in enterprise environments demands more than model accuracy — it requires a comprehensive framework addressing governance, security, regulatory compliance, and operational resilience. This whitepaper presents the AEGIS (AI Engine for Guardrail & Inspection System) platform's production-grade infrastructure for safe enterprise LLM deployments. Drawing from an implemented codebase comprising 1,591+ passing tests, 32 database migration schemas, 14 agent security modules, and compliance mappings across three regulatory frameworks (EU AI Act, K-AI Act, NIST AI RMF), we detail the architectural patterns, middleware stacks, and operational practices required for responsible AI deployment. The platform supports multi-tenant SaaS with four subscription tiers, Redis-backed rate limiting, comprehensive audit logging with 23 event types, and GS quality certification (ISO/IEC 25051). We demonstrate how defense-in-depth security — spanning JWT/API-key authentication, RBAC+ABAC authorization, PII pseudonymization, and federated learning — can be integrated into a cohesive deployment architecture that satisfies both technical and regulatory requirements.

governance · compliance · enterprise AI · security hardening
AEGIS-TR-2026-002 · Technical Report

AEGINEL Guard: Multilingual AI Prompt Security Classifier for Browser Extensions

Development of a lightweight on-device multi-label threat classifier for real-time prompt safety

Kwangil Kim, AEGIS Research Team

This research presents the full development pipeline of AEGINEL Guard, a multilingual AI prompt security classifier designed to run entirely on-device within Chrome browser extensions. The classifier detects six threat categories — Jailbreak, Prompt Injection, Harmful Content, Script Evasion, Social Engineering, and Encoding Bypass — using a multi-label classification approach. Trained on 188,109 samples across 8 languages, the final DistilBERT-based model achieves 100% binary detection accuracy at 7.6 ms/sample inference speed in a 129.5 MB INT8 ONNX package.
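The multi-label decision stage described above can be illustrated as follows: one logit per threat category, each passed through an independent sigmoid and thresholded, so a single prompt may carry several labels at once. The category order matches the abstract; the 0.5 threshold and the example logits are assumptions, not AEGINEL Guard's actual configuration.

```python
from math import exp

CATEGORIES = ["Jailbreak", "Prompt Injection", "Harmful Content",
              "Script Evasion", "Social Engineering", "Encoding Bypass"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))

def predict_labels(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Return every category whose sigmoid score clears the threshold.

    Unlike softmax classification, the six scores are independent, which
    is what lets one prompt be both a jailbreak and social engineering.
    """
    return [cat for cat, logit in zip(CATEGORIES, logits)
            if sigmoid(logit) >= threshold]
```

For example, logits of `[2.1, 0.3, -3.0, -1.2, 1.8, -0.5]` would yield the three labels Jailbreak, Prompt Injection, and Social Engineering.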

prompt-security · guardrail · multilingual · on-device
AEGIS-TR-2026-003 · Technical Report

TruthAnchor: A Multi-Layer Defense Framework for Hallucination Mitigation in Financial Domain LLMs

Production-ready four-layer pipeline achieving ≥97% hallucination detection with ≤200ms latency for Korean financial services

Kwangil Kim, AEGIS Research Team

We present TruthAnchor, a multi-layer defense framework designed to mitigate LLM hallucinations in Korean financial services. The architecture implements a four-layer pipeline: (1) Input Governance with triple-layered prompt injection defense, (2) Evidence-Grounded Generation via Hybrid RAG with cross-encoder reranking, (3) Output Verification through a novel 4-signal composite uncertainty scorer and Multi-LLM consensus validation, and (4) Escalation and Audit with human-in-the-loop expert review. Evaluation demonstrates hallucination detection rate ≥97%, financial RAG accuracy ≥98%, p95 response latency ≤200ms, and 100% prompt injection defense rate. The system includes a Rust-accelerated guardrail engine achieving 8.5x throughput improvement over Python baselines.
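A composite uncertainty scorer of the kind Layer 3 describes can be sketched as a weighted sum of four normalized signals, with answers above a threshold escalated to Layer 4's human review. The signal names, weights, and threshold below are illustrative assumptions, not TruthAnchor's actual configuration.

```python
# Hypothetical signal weights; each signal is assumed normalized to [0, 1].
WEIGHTS = {
    "token_entropy": 0.3,      # average per-token predictive entropy
    "self_consistency": 0.3,   # disagreement across resampled answers
    "evidence_overlap": 0.25,  # 1 - overlap with retrieved RAG evidence
    "judge_doubt": 0.15,       # doubt score from a verifier model
}

def composite_uncertainty(signals: dict[str, float]) -> float:
    """Weighted sum of the four signals; higher means less trustworthy."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

def should_escalate(signals: dict[str, float], threshold: float = 0.6) -> bool:
    """Route the answer to human-in-the-loop review above the threshold."""
    return composite_uncertainty(signals) >= threshold
```

Combining heterogeneous signals this way lets any single strong indicator (e.g. low evidence overlap) push an answer toward review even when the others look benign.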

hallucination · financial-AI · RAG · guardrail
AEGIS-TR-2026-001 · Technical Report

Reducing Hallucinations in Enterprise AI Systems

A Multi-Layered Defense Architecture for Mission-Critical Domains

TruthAnchor Research Group, AEGIS Research Team

This paper presents TruthAnchor, a comprehensive multi-layered defense architecture designed to reduce hallucination rates in enterprise AI deployments to below 3%. The system implements a four-layer pipeline comprising input governance, evidence-grounded generation, output verification, and human-in-the-loop escalation. Key innovations include a triple-layered prompt injection defense achieving 100% detection, a four-signal composite uncertainty scorer, and a multi-LLM consensus validation mechanism. Evaluated in a Korean financial services context with 104 test cases, the system achieves 100% hallucination detection rate, 100% RAG accuracy, and p95 latency of 142ms.

hallucination · TruthAnchor · RAG · uncertainty quantification