Architecture · Agents · GraphRAG · Compliance

Biotech Research Intelligence

Multi-agent GraphRAG with an LLM judge gating every answer

Six AI specialists research biotech questions together, and a referee AI checks every answer before you see it. Built for research teams that cannot afford confident nonsense.

6

Specialist agents

5

Judge evaluation axes

≥ 0.85

Groundedness gate

~$500K

Cost of one wrong research path

The problem

Biotech researchers spend weeks triaging PubMed, USPTO, and internal documents with no provenance chain, and redirecting a single wrong research path costs roughly half a million dollars. AI assistance is useless here unless every claim is cited, every citation is reachable, and every output is reproducible.

What we built

Designed a supervisor-specialist topology: a coordinating agent plans research tasks and routes them to Literature, Patent, Knowledge-Graph, Vision/Figure, Drafting, and Compliance specialists.
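The supervisor pattern can be sketched as a planner that emits routed tasks plus a dispatch loop. This is a minimal illustration, not the production orchestrator: the keyword-based `plan` function stands in for an LLM planner, and the registry keys (`literature`, `patent`, `knowledge_graph`, `vision`, `drafting`, `compliance`) and the `Task`/`supervise` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Each specialist takes a sub-question and returns its findings as text.
Specialist = Callable[[str], str]

@dataclass
class Task:
    specialist: str  # key into the (hypothetical) specialist registry
    query: str       # sub-question handed to that specialist

def plan(question: str) -> list[Task]:
    """Toy keyword planner; a real supervisor would ask an LLM to decompose the question."""
    q = question.lower()
    tasks = [Task("literature", question)]
    if "patent" in q or "prior art" in q:
        tasks.append(Task("patent", question))
    if any(term in q for term in ("gene", "protein", "target", "inhibit")):
        tasks.append(Task("knowledge_graph", question))
    if "figure" in q or "image" in q:
        tasks.append(Task("vision", question))
    # Drafting and compliance always run last, in that order.
    tasks.append(Task("drafting", question))
    tasks.append(Task("compliance", question))
    return tasks

def supervise(question: str, specialists: dict[str, Specialist]) -> str:
    """Dispatch each planned task, then synthesize findings (here: plain concatenation)."""
    findings = []
    for task in plan(question):
        answer = specialists[task.specialist](task.query)
        findings.append(f"[{task.specialist}] {answer}")
    return "\n".join(findings)
```

Keeping planning separate from dispatch means the routing decision can be logged and audited on its own, which matters once every output has to be reproducible.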

Combined three retrieval stages (BM25 keyword search and dense-vector search over 3072-dim embeddings, merged with Reciprocal Rank Fusion (RRF), then re-ordered by a semantic reranker) with GraphRAG traversal over a biomedical knowledge graph (papers, patents, genes, proteins, compounds, trials) to answer multi-hop questions.
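Reciprocal Rank Fusion merges the keyword and vector rankings without having to calibrate their raw scores against each other: each document earns 1 / (k + rank) from every list it appears in. The sketch below shows only the fusion step; the semantic reranker that re-orders the fused top results is out of scope here. The document IDs are illustrative, and k = 60 is the constant commonly used for RRF, assumed rather than taken from this project.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with RRF.

    A document's fused score is sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. Documents missing from a list simply get nothing from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id so results are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

bm25_hits = ["paper-a", "paper-b", "paper-c"]
vector_hits = ["paper-b", "paper-d", "paper-a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Here `paper-b` wins because it ranks well in both lists, even though BM25 alone put `paper-a` first.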

Put an independent LLM-as-judge between agents and users: every response scored on groundedness, citation correctness, faithfulness, completeness, and safety, with ACCEPT/REFINE/REJECT/ESCALATE gating and 2-of-3 majority voting for high-stakes outputs.
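A possible shape for the gating logic is shown below. Only the 0.85 groundedness gate and the 2-of-3 vote come from this case study; the other thresholds (`OTHER_AXIS_MIN`, `REJECT_FLOOR`, `SAFETY_MIN`), the order of the checks, and the rule that a split vote escalates are hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ACCEPT = "ACCEPT"
    REFINE = "REFINE"
    REJECT = "REJECT"
    ESCALATE = "ESCALATE"

@dataclass(frozen=True)
class JudgeScores:
    # All axes assumed to be scored in [0, 1] by the judge model.
    groundedness: float
    citation_correctness: float
    faithfulness: float
    completeness: float
    safety: float

GROUNDEDNESS_GATE = 0.85  # from the case study
OTHER_AXIS_MIN = 0.80     # hypothetical
REJECT_FLOOR = 0.50       # hypothetical: too weak on some axis to be worth refining
SAFETY_MIN = 0.90         # hypothetical: safety doubts go to a human

def gate(s: JudgeScores) -> Verdict:
    axes = [s.groundedness, s.citation_correctness, s.faithfulness, s.completeness, s.safety]
    if s.safety < SAFETY_MIN:
        return Verdict.ESCALATE
    if min(axes) < REJECT_FLOOR:
        return Verdict.REJECT
    if s.groundedness >= GROUNDEDNESS_GATE and min(axes) >= OTHER_AXIS_MIN:
        return Verdict.ACCEPT
    return Verdict.REFINE  # close enough to send back to the agents for another pass

def majority_vote(verdicts: list[Verdict]) -> Verdict:
    """2-of-3 voting for high-stakes outputs; with no majority, escalate to a human."""
    if len(verdicts) != 3:
        raise ValueError("expected exactly three judge verdicts")
    verdict, count = Counter(verdicts).most_common(1)[0]
    return verdict if count >= 2 else Verdict.ESCALATE
```

Checking safety first means a risky answer never gets "refined" into something that merely looks acceptable.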

Engineered for regulated reality: immutable audit chains, e-signatures on drafts, golden-set regression that blocks merges when groundedness drops more than 0.03, and a staged path to 21 CFR Part 11 validation.
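One common way to make an audit chain tamper-evident is to hash-chain its entries: each record stores the SHA-256 of the previous record's hash plus its own canonical JSON, so editing any past event breaks every later link. This is a sketch of that technique under assumptions; the `AuditChain` class and the event fields are hypothetical, and true immutability would still depend on write-once storage rather than on this code.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no whitespace) so the same event always hashes the same.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

@dataclass
class AuditChain:
    entries: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = _digest(prev_hash, event)
        self.entries.append({"event": event, "prev_hash": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every link; any edited, reordered, or removed entry fails."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True
```

An e-signature on a draft would then be recorded as one more event, for example `{"actor": "researcher-1", "action": "sign_draft", "draft_id": "draft-001"}` (illustrative values).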

Architecture

Orchestration: Supervisor agent + 6 specialists with planner and synthesis stages

Retrieval: Hybrid BM25 + vector + semantic reranker (RRF fusion), metadata filtering

Knowledge graph: Biomedical entities and relations (CITES, ENCODES, TARGETS, INHIBITS) for multi-hop GraphRAG

Evaluation: Independent LLM judge, 5-axis scoring, drift monitoring with SLO alerts

Compliance: Immutable audit logs, e-signatures, 7-year retention, Part 11 validation path
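Multi-hop GraphRAG amounts to walking typed relations outward from a seed entity and handing the resulting paths to the LLM as evidence. In production this would be a Gremlin query against Cosmos DB; the in-memory breadth-first search below only illustrates the idea. The entity IDs, the edge directions, and the `~` prefix for reversed edges are all assumptions made for this example.

```python
from collections import deque

Edge = tuple[str, str, str]  # (source, relation, target)

def build_adjacency(edges: list[Edge]) -> dict[str, list[tuple[str, str]]]:
    """Index edges in both directions; '~REL' marks an edge followed backwards."""
    adj: dict[str, list[tuple[str, str]]] = {}
    for src, rel, dst in edges:
        adj.setdefault(src, []).append((rel, dst))
        adj.setdefault(dst, []).append((f"~{rel}", src))
    return adj

def multi_hop_paths(edges: list[Edge], start: str, max_hops: int = 2) -> list[list[str]]:
    """Breadth-first search returning every path [node, rel, node, ...] up to max_hops."""
    adj = build_adjacency(edges)
    results: list[list[str]] = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if (len(path) - 1) // 2 == max_hops:
            continue  # this path already uses the full hop budget
        for rel, nxt in adj.get(path[-1], []):
            if nxt in path[::2]:
                continue  # skip nodes already on this path (no cycles)
            new_path = path + [rel, nxt]
            results.append(new_path)
            queue.append(new_path)
    return results
```

For a question like "which compounds act on the protein encoded by this gene?", the paths ending at `compound:` nodes within two hops are the candidate answers, and each path doubles as the provenance trail for the claim.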

Outcomes

▸Compresses weeks of prior-art triage into hours while strengthening — not weakening — the provenance chain
▸Every claim cited, every citation reachable, every output reproducible: trust enforced by architecture, not policy
▸Golden-set regression wired into CI: a change that drops groundedness by more than 0.03 cannot merge
▸Staged rollout design: 10 researchers on public sources first, validation before internal corpus access
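The merge gate can be as small as a script that compares judge scores on the golden set before and after a change and returns a non-zero exit code when groundedness falls too far. The 0.03 limit comes from this case study. Comparing mean scores, and the `regression_gate`/`ci_exit_code` names, are assumptions for illustration.

```python
import statistics

MAX_GROUNDEDNESS_DROP = 0.03  # from the case study

def regression_gate(
    baseline: list[float],
    candidate: list[float],
    max_drop: float = MAX_GROUNDEDNESS_DROP,
) -> tuple[bool, float]:
    """Compare mean groundedness on the same golden set; fail if it drops by more than max_drop."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("baseline and candidate must score the same, non-empty golden set")
    drop = statistics.fmean(baseline) - statistics.fmean(candidate)
    return drop <= max_drop, drop

def ci_exit_code(baseline: list[float], candidate: list[float]) -> int:
    """0 lets the CI job pass; 1 fails it, which blocks the merge."""
    passed, _ = regression_gate(baseline, candidate)
    return 0 if passed else 1
```

A CI step would call `sys.exit(ci_exit_code(baseline_scores, candidate_scores))` after the judge has scored both runs. Scoring the same golden set twice keeps the comparison paired, so a drop reflects the change itself rather than a different mix of questions.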

Stack

Azure AI Foundry · FastAPI · Azure AI Search · Cosmos DB (Gremlin) · GPT-4o · Claude (judge) · React · Azure Container Apps · Microsoft Purview
