DEPLOYED · MCP · Agents · Evals

ResearchPilot

A scholarly research copilot that heals its own bad results

Evidence-based research for healthcare, science, and technology questions. Searches six scholarly databases at once (PubMed, arXiv, and more), scores the quality of its own answers, retries until they meet the bar, and cites exactly where every claim came from.


MCP tools

Scholarly + technical sources

Data persisted: 0 bytes

Faithfulness target (summaries): ≥ 0.80

The problem

Scholarly APIs are fragmented — different protocols, formats, and quality levels — and AI research assistants return confident answers with no signal about how trustworthy the underlying retrieval actually was.

What we built

Federated six primary scholarly APIs (plus USPTO, NIST, IETF, openFDA and more) behind one MCP server with parallel fan-out, fuzzy DOI deduplication, and cross-encoder reranking.
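
The fan-out and deduplication steps can be sketched roughly as follows. This is a minimal illustration, not the project's code: the source callables, the record shape (`doi`, `title`), and the function names are all hypothetical, and the deduplication shown here is plain DOI normalization rather than the fuzzy matching described above. Reranking is omitted.

```python
import asyncio
import re
from typing import Awaitable, Callable

# Hypothetical type: each source takes a query and returns a list of records.
Source = Callable[[str], Awaitable[list[dict]]]

def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip URL/prefix forms so variants compare equal."""
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    return doi.rstrip(".")

def dedupe_by_doi(records: list[dict]) -> list[dict]:
    """Keep the first record per normalized DOI; records without a DOI are kept."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        doi = rec.get("doi")
        if doi:
            key = normalize_doi(doi)
            if key in seen:
                continue
            seen.add(key)
        unique.append(rec)
    return unique

async def fan_out(query: str, sources: dict[str, Source]) -> tuple[list[dict], dict[str, str]]:
    """Query all sources concurrently; a failing source is recorded, not fatal."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name](query) for name in names), return_exceptions=True
    )
    merged, errors = [], {}
    for name, res in zip(names, results):
        if isinstance(res, Exception):
            errors[name] = repr(res)
        else:
            merged.extend(res)
    return dedupe_by_doi(merged), errors

# Stand-in sources for illustration only; real ones would call the APIs over HTTP.
async def fake_pubmed(query: str) -> list[dict]:
    return [{"doi": "10.1000/ABC", "title": "A"}]

async def fake_arxiv(query: str) -> list[dict]:
    return [{"doi": "https://doi.org/10.1000/abc", "title": "A (preprint)"},
            {"doi": None, "title": "B"}]

async def fake_down(query: str) -> list[dict]:
    raise TimeoutError("source unavailable")
```

Collecting exceptions instead of raising them lets one slow or failing source degrade coverage without sinking the whole request, and the error map is the kind of information a response could surface as a warning.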

Built an autonomous self-improvement loop: every tool checks its own quality metrics (source coverage, relevance scores, OCR confidence, faithfulness) and retries with escalating strategies — cheap source retries first, query broadening next, LLM reformulation last.
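
One way to picture the escalation ladder is the sketch below. It assumes a hypothetical `run(query, strategy)` callable that executes the tool under a given strategy and a `score(answer)` callable that returns a quality metric in [0, 1]. The strategy names, the default target of 0.80, and the budget of four attempts are illustrative assumptions, not the project's actual configuration.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Attempt:
    strategy: str
    score: float

@dataclass
class LoopResult:
    answer: object
    score: float
    met_target: bool
    log: list[Attempt] = field(default_factory=list)

def run_with_escalation(
    run: Callable[[str, str], object],
    score: Callable[[object], float],
    query: str,
    target: float = 0.80,
    budget: int = 4,
    ladder: tuple[str, ...] = ("initial", "retry", "broaden", "reformulate"),
) -> LoopResult:
    """Try strategies from cheapest to most expensive until the target is met
    or the attempt budget runs out; always return the best answer seen."""
    best_answer, best_score = None, float("-inf")
    log: list[Attempt] = []
    for strategy in ladder[:budget]:
        answer = run(query, strategy)
        s = score(answer)
        log.append(Attempt(strategy, s))
        if s > best_score:
            best_answer, best_score = answer, s
        if s >= target:
            break  # good enough: stop before spending on costlier strategies
    return LoopResult(best_answer, best_score, best_score >= target, log)
```

Returning the log alongside the answer means a caller can see which strategies were tried and how each scored, even when the target was never reached.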

Attached a trust envelope to every response: which sources were called, latencies, quality metrics, warnings, and a full improvement log — the client sees exactly how confident to be.
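
As a rough illustration of what such an envelope might look like on the wire, here is a small sketch. The field names and the `wrap_response` helper are assumptions chosen to mirror the list above; the real response schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class TrustEnvelope:
    # All field names are hypothetical, mirroring the items listed above.
    sources_called: list[str] = field(default_factory=list)
    latencies_ms: dict[str, float] = field(default_factory=dict)
    metrics: dict[str, float] = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)
    improvement_log: list[str] = field(default_factory=list)

def wrap_response(result: dict, envelope: TrustEnvelope) -> str:
    """Serialize a tool result together with its trust envelope as JSON."""
    return json.dumps({"result": result, "trust": asdict(envelope)}, indent=2)

envelope = TrustEnvelope(
    sources_called=["pubmed", "arxiv"],
    latencies_ms={"pubmed": 412.0, "arxiv": 288.5},  # made-up example values
    metrics={"faithfulness": 0.86},
    warnings=["arxiv returned fewer results than requested"],
    improvement_log=["initial: below target", "broaden: target met"],
)
print(wrap_response({"summary": "..."}, envelope))
```

Keeping the envelope as a sibling of the result, rather than mixing its fields into the payload, lets a client ignore it entirely or inspect it without parsing the answer.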

Kept it radically stateless: in-memory cache only, zero persistence, so it's safe for sensitive research workflows by construction.
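
A process-local cache of this kind can be very small. The sketch below is an assumption about the general shape, not the project's implementation: the `TTLCache` name, the time-to-live, and the entry cap are all hypothetical. Nothing is written to disk, so the contents vanish when the process exits.

```python
import time
from typing import Any, Callable, Hashable

class TTLCache:
    """In-memory cache with per-entry expiry and a hard cap on entries."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 1024,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._data: dict[Hashable, tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            del self._data[key]  # re-insert so the entry moves to the end
        elif len(self._data) >= self._max:
            oldest = next(iter(self._data))  # dicts preserve insertion order
            del self._data[oldest]
        self._data[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return default
        return value
```

The injectable `clock` exists only to make expiry testable without sleeping.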

Architecture

Federation: 6 primary + 7 secondary source APIs, parallel async fan-out, polite-pool compliance

Extraction: PDF text extraction with automatic Tesseract OCR fallback for scanned papers

Semantic search: Voyage-3 embeddings + in-memory cosine ANN within a single paper; no index, no leak

Quality loop: Per-tool metric targets with budgeted auto-retry; escalation ladder from retry → broaden → reformulate

Trust envelope: Provenance, metrics, warnings, and improvement log attached to every single response
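
To make the semantic-search entry concrete, here is a minimal sketch of in-memory cosine ranking over the chunks of one paper. It assumes the chunk and query vectors have already been produced by an embedding model and makes no embedding API call. For clarity it does an exact brute-force scan rather than an approximate search, which is a simplification on my part; `top_k_chunks` is a hypothetical name.

```python
import math
from typing import Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec: Sequence[float],
                 chunk_vecs: Sequence[Sequence[float]],
                 k: int = 3) -> list[tuple[int, float]]:
    """Return (chunk_index, similarity) pairs for the k most similar chunks."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, s) for s, i in scored[:k]]
```

Because the vectors live only in local variables for the duration of one request, there is no index to build, persist, or clean up afterwards.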

Outcomes

▸ Deployed as a containerized remote MCP server on Azure with bearer-token auth and per-IP rate limiting
▸ Works today inside Claude and ChatGPT for literature reviews, citation chasing, and patent landscaping with cited evidence
▸ RAGAS-style inline evaluation (faithfulness, answer relevance, citation accuracy) on every summarization call
▸ OCR fallback rescues scanned PDFs that defeat naive text extraction, with quality checked per page
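
The per-page OCR fallback mentioned in the last outcome could work along these lines. This is a sketch under stated assumptions: the two callables stand in for a text-layer extractor and an OCR engine, and the thresholds (`min_chars`, `min_ocr_conf`) and the confidence scale of 0 to 1 are hypothetical, not values taken from the project.

```python
from typing import Callable, Optional

def extract_pages(
    page_count: int,
    extract_text: Callable[[int], Optional[str]],   # text-layer extractor (stand-in)
    ocr_page: Callable[[int], tuple[str, float]],   # returns (text, confidence 0..1)
    min_chars: int = 50,
    min_ocr_conf: float = 0.6,
) -> list[dict]:
    """Use the PDF text layer when a page has enough text; otherwise fall back
    to OCR and flag pages whose OCR confidence is below the threshold."""
    pages = []
    for i in range(page_count):
        text = extract_text(i) or ""
        if len(text.strip()) >= min_chars:
            pages.append({"page": i, "text": text, "method": "text",
                          "confidence": None, "warning": None})
            continue
        ocr_text, conf = ocr_page(i)
        warning = None if conf >= min_ocr_conf else f"low OCR confidence on page {i}"
        pages.append({"page": i, "text": ocr_text, "method": "ocr",
                      "confidence": conf, "warning": warning})
    return pages
```

Deciding page by page means a mostly digital PDF with a few scanned inserts only pays the OCR cost where it is needed, and the per-page warnings give a caller something concrete to report.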

Stack

Python · FastMCP · httpx · Voyage AI · Claude API · pypdf · Tesseract · Azure Container Apps · Docker
