ResearchPilot
A scholarly research copilot that heals its own bad results
Evidence-based research for healthcare, science, and technology questions. Searches six scholarly databases at once (PubMed, arXiv, and more), scores the quality of its own answers, retries until they meet the bar, and cites exactly where every claim came from.
15 MCP tools
13 scholarly + technical sources
0 bytes of data persisted
≥ 0.80 faithfulness target (summaries)
The problem
Scholarly APIs are fragmented — different protocols, formats, and quality levels — and AI research assistants return confident answers with no signal about how trustworthy the underlying retrieval actually was.
What we built
Federated six primary scholarly APIs (plus USPTO, NIST, IETF, openFDA and more) behind one MCP server with parallel fan-out, fuzzy DOI deduplication, and cross-encoder reranking.
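As a rough illustration of the fan-out and deduplication idea, here is a minimal Python sketch. It is not the project's code: the function names (`fan_out`, `dedupe`, `normalize_doi`), the record shape (`doi`, `title` keys), the 5-second timeout, and the 0.92 title-similarity threshold are all assumptions made for the example. "Fuzzy" is interpreted here as normalizing DOI spellings and falling back to approximate title matching when a DOI is missing. Reranking is left out.

```python
import asyncio
import re
from difflib import SequenceMatcher


def normalize_doi(doi):
    """Lowercase a DOI and strip common URL/scheme prefixes (assumed rules)."""
    if not doi:
        return None
    doi = doi.strip().lower()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    return doi or None


def normalize_title(title):
    """Reduce a title to lowercase alphanumeric words for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def is_duplicate(a, b, threshold):
    """Compare by DOI when both records have one, else by fuzzy title match."""
    doi_a, doi_b = normalize_doi(a.get("doi")), normalize_doi(b.get("doi"))
    if doi_a and doi_b:
        return doi_a == doi_b
    title_a, title_b = normalize_title(a.get("title")), normalize_title(b.get("title"))
    if not title_a or not title_b:
        return False
    return SequenceMatcher(None, title_a, title_b).ratio() >= threshold


def dedupe(records, title_threshold=0.92):
    """Keep the first occurrence of each work (quadratic, fine for small result sets)."""
    kept = []
    for rec in records:
        if not any(is_duplicate(rec, k, title_threshold) for k in kept):
            kept.append(rec)
    return kept


async def fan_out(query, sources, timeout=5.0):
    """Query every source concurrently; a failing source yields an error, not a crash."""
    async def call(name, fn):
        try:
            return name, await asyncio.wait_for(fn(query), timeout), None
        except Exception as exc:  # includes timeouts
            return name, [], repr(exc)

    results = await asyncio.gather(*(call(n, f) for n, f in sources.items()))
    records = [rec for _, recs, _ in results for rec in recs]
    errors = {name: err for name, _, err in results if err}
    return dedupe(records), errors
```

Here `sources` is assumed to map a source name to an async callable that takes the query and returns a list of record dicts. Per-source errors are returned next to the merged records so a caller can report which sources actually contributed.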
Built an autonomous self-improvement loop: every tool checks its own quality metrics (source coverage, relevance scores, OCR confidence, faithfulness) and retries with escalating strategies — cheap source retries first, query broadening next, LLM reformulation last.
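The escalation order described above (source retries, then query broadening, then LLM reformulation) can be sketched as an ordered list of strategies that is walked only while the quality score stays under a threshold. This is a simplified sketch under assumptions: `improve`, `Attempt`, the single scalar score, and the 0.8 threshold are hypothetical stand-ins, and a real tool would likely track several metrics at once.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    strategy: str  # which strategy produced this attempt
    score: float   # quality score the attempt achieved


def improve(query, run, strategies, threshold=0.8):
    """Try strategies cheapest-first until the score meets the threshold.

    run(query) -> (result, score)
    strategies: ordered list of (name, rewrite_fn), where rewrite_fn(query) -> new query.
    Returns the best result seen, its score, and the log of all attempts.
    """
    result, score = run(query)
    log = [Attempt("initial", score)]
    best_result, best_score = result, score

    for name, rewrite in strategies:
        if best_score >= threshold:
            break  # good enough, skip the more expensive strategies
        result, score = run(rewrite(query))
        log.append(Attempt(name, score))
        if score > best_score:
            best_result, best_score = result, score

    return best_result, best_score, log
```

A caller might pass `[("retry_sources", lambda q: q), ("broaden_query", broaden), ("llm_reformulate", reformulate)]`, where `broaden` and `reformulate` are placeholders for whatever rewriting the server performs. The returned log is the kind of record that could feed an improvement log in the response.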
Attached a trust envelope to every response: which sources were called, latencies, quality metrics, warnings, and a full improvement log — the client sees exactly how confident to be.
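One way to picture such an envelope is a small dataclass serialized alongside the result. The field names below (`sources_called`, `latencies_ms`, `quality`, `warnings`, `improvement_log`) mirror the items listed above but are hypothetical. The actual wire format is not specified in this document.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TrustEnvelope:
    sources_called: list   # names of the sources that were queried
    latencies_ms: dict     # source name -> latency in milliseconds
    quality: dict          # metric name -> value, e.g. {"faithfulness": 0.84}
    warnings: list = field(default_factory=list)
    improvement_log: list = field(default_factory=list)  # retry attempts, if any


def wrap(payload, envelope):
    """Return a JSON string carrying the result together with its trust metadata."""
    return json.dumps({"result": payload, "trust": asdict(envelope)})
```

A client can then decide how much to rely on an answer by inspecting `trust.quality` and `trust.warnings` rather than treating every response as equally reliable.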
Kept it radically stateless: in-memory cache only, zero persistence, so it's safe for sensitive research workflows by construction.
Architecture
Outcomes
- Deployed as a containerized remote MCP server on Azure with bearer-token auth and per-IP rate limiting
- Works today inside Claude and ChatGPT: literature reviews, citation chasing, and patent landscaping with cited evidence
- RAGAS-style inline evaluation (faithfulness, answer relevance, citation accuracy) on every summarization call
- OCR fallback rescues scanned PDFs that defeat naive text extraction, with quality checked per page
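
The per-page OCR fallback in the last outcome could look roughly like the sketch below. Everything here is an assumption for illustration: `extract_pages`, the 50-character cutoff for "text extraction failed", and the 0.6 confidence floor are invented, and the text extractor and OCR engine are passed in as plain callables so that no specific library API is implied.

```python
def extract_pages(pages, extract_text, ocr, min_chars=50, min_confidence=0.6):
    """Use plain text extraction per page, falling back to OCR when it yields too little.

    extract_text(page) -> str
    ocr(page) -> (str, confidence between 0 and 1)
    """
    out = []
    for index, page in enumerate(pages):
        text = extract_text(page).strip()
        if len(text) >= min_chars:
            # Enough embedded text: no OCR needed for this page.
            out.append({"page": index, "text": text, "method": "text",
                        "confidence": 1.0, "warning": None})
            continue

        # Likely a scanned page: run OCR and flag low-confidence output.
        ocr_text, confidence = ocr(page)
        warning = None if confidence >= min_confidence else "low OCR confidence"
        out.append({"page": index, "text": ocr_text.strip(), "method": "ocr",
                    "confidence": confidence, "warning": warning})
    return out
```

Checking quality page by page means one badly scanned page produces a targeted warning instead of silently degrading the whole document.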
Stack