2026

Dr. Holmes: Multi-Agent Diagnostic Deliberation

105 tests passing · phases 1–7 complete

Overview

Dr. Holmes is a research prototype that simulates a House MD–style diagnostic team of seven personality-distinct LLM agents, each with a different specialty and spread across more than one model provider. A deterministic moderator (Dr. Caddick) routes the deliberation. Reasoning is grounded in a Neo4j medical knowledge graph, a Bayesian engine trained on DDXPlus, and case literature retrieved from ChromaDB. The system supports doctor-in-the-loop interrupts, a reversible 'concluded' lifecycle, and a strict live-mode budget guard. Not for clinical use.
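To make the Bayesian grounding concrete, here is a minimal sketch of a posterior update over candidate diagnoses. It assumes a naive Bayes model (findings conditionally independent given the diagnosis); the function name, the floor probability, and the toy numbers are all hypothetical and are not taken from the Dr. Holmes code or from DDXPlus.

```python
from math import exp, log


def posterior(priors, likelihoods, findings, floor=1e-3):
    """Return normalized P(dx | findings) under a naive Bayes assumption.

    priors:      {dx: P(dx)}
    likelihoods: {(dx, finding): P(finding | dx)}
    floor:       hypothetical fallback for unseen (dx, finding) pairs
    """
    log_scores = {}
    for dx, prior in priors.items():
        score = log(prior)
        for finding in findings:
            score += log(likelihoods.get((dx, finding), floor))
        log_scores[dx] = score

    # Subtract the max before exponentiating for numerical stability.
    peak = max(log_scores.values())
    unnorm = {dx: exp(s - peak) for dx, s in log_scores.items()}
    total = sum(unnorm.values())
    return {dx: v / total for dx, v in unnorm.items()}


# Toy example with made-up numbers.
priors = {"pneumonia": 0.5, "pulmonary_embolism": 0.5}
likelihoods = {
    ("pneumonia", "fever"): 0.8,
    ("pulmonary_embolism", "fever"): 0.2,
}
result = posterior(priors, likelihoods, ["fever"])
```

Working in log space keeps the product of many small likelihoods from underflowing, which matters once a case carries dozens of findings.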

Highlights

7-agent team (Hauser, Forman, Carmen, Chen, Wills, Park, Caddick) across xAI Grok and OpenAI GPT-4o, with deterministic moderator routing
Medical Intelligence layer: Neo4j knowledge graph + Bayesian engine over DDXPlus (49 dx, 882 likelihoods) + ChromaDB case retrieval
FastAPI + WebSocket backend with Redis Streams, Postgres/SQLite persistence, Prometheus metrics, full audit log
Eval harness with DDXPlus stratified sampling, calibration analysis (ECE, Brier, reliability bins), bootstrap CIs, and a deterministic LLM cache
Human-in-the-loop interrupts via LangGraph checkpointer: pause/resume/inject evidence/correct/conclude
105 tests passing across 8 phases — orchestration, API, eval, HITL, budget, and frontend
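
The calibration metrics named in the eval harness can be sketched as follows. This is an illustrative version only: it assumes the binary form of the Brier score applied to top-1 confidence, and equal-width reliability bins for ECE. The function names and the bin count are assumptions, not the project's actual API.

```python
def brier_score(confidences, outcomes):
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    pairs = list(zip(confidences, outcomes))
    return sum((c - o) ** 2 for c, o in pairs) / len(pairs)


def expected_calibration_error(confidences, outcomes, n_bins=10):
    """ECE: bin-size-weighted average of |mean confidence - accuracy|."""
    bins = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, o))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # Empty reliability bins contribute nothing.
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy data: top-1 confidence and whether the top-1 diagnosis was correct.
confidences = [0.9, 0.9, 0.1, 0.1]
outcomes = [1, 1, 0, 0]
brier = brier_score(confidences, outcomes)
ece = expected_calibration_error(confidences, outcomes)
```

The per-bin `avg_conf` and `accuracy` pairs computed inside the loop are the same quantities a reliability diagram plots, so one pass can feed both the scalar ECE and the chart.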


Stack

Python · LangGraph · FastAPI · WebSocket · Neo4j · ChromaDB · Next.js