2026
Dr. Holmes: Multi-Agent Diagnostic Deliberation
105 tests passing · phases 1–7 complete

Overview
Dr. Holmes is a research prototype that simulates a House MD–style diagnostic team using seven personality-distinct LLM agents, each with a different specialty and split across model providers. A deterministic moderator (Dr. Caddick) routes the deliberation. Reasoning is grounded in a Neo4j medical knowledge graph, a Bayesian engine trained on DDXPlus, and case literature retrieved from ChromaDB. The system supports doctor-in-the-loop interrupts, a reversible 'concluded' lifecycle, and a strict live-mode budget guard. Not for clinical use.
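To make the Bayesian grounding concrete, here is a minimal sketch of how a posterior over diagnoses could be updated from observed findings. It assumes a naive-Bayes style model with conditionally independent findings; the function name `update_posterior`, the smoothing floor, and the toy numbers are all hypothetical and are not taken from the project or from DDXPlus.

```python
def update_posterior(prior, likelihoods, findings, floor=0.01):
    """Return P(dx | findings), assuming findings are independent given dx.

    prior:       dict mapping diagnosis -> P(dx)
    likelihoods: dict mapping (diagnosis, finding) -> P(finding | dx)
    findings:    iterable of observed findings
    floor:       assumed fallback probability for missing (dx, finding) pairs
    """
    unnormalized = {}
    for dx, p_dx in prior.items():
        score = p_dx
        for finding in findings:
            # Multiply in the likelihood of each observed finding.
            score *= likelihoods.get((dx, finding), floor)
        unnormalized[dx] = score

    total = sum(unnormalized.values())
    if total == 0:
        # Nothing is supported by the evidence; fall back to the prior.
        return dict(prior)
    return {dx: score / total for dx, score in unnormalized.items()}


# Toy illustration with made-up numbers (not DDXPlus values).
toy_prior = {"pneumonia": 0.5, "bronchitis": 0.5}
toy_likelihoods = {
    ("pneumonia", "fever"): 0.8,
    ("bronchitis", "fever"): 0.2,
}
toy_posterior = update_posterior(toy_prior, toy_likelihoods, ["fever"])
print(toy_posterior)
```

In this toy case the evidence shifts probability mass toward the diagnosis with the higher likelihood for the observed finding. The real engine may well model dependencies or priors differently.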
Highlights
- 7-agent team (Hauser, Forman, Carmen, Chen, Wills, Park, Caddick) across xAI Grok and OpenAI GPT-4o, with deterministic moderator routing
- Medical Intelligence layer: Neo4j knowledge graph + Bayesian engine over DDXPlus (49 dx, 882 likelihoods) + ChromaDB case retrieval
- FastAPI + WebSocket backend with Redis Streams, Postgres/SQLite persistence, Prometheus metrics, full audit log
- Eval harness with DDXPlus stratified sampling, calibration analysis (ECE, Brier, reliability bins), bootstrap CIs, and a deterministic LLM cache
- Human-in-the-loop interrupts via LangGraph checkpointer: pause/resume/inject evidence/correct/conclude
- 105 tests passing across 8 phases — orchestration, API, eval, HITL, budget, and frontend
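The calibration metrics named in the eval harness can be illustrated with a short sketch. It assumes each evaluated case yields a top-1 confidence and a 0/1 correctness flag, and it uses equal-width reliability bins; the function names and the sample data are hypothetical, and the project's own harness may compute these metrics differently (for example, a multi-class Brier score).

```python
def brier_score(confidences, correct):
    """Mean squared gap between top-1 confidence and the 0/1 outcome."""
    pairs = list(zip(confidences, correct))
    return sum((c - y) ** 2 for c, y in pairs) / len(pairs)


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted average of |mean confidence - accuracy|."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty reliability bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_conf - accuracy)
    return ece


# Made-up example: four cases, confident when right and unconfident when wrong.
sample_conf = [0.9, 0.9, 0.1, 0.1]
sample_correct = [1, 1, 0, 0]
sample_brier = brier_score(sample_conf, sample_correct)
sample_ece = expected_calibration_error(sample_conf, sample_correct)
print(sample_brier, sample_ece)
```

Lower is better for both metrics. ECE summarizes the per-bin gaps that a reliability diagram would plot, while the Brier score also penalizes confidence that is far from the observed outcome.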
Stack
Python · LangGraph · FastAPI · WebSocket · Neo4j · ChromaDB · Next.js