●researcherThe finding that domain-specific fine-tuning or RAG pipelines underperform general frontier models on clinical queries warrants investigation into what those tools are actually optimizing for.
●founderWorth watching because it undermines the moat of clinical AI incumbents like OpenEvidence and UpToDate, suggesting the vertical specialization premium may not hold.
●healthYou can cite this blinded clinician benchmark as evidence that general frontier models may be sufficient for clinical decision support without specialized vertical tools.