[arXiv]

Respiratory AI Models Fail Clinical Thresholds on Body-Worn Sensors

June 25, 2026

BCoughBench tests five respiratory acoustic foundation models (OPERA-CT, OPERA-CE, OPERA-GT, HeAR, M2D+Resp) under five simulated body-coupled (BC) wearable sensor conditions across nine classification tasks. Mean AUROC drops from 0.785 on smartphone audio to 0.689–0.723 across the wearable conditions, and no model reaches the clinical threshold of sensitivity at 95% specificity (Se@Sp95) ≥ 0.20 on most disease tasks under any BC sensor condition.
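To make the Se@Sp95 criterion concrete, here is a minimal sketch of how such a metric can be computed from classifier scores. It assumes Se@Sp95 means the best sensitivity achievable at any decision threshold whose specificity is at least 0.95. The function names and the toy data are hypothetical illustrations, not the benchmark's evaluation code or results.

```python
def sensitivity_at_specificity(labels, scores, min_specificity=0.95):
    """Best sensitivity over all thresholds with specificity >= min_specificity.

    labels: iterable of 0 (healthy) / 1 (disease); scores: higher means more
    likely disease. Decision rule assumed here: predict disease if score > t.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    best = 0.0
    # Every observed score is a candidate threshold.
    for t in set(scores):
        specificity = sum(s <= t for s in neg) / len(neg)
        if specificity >= min_specificity:
            sensitivity = sum(s > t for s in pos) / len(pos)
            best = max(best, sensitivity)
    return best


def meets_clinical_threshold(labels, scores, min_sensitivity=0.20):
    """Check the Se@Sp95 >= 0.20 criterion quoted in the summary."""
    return sensitivity_at_specificity(labels, scores) >= min_sensitivity


# Toy data (made up for illustration): 20 negatives, 5 positives.
neg_scores = [i / 20 for i in range(20)]      # 0.00, 0.05, ..., 0.95
pos_scores = [0.2, 0.5, 0.93, 0.97, 0.99]
toy_labels = [0] * len(neg_scores) + [1] * len(pos_scores)
toy_scores = neg_scores + pos_scores

toy_se = sensitivity_at_specificity(toy_labels, toy_scores)
print(f"Se@Sp95 = {toy_se:.2f}, meets threshold: "
      f"{meets_clinical_threshold(toy_labels, toy_scores)}")
```

The brute-force loop over thresholds is quadratic in the number of samples, which keeps the logic easy to follow; for large evaluation sets a sorted sweep or an ROC routine from a standard library would be the usual choice.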

HOW THIS AFFECTS YOU

● Researcher: The benchmark exposes a critical evaluation gap. Models trained and tested on smartphone audio do not transfer to wearable sensor modalities, so closing the gap will require domain-specific fine-tuning or new pretraining data.

● Health: Clinical deployments of cough-based diagnostics on wearables should not assume that smartphone-validated model performance holds. None of the five tested foundation models meets the minimum clinical sensitivity threshold on most disease tasks under any wearable sensor condition.

read original ↗arxiv.org
