[arXiv] score: 0.15
Weakly Supervised MIL Framework Localizes Whale Calls with F1 0.88+
May 29, 2026
DSMIL-LocNet uses only recording-level presence/absence labels to both classify and temporally localize whale calls in 2–30 minute recordings, eliminating the need for expensive per-call timestamping. Its dual-stream spectral-temporal architecture avoids the temporal compression artifacts that degrade CNN baselines on long inputs, and it reaches F1 scores of 0.88 or higher on the AcousticTrends BlueFinLibrary.
cs.SD cs.AI cs.LG eess.AS
HOW THIS AFFECTS YOU
● researcher: The MIL-based weak supervision approach for joint classification and localization could transfer to other long-duration audio domains where precise temporal annotation is the bottleneck.
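
To make the weak-supervision idea concrete, here is a minimal sketch of generic multiple instance learning (MIL) on audio. It is not the DSMIL-LocNet architecture, which the summary does not describe in detail. It assumes a recording is split into fixed-length segments (instances), that some model has already produced a call probability per segment, and that max pooling turns those into a recording-level (bag) prediction. The function names, the 10-second segment length, the 0.5 threshold, and the scores are all hypothetical.

```python
import numpy as np

def bag_probability(instance_probs):
    """Max pooling: a recording is positive if at least one segment is.

    During training, only this value would be compared against the
    recording-level presence/absence label.
    """
    return float(np.max(instance_probs))

def localize(instance_probs, segment_seconds, threshold=0.5):
    """Merge consecutive above-threshold segments into (start, end) intervals in seconds."""
    active = np.asarray(instance_probs) >= threshold
    intervals = []
    start = None  # index of the first segment in the current active run
    for i, on in enumerate(active):
        if on and start is None:
            start = i
        elif not on and start is not None:
            intervals.append((start * segment_seconds, i * segment_seconds))
            start = None
    if start is not None:  # run extends to the end of the recording
        intervals.append((start * segment_seconds, len(active) * segment_seconds))
    return intervals

# Hypothetical per-segment scores for a 70-second recording in 10-second segments.
probs = np.array([0.05, 0.10, 0.80, 0.90, 0.20, 0.70, 0.10])
print(bag_probability(probs))
print(localize(probs, segment_seconds=10.0))
```

The point of the sketch is that localization comes for free from the per-segment scores: the label supervises only the pooled value, while the segments that drive it indicate where calls occur. Attention-based pooling is a common alternative to max pooling in MIL.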