SJF Scheduling in vLLM Cuts ASR Median Latency 73% at High Load
March 10, 2026
Audio duration is a reliable proxy for Whisper ASR job processing time, enabling duration-aware scheduling via Shortest Job First (SJF) and Highest Response Ratio Next (HRRN) integrated into vLLM. On LibriSpeech test-clean, SJF reduces median end-to-end latency by up to 73% at high load versus FCFS, though it increases 90th-percentile latency under workload drift.
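The two policies differ only in how they rank queued jobs, which a minimal sketch can make concrete. This is an illustration under stated assumptions, not the vLLM integration itself: `AsrJob`, `pick_sjf`, `response_ratio`, and `pick_hrrn` are hypothetical names, and audio duration in seconds stands in directly for the processing-time estimate.

```python
from dataclasses import dataclass


@dataclass
class AsrJob:
    # Hypothetical job record; not a vLLM type.
    job_id: str
    arrival_s: float         # time the request entered the queue
    audio_duration_s: float  # proxy for processing time; assumed > 0


def pick_sjf(queue: list[AsrJob]) -> AsrJob:
    """Shortest Job First: pick the shortest audio, breaking ties by arrival."""
    return min(queue, key=lambda j: (j.audio_duration_s, j.arrival_s))


def response_ratio(job: AsrJob, now_s: float) -> float:
    """HRRN ratio: (waiting time + estimated service) / estimated service."""
    wait_s = max(0.0, now_s - job.arrival_s)
    return (wait_s + job.audio_duration_s) / job.audio_duration_s


def pick_hrrn(queue: list[AsrJob], now_s: float) -> AsrJob:
    """Highest Response Ratio Next: pick the job with the largest ratio."""
    return max(queue, key=lambda j: response_ratio(j, now_s))


queue = [
    AsrJob("long", arrival_s=0.0, audio_duration_s=30.0),
    AsrJob("short", arrival_s=9.0, audio_duration_s=5.0),
]

sjf_choice = pick_sjf(queue).job_id                  # always the shorter clip
hrrn_early = pick_hrrn(queue, now_s=10.0).job_id     # long job has waited longer
hrrn_later = pick_hrrn(queue, now_s=20.0).job_id     # short job's ratio overtakes
```

SJF always serves the shortest clip, which helps the median but can leave long clips waiting. HRRN folds waiting time into the ranking, so a long job that has queued for a while can still be served ahead of a newly arrived short one.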
HOW THIS AFFECTS YOU
● Builder: If you're running Whisper at scale on vLLM, switching from FCFS to SJF scheduling can cut median latency by up to 73%, but monitor tail latency under workload drift.