Batch-1 LLM Decode Underutilizes HBM Bandwidth on H100, A100, L40S, L4 | HACKOBAR_