●builder: Worth watching because this directly targets production long-context inference bottlenecks (PCIe bandwidth and sparse selection overhead) that affect real serving costs.
●researcher: The decoupled Forecast head is a novel architectural primitive worth examining for long-context efficiency research beyond standard sparse attention approaches.