● builder: If you're running disaggregated prefill/decode serving at scale, this analysis surfaces non-obvious inefficiencies in KV cache and routing behavior under GPU saturation.
● researcher: The first formal game-theoretic treatment of disaggregated inference architecture, providing a framework for reasoning about efficiency losses from selfish resource allocation in multi-pool serving systems.