● builder: This provides a method to increase token throughput in speculative decoding pipelines without changing the underlying architecture.
● researcher: You can improve speculative decoding efficiency by aligning the training objective with the acceptance-until-fail behavior of inference, where draft tokens are accepted only up to the first rejection.
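
To make the acceptance-until-fail point concrete, here is a minimal sketch, not the method summarized above. It assumes each draft position has an independent acceptance probability, and the names `expected_accepted_length` and `mean_position_accuracy` are hypothetical helpers introduced only for illustration. Under that assumption, the expected number of accepted tokens is the sum of cumulative products of the per-position probabilities, so two drafters with the same average per-position accuracy can differ widely in how many tokens they actually get accepted.

```python
from itertools import accumulate
from operator import mul


def expected_accepted_length(accept_probs):
    """Expected count of draft tokens accepted before the first rejection.

    Assumes (for illustration) independent per-position acceptance
    probabilities. Token k is kept only if tokens 1..k are all accepted,
    so its contribution is the product p_1 * ... * p_k.
    """
    return sum(accumulate(accept_probs, mul))


def mean_position_accuracy(accept_probs):
    """Average per-position acceptance, which ignores the prefix dependency."""
    return sum(accept_probs) / len(accept_probs)


# Two hypothetical drafters with the same average per-position accuracy.
strong_early = [0.9, 0.9, 0.5, 0.5]
strong_late = [0.5, 0.5, 0.9, 0.9]

for name, probs in (("strong_early", strong_early), ("strong_late", strong_late)):
    print(
        f"{name}: mean accuracy = {mean_position_accuracy(probs):.2f}, "
        f"expected accepted tokens = {expected_accepted_length(probs):.4f}"
    )
```

Both drafters average 0.70 per position, yet the one that is accurate early is expected to have about twice as many tokens accepted. An objective that scores every position equally does not see this difference, which is the kind of gap between training and inference that the researcher note refers to.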