● builder: You can use slime (github.com/THUDM/slime) as a ready-made RL post-training stack; a real production run with it completed in ~2 days.
● researcher: Worth examining as a reproducible RL post-training setup that completed full GLM-5.2 OPD training in 2 days, which gives a concrete efficiency baseline for comparing RL infrastructure choices.