
GLM-5.2 Full RL Post-Training Ran in 2 Days on Open-Source Slime

June 18, 2026

THUDM's slime platform, used for GLM-5.2 reinforcement learning (RL) post-training, is fully open source on GitHub. The full OPD post-training run finished in approximately 2 days, which puts production-grade RL infrastructure within reach without proprietary tooling.

HOW THIS AFFECTS YOU

● Builder: You can use slime (github.com/THUDM/slime) as a ready-made RL post-training stack; a real production run shows it can finish in ~2 days.

● Researcher: Worth examining as a reproducible RL post-training setup that achieved full GLM-5.2 OPD training in 2 days, which gives a concrete efficiency baseline for comparing RL infrastructure choices.

read original ↗x.com
