●builderYou can cut compute costs significantly when training web agents with RL — the open pipeline is faster than the previous best synchronous baseline.
●researcherThe GRPO normalizer fix is a concrete algorithmic contribution worth examining if you train multi-step RL agents on long-horizon tasks.