
AsyncWebRL Delivers 2.9x Training Throughput for Vision-Language Web Agents

June 3, 2026

AsyncWebRL runs rollout, gradient updates, and policy refresh asynchronously so that the three stages overlap instead of waiting on one another. It also fixes a flaw in the per-trajectory normalizer of multi-step GRPO that causes token and trajectory inefficiency. Benchmarked against WebGym, it achieves up to 2.9x end-to-end training throughput for visual web agents.
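To make the idea of overlapping stages concrete, here is a minimal sketch of an asynchronous producer/consumer loop in Python. It is not AsyncWebRL's code: the names `rollout_worker`, `learner`, and `train`, the dictionary-based policy, and the `asyncio.sleep` calls standing in for browser interaction and gradient computation are all assumptions made for illustration.

```python
import asyncio


async def rollout_worker(queue, policy, n_episodes):
    """Collect trajectories using whichever policy version is current."""
    for episode in range(n_episodes):
        await asyncio.sleep(0.001)  # placeholder for web-environment interaction
        # Record the policy version that generated this trajectory.
        await queue.put({"episode": episode, "policy_version": policy["version"]})
    await queue.put(None)  # sentinel: no more trajectories


async def learner(queue, policy, batch_size, log):
    """Consume trajectories, update once per full batch, then refresh the policy."""
    batch = []
    while True:
        item = await queue.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            await asyncio.sleep(0.002)  # placeholder for the gradient update
            policy["version"] += 1      # placeholder for the policy refresh
            log.append([t["policy_version"] for t in batch])
            batch = []


async def train(n_episodes=8, batch_size=2):
    queue = asyncio.Queue(maxsize=4)  # bounded queue limits how far rollouts run ahead
    policy = {"version": 0}
    log = []
    # Both coroutines run concurrently: rollouts continue while updates happen.
    await asyncio.gather(
        rollout_worker(queue, policy, n_episodes),
        learner(queue, policy, batch_size, log),
    )
    return policy["version"], log


if __name__ == "__main__":
    version, log = asyncio.run(train())
    print(version, log)
```

Each logged batch lists the policy versions that produced its trajectories, which makes the trade-off visible: because rollouts do not wait for the learner, some trajectories may come from a slightly older policy than the one being updated.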

HOW THIS AFFECTS YOU

● Builder: You can cut compute costs significantly when training web agents with RL. The open pipeline is faster than the previous best synchronous baseline.

● Researcher: The GRPO normalizer fix is a concrete algorithmic contribution worth examining if you train multi-step RL agents on long-horizon tasks.
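This summary does not say what the normalizer flaw is or how it is fixed. As general background only, the sketch below shows one well-known property of any per-trajectory normalizer: averaging within each trajectory first makes a token's weight depend on the length of its trajectory. The function names and the toy numbers are hypothetical and are not taken from AsyncWebRL.

```python
def per_trajectory_mean(token_losses):
    """Average tokens within each trajectory, then average across trajectories."""
    per_traj = [sum(traj) / len(traj) for traj in token_losses]
    return sum(per_traj) / len(per_traj)


def per_token_mean(token_losses):
    """Average over every token in the batch at once."""
    flat = [loss for traj in token_losses for loss in traj]
    return sum(flat) / len(flat)


# Toy batch: a short trajectory (2 tokens) and a long one (6 tokens).
batch = [[1.0, 1.0], [4.0, 4.0, 4.0, 4.0, 4.0, 4.0]]

print(per_trajectory_mean(batch))  # 2.5: each trajectory counts equally
print(per_token_mean(batch))       # 3.25: each token counts equally
```

Under the per-trajectory average, each token in the short trajectory carries three times the weight of a token in the long one, so the choice of normalizer changes which steps dominate the gradient.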

Source: huggingface.co
