LaGO Uses LLM Latent Priors to Boost PPO Success Rates Up to 5.6x
June 24, 2026
LaGO treats a pretrained LLM as a soft latent action prior during online RL rather than as an explicit controller, raising PPO success rates from 2.7% to 15.2% on Meta-World (about 5.6x) and from 15.1% to 27.2% on CLEVR-Robot (about 1.8x). Stronger base LLMs yield better guidance, suggesting the approach scales with model quality.
HOW THIS AFFECTS YOU
● Researcher: The latent guidance framing sidesteps the action-space precision requirements that make LLM-as-controller brittle, and the scaling result with stronger LLMs is worth replicating.