[arXiv] score: 0.42

ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction

May 13, 2026
ReVision trains multimodal language models to exploit temporal visual redundancy across GUI screenshots in computer-use agent (CUA) trajectories, reducing token costs that otherwise scale linearly with interaction length. Prior CUAs gained little from interaction history because of context budget constraints; ReVision targets this bottleneck directly. This makes it a notable advance for long-horizon GUI automation and for scaling computer-use agents.
cs.CL
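To make the redundancy argument concrete, here is a minimal sketch of how much of a screenshot history repeats from one step to the next. It is not ReVision's method, which the summary does not describe. It only compares a naive token count, where every patch of every screenshot is encoded, against a count that keeps only patches that changed since the previous screenshot. The names `patch_grid` and `token_counts`, the grayscale nested-list frame format, and the one-token-per-patch cost model are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed frame format: grayscale pixels as rows of ints, all frames same size.
Frame = Sequence[Sequence[int]]


def patch_grid(frame: Frame, patch: int) -> List[tuple]:
    """Split a frame into non-overlapping patch x patch tiles (row-major order)."""
    rows, cols = len(frame), len(frame[0])
    tiles = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            tiles.append(tuple(
                tuple(frame[y][c:c + patch])
                for y in range(r, min(r + patch, rows))
            ))
    return tiles


def token_counts(frames: Sequence[Frame], patch: int) -> Tuple[int, int]:
    """Return (naive, deduplicated) patch counts for a screenshot trajectory.

    naive: every patch of every frame costs one token (linear in steps).
    deduplicated: the first frame costs all of its patches; each later frame
    costs only the patches that differ from the same position in the
    previous frame. This is a hypothetical cost model, not the paper's.
    """
    naive = 0
    dedup = 0
    prev = None
    for frame in frames:
        tiles = patch_grid(frame, patch)
        naive += len(tiles)
        if prev is None:
            dedup += len(tiles)
        else:
            dedup += sum(1 for cur, old in zip(tiles, prev) if cur != old)
        prev = tiles
    return naive, dedup


# Usage: three 4x4 frames with 2x2 patches; only one pixel changes at step 1.
frame0 = [[0] * 4 for _ in range(4)]
frame1 = [row[:] for row in frame0]
frame1[0][0] = 1
frame2 = [row[:] for row in frame1]
naive, dedup = token_counts([frame0, frame1, frame2], patch=2)
print(naive, dedup)  # 12 5
```

In this toy trajectory the naive count grows by four patches per step, while the deduplicated count grows only when the screen actually changes, which is the kind of gap the summary attributes to temporal visual redundancy.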