SimuWoB: 120-Task Synthetic Mobile GUI Benchmark Covers Real-App Complexity and Long-Horizon Tasks

May 23, 2026

SimuWoB generates 120 high-fidelity synthetic mobile GUI tasks spanning diverse real-world app types and difficulty levels, addressing the gap between reproducible benchmarks limited to open-source apps and actual production app complexity.

paper

HOW THIS AFFECTS YOU

● Builder: You can evaluate mobile GUI agents on complex, long-horizon tasks that better reflect real app interactions without needing live app access.

● Researcher: The synthetic environment generation framework enables scalable, reward-verified evaluation of GUI agents on tasks previously inaccessible due to closed-source app constraints.

SOURCE

https://huggingface.co/papers/2605.25160
