[arXiv]

WorldLines Benchmark Tests Embodied Agents on Long-Horizon Household Memory

June 18, 2026

WorldLines constructs temporally extended household traces that combine object state changes, dialogues, and execution feedback. It uses these traces to evaluate long-horizon embodied agents on two tasks: Memory QA and Task Planning. The companion ObsMem framework maintains visibility-aware memories and action-native state trails to address failures caused by partial observability.

HOW THIS AFFECTS YOU

● Researcher: WorldLines fills a gap between language-centric memory benchmarks and short-horizon embodied tasks by making dynamic world-state tracking the core evaluation challenge.
