WorldLines Benchmark Tests Embodied Agents on Long-Horizon Household Memory
June 16, 2026
WorldLines is a benchmark that tests whether household embodied agents can retain long-term memory across dialogues, object state changes, and multi-step task planning. It pairs Memory QA and Embodied Task Planning tasks with an ObsMem framework that tracks visibility-aware memory states.
HOW THIS AFFECTS YOU
● Researcher: Fills a gap between language-only memory benchmarks and short-horizon embodied benchmarks; useful for evaluating agents that must track dynamic world state over extended interactions.