[HUGGINGFACE]
VitaBench 2.0 Benchmarks LLM Agents on Long-Term Personalization and Proactive Behavior
May 25, 2026
VitaBench 2.0 evaluates agents on temporally ordered, per-user task sequences that require inferring unstated user preferences and interacting proactively. It fills a gap left by benchmarks that focus solely on reasoning and tool use.
HOW THIS AFFECTS YOU
● Builder: You can use VitaBench 2.0 to measure whether your agent actually learns and acts on user preferences over time, not just within a single session.
● Researcher: The temporally ordered, per-user task structure provides a more realistic evaluation axis for personalization and proactivity than existing static agent benchmarks.
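To make the idea of a temporally ordered, per-user evaluation concrete, here is a minimal Python sketch. It is not VitaBench 2.0's actual task schema, agent interface, or metric, none of which are described here. The `Task` fields, the `MemoryAgent` and `StatelessAgent` classes, and the scoring rule are all hypothetical. The sketch replays each user's tasks in timestamp order and scores only the turns where the user leaves their preference unstated. It covers preference inference only, not proactive behavior.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class Task:
    # Hypothetical task record; VitaBench 2.0's real schema may differ.
    user_id: str
    timestamp: int               # ordering key within a user's history
    request: str
    stated_pref: Optional[str]   # preference the user states explicitly, if any
    expected_pref: str           # preference a good agent should act on


class Agent(Protocol):
    def act(self, task: Task) -> str: ...


def group_by_user(tasks: List[Task]) -> Dict[str, List[Task]]:
    """Group tasks per user and sort each user's sequence by timestamp."""
    by_user: Dict[str, List[Task]] = defaultdict(list)
    for t in tasks:
        by_user[t.user_id].append(t)
    return {u: sorted(ts, key=lambda t: t.timestamp) for u, ts in by_user.items()}


class MemoryAgent:
    """Toy agent that remembers the most recent stated preference per user."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}

    def act(self, task: Task) -> str:
        if task.stated_pref is not None:
            self.memory[task.user_id] = task.stated_pref
        return self.memory.get(task.user_id, "default")


class StatelessAgent:
    """Baseline that only uses what is stated in the current task."""

    def act(self, task: Task) -> str:
        return task.stated_pref or "default"


def evaluate(tasks: List[Task], agent: Agent) -> Dict[str, float]:
    """Replay each user's tasks in time order; score only unstated-preference turns."""
    scores: Dict[str, float] = {}
    for user, seq in group_by_user(tasks).items():
        hits = total = 0
        for t in seq:
            action = agent.act(t)  # agent sees every task, in order
            if t.stated_pref is None:
                total += 1
                hits += int(action == t.expected_pref)
        if total:  # skip users with no unstated-preference turns
            scores[user] = hits / total
    return scores


# Made-up example data; note u1's tasks are listed out of time order.
tasks = [
    Task("u1", 2, "order lunch", None, "vegetarian"),
    Task("u1", 1, "order dinner", "vegetarian", "vegetarian"),
    Task("u2", 1, "book a flight", "aisle", "aisle"),
    Task("u2", 3, "book another flight", None, "aisle"),
]

print(evaluate(tasks, MemoryAgent()))     # {'u1': 1.0, 'u2': 1.0}
print(evaluate(tasks, StatelessAgent()))  # {'u1': 0.0, 'u2': 0.0}
```

Sorting each user's history before replay is what makes the ordering matter. If u1's tasks were replayed in list order, the memory agent would meet the unstated "order lunch" request before it had seen the stated preference, and it would miss.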