Maven Framework Improves Long-Context Reasoning via Evidence-State Rewards
July 3, 2026
Maven is a reinforcement learning framework that maintains an editable evidence memory and rewards action-level state transitions within GRPO. It credits add, link, and drop actions according to marginal gain and hindsight, and it outperforms outcome-only RL on the LongBench v2, LongReason, and RULER benchmarks.
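The summary above gives no formulas, so the following is only a rough sketch of how action-level crediting over an evidence memory might look. Everything in it is an assumption made for illustration, not Maven's actual implementation: the `EvidenceMemory` class, the `state_score` heuristic, the fixed hindsight bonus, and the use of gold evidence labels are all hypothetical. Marginal gain is modeled as the change in a state score caused by one action, hindsight as a bonus for added items that survive into the final memory of a correct episode, and the GRPO step as standard group-relative normalization of per-rollout returns.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class EvidenceMemory:
    """Hypothetical editable evidence state: snippet ids plus links between them."""
    items: set = field(default_factory=set)
    links: set = field(default_factory=set)

    def apply(self, action, *args):
        if action == "add":
            self.items.add(args[0])
        elif action == "link":
            a, b = args
            # Only link snippets that are currently held in memory.
            if a in self.items and b in self.items:
                self.links.add(frozenset((a, b)))
        elif action == "drop":
            self.items.discard(args[0])
            # Remove any link that touched the dropped snippet.
            self.links = {l for l in self.links if args[0] not in l}
        else:
            raise ValueError(f"unknown action: {action}")


def state_score(mem, gold):
    """Assumed state value: reward gold evidence, penalize distractors, reward gold links."""
    hits = len(mem.items & gold)
    noise = len(mem.items - gold)
    good_links = sum(1 for l in mem.links if l <= gold)
    return hits - 0.5 * noise + 0.5 * good_links


def transition_rewards(actions, gold):
    """Marginal gain per action: score after the action minus score before it."""
    mem = EvidenceMemory()
    rewards = []
    for action, *args in actions:
        before = state_score(mem, gold)
        mem.apply(action, *args)
        rewards.append(state_score(mem, gold) - before)
    return rewards, mem


def hindsight_bonus(actions, final_mem, correct, bonus=0.2):
    """Assumed hindsight credit: bonus for adds that survive in a correct episode."""
    return [
        bonus if correct and action == "add" and args[0] in final_mem.items else 0.0
        for action, *args in actions
    ]


def group_advantages(returns, eps=1e-8):
    """GRPO-style advantage: normalize each return against its rollout group."""
    mu, sigma = mean(returns), pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]


# Two rollouts for the same prompt; gold evidence is {"a", "b"}.
gold = {"a", "b"}
rollout_1 = [("add", "a"), ("add", "x"), ("add", "b"), ("link", "a", "b"), ("drop", "x")]
rollout_2 = [("add", "x"), ("drop", "x")]

step_rewards, final_mem = transition_rewards(rollout_1, gold)
shaped = [r + h for r, h in zip(step_rewards, hindsight_bonus(rollout_1, final_mem, correct=True))]
returns = [sum(step_rewards), sum(transition_rewards(rollout_2, gold)[0])]
advantages = group_advantages(returns)
```

In this toy setup, dropping the distractor `x` earns positive credit because it raises the state score, which is the kind of intermediate signal that an outcome-only reward would not provide.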
HOW THIS AFFECTS YOU
● Builder: This approach can help you improve model performance on long-context retrieval and synthesis tasks.
● Researcher: You can move beyond outcome-only RL by rewarding intermediate evidence-state transitions.