[arXiv]

Maven Framework Improves Long-Context Reasoning via Evidence-State Rewards

July 3, 2026

Maven is a reinforcement learning framework that keeps an editable evidence memory and rewards action-level state transitions within GRPO (Group Relative Policy Optimization). It credits each add, link, and drop action based on its marginal gain and on hindsight, and it outperforms outcome-only RL on the LongBench v2, LongReason, and RULER benchmarks.
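To make the idea of action-level credit concrete, here is a minimal Python sketch. It is not the paper's implementation: the summary gives no formulas, so the scoring function `state_value`, the weights 0.5 and 0.25, the `hindsight_bonus` parameter, and every name below are assumptions chosen for illustration. The only standard piece is the group-relative normalization that GRPO applies to returns within a sampled group.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class EvidenceMemory:
    """Hypothetical editable evidence store: item ids plus links between them."""
    items: set = field(default_factory=set)
    links: set = field(default_factory=set)

    def apply(self, action):
        kind, payload = action
        if kind == "add":
            self.items.add(payload)
        elif kind == "drop":
            self.items.discard(payload)
            # Links touching a dropped item are removed with it.
            self.links = {l for l in self.links if payload not in l}
        elif kind == "link":
            a, b = payload
            if a in self.items and b in self.items:
                self.links.add(frozenset((a, b)))


def state_value(memory, gold):
    """Assumed state score: relevant items, minus distractors, plus useful links."""
    hits = len(memory.items & gold)
    noise = len(memory.items - gold)
    good_links = sum(1 for l in memory.links if all(x in gold for x in l))
    return hits - 0.5 * noise + 0.25 * good_links


def action_rewards(actions, gold, used_in_answer, hindsight_bonus=0.5):
    """Reward each action by the change in state value it causes (marginal gain),
    plus an assumed hindsight bonus when an added item is used in the final answer."""
    memory = EvidenceMemory()
    rewards = []
    for action in actions:
        before = state_value(memory, gold)
        memory.apply(action)
        gain = state_value(memory, gold) - before
        kind, payload = action
        bonus = hindsight_bonus if kind == "add" and payload in used_in_answer else 0.0
        rewards.append(gain + bonus)
    return rewards


def group_relative_advantages(returns):
    """GRPO-style normalization: center and scale returns within one sampled group."""
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + 1e-8) for r in returns]
```

Under these assumptions, adding a distractor earns a negative reward and dropping it later earns a positive one, which is the kind of intermediate signal an outcome-only reward cannot provide.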

HOW THIS AFFECTS YOU

● Builder: This approach can help you improve model performance on long-context retrieval and synthesis tasks.

● Researcher: You can move beyond outcome-only RL by rewarding intermediate evidence-state transitions.

