RNG-Bench Tests Multimodal LLMs on Hidden-State Reconstruction During Interaction

June 16, 2026

RNG-Bench isolates a model's ability to reconstruct no-longer-visible past observations and act on them in two games: Matching Pairs (card location recall) and 3D Maze (egocentric spatial integration). It addresses a gap in existing benchmarks that either expose full state or conflate hidden-state recall with other agent skills.
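To make the hidden-state idea concrete, here is a minimal toy sketch of a Matching Pairs setting. It is not RNG-Bench's actual environment or API: the class `MatchingPairs`, its methods, and the agent `play_with_memory` are hypothetical names invented for illustration. The sketch assumes each flip reveals a single card whose face is hidden again afterwards, so the agent must rely on what it remembers rather than on the current observation alone.

```python
import random


class MatchingPairs:
    """Toy game (hypothetical, not the benchmark's code): cards lie face down."""

    def __init__(self, num_pairs=4, seed=0):
        rng = random.Random(seed)
        self.cards = list(range(num_pairs)) * 2  # two copies of each value
        rng.shuffle(self.cards)
        self.matched = set()  # positions already paired off

    def flip(self, pos):
        """Reveal one card; nothing else about the board is shown."""
        return self.cards[pos]

    def guess(self, a, b):
        """Claim that positions a and b hold the same value."""
        if a != b and self.cards[a] == self.cards[b]:
            self.matched.update((a, b))
            return True
        return False

    def done(self):
        return len(self.matched) == len(self.cards)


def play_with_memory(env):
    """Agent that stores every past observation and pairs cards from memory."""
    memory = {}  # card value -> position where it was first seen
    flips = 0
    for pos in range(len(env.cards)):
        value = env.flip(pos)
        flips += 1
        if value in memory:
            # The earlier card is no longer visible; only memory locates it.
            env.guess(memory[value], pos)
        else:
            memory[value] = pos
    return flips


env = MatchingPairs(num_pairs=4, seed=1)
flips = play_with_memory(env)
```

In this toy version, an agent with perfect recall finishes after flipping each card once, while an agent that only sees the current card has no way to locate a match except by re-flipping. That gap is the kind of behavior the summary describes the benchmark as isolating.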

HOW THIS AFFECTS YOU

● Researcher: RNG-Bench cleanly isolates non-Markov memory capabilities in multimodal models, separating them from the reasoning and planning confounds present in prior benchmarks.

Source: huggingface.co
