RNG-Bench Tests Multimodal LLMs on Hidden-State Reconstruction During Interaction
June 16, 2026
RNG-Bench isolates a model's ability to reconstruct no-longer-visible past observations and act on them in two games: Matching Pairs (card location recall) and 3D Maze (egocentric spatial integration). It addresses a gap in existing benchmarks that either expose full state or conflate hidden-state recall with other agent skills.
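To make the hidden-state idea concrete, here is a minimal symbolic sketch of a Matching Pairs round. It is not RNG-Bench's actual environment or API: the class `MatchingPairs`, its methods, and `play_with_memory` are hypothetical names chosen for illustration, and the real benchmark presents observations to multimodal models rather than as integers. The sketch only shows why the task is non-Markov: a card's face is observable solely on the step it is flipped, so the agent has to carry that observation forward itself.

```python
import random


class MatchingPairs:
    """Toy board (hypothetical): a card's face is visible only when it is flipped."""

    def __init__(self, num_pairs, seed=0):
        rng = random.Random(seed)
        self.faces = list(range(num_pairs)) * 2  # two cards per face
        rng.shuffle(self.faces)
        self.matched = [False] * len(self.faces)

    def flip(self, index):
        """Reveal one card; the return value is the only observation of its face."""
        return self.faces[index]

    def guess(self, i, j):
        """Mark cards i and j as matched if they are distinct, unmatched, and equal."""
        same_face = self.faces[i] == self.faces[j]
        both_open = not self.matched[i] and not self.matched[j]
        if i != j and same_face and both_open:
            self.matched[i] = self.matched[j] = True
            return True
        return False

    def done(self):
        return all(self.matched)


def play_with_memory(env):
    """Flip each card once, remember where each face was seen, pair from memory."""
    seen = {}  # face -> index of its first sighting (the reconstructed hidden state)
    flips = 0
    for index in range(len(env.faces)):
        face = env.flip(index)
        flips += 1
        if face in seen:
            # The partner is no longer visible; only the agent's memory locates it.
            env.guess(seen.pop(face), index)
        else:
            seen[face] = index
    return flips
```

Under these assumptions, an agent that keeps the `seen` map clears a board of `n` pairs in exactly `2 * n` flips. An agent that conditions only on the current observation has no way to locate a previously seen partner, which is the capability the benchmark is described as isolating.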
HOW THIS AFFECTS YOU
● Researcher: RNG-Bench cleanly isolates non-Markov memory in multimodal models, separating it from the reasoning and planning confounds present in prior benchmarks.