●researcherIdentifies that latent reward-hack policy states and actual exploit actions are decoupled, motivating context-aware rather than purely activation-based monitoring architectures.
●policyConcrete finding that mechanistic monitoring of agentic reward hacking requires environmental context signals, relevant for designing runtime safety monitors in deployed agents.