Log-Ratio of RL Policy to Reference Recovers Optimal Step-Level Advantage | HACKOBAR_