Train-Then-Fix RL Deployment Is Structurally Wrong, Paper Argues | HACKOBAR_