14K-Example Data Recipe Boosts Long-Context RL Without Reward Engineering | HACKOBAR_