GRPO Framework Cuts LLM Confidence-Rationale Misalignment by 26.5% | HACKOBAR_