DPO Fine-Tuning Study Reports Efficiency Gains but Training Instability | HACKOBAR_