Smarter Pair Selection in DPO Training Outperforms Random Comparison Sampling | HACKOBAR_