HDPO Trains LLMs to Generate and Select Among Diverse Solution Paths | HACKOBAR_