[HUGGINGFACE]score: 0.42
Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
June 4, 2026
RHO optimizes LLM agent harnesses (tools, skills, and workflows) without labeled validation data. It replays past trajectories and uses self-consistency plus pairwise self-preference to select harness updates. The method samples a diverse coreset of challenging tasks, re-solves them in parallel, and ranks candidate updates by agent-judged pairwise comparison across rollouts.
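
The selection loop described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the summary does not say how difficulty or diversity is measured or how the judge is prompted. The sketch assumes difficulty is one minus the majority-agreement rate over past rollouts, diversity is Euclidean distance between task feature vectors, and ranking is a simple win count. The names `self_consistency`, `select_coreset`, `rank_updates`, `rollout`, and `judge` are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple


def self_consistency(answers: Sequence[str]) -> float:
    """Fraction of past rollouts that agree with the majority answer."""
    if not answers:
        return 0.0
    return Counter(answers).most_common(1)[0][1] / len(answers)


def select_coreset(
    history: Dict[str, Tuple[Sequence[float], List[str]]], k: int
) -> List[str]:
    """Greedily pick up to k tasks that are both hard and far apart.

    history maps task id -> (feature vector, final answers of past rollouts).
    Assumption: difficulty = 1 - self-consistency of the past answers.
    """
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    difficulty = {t: 1.0 - self_consistency(ans) for t, (_, ans) in history.items()}
    chosen: List[str] = []
    remaining = sorted(history)  # sorted for deterministic tie-breaking

    def gain(task: str) -> float:
        # Distance to the nearest chosen task (1.0 before anything is chosen).
        spread = min(
            (dist(history[task][0], history[c][0]) for c in chosen), default=1.0
        )
        return difficulty[task] * spread

    while remaining and len(chosen) < k:
        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


def rank_updates(
    candidates: List[str],
    tasks: List[str],
    rollout: Callable[[str, str], str],
    judge: Callable[[str, str, str], int],
) -> List[str]:
    """Rank candidate harness updates by pairwise wins across rollouts.

    rollout(candidate, task) re-solves a task under a candidate harness.
    judge(task, traj_a, traj_b) returns 0 if traj_a is preferred, else 1.
    """
    pairs = [(c, t) for c in candidates for t in tasks]
    # Re-solve every (candidate, task) pair in parallel, once each.
    with ThreadPoolExecutor() as pool:
        trajectories = dict(zip(pairs, pool.map(lambda p: rollout(*p), pairs)))

    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        for task in tasks:
            preferred = judge(task, trajectories[(a, task)], trajectories[(b, task)])
            wins[a if preferred == 0 else b] += 1
    # sorted() is stable, so tied candidates keep their input order.
    return sorted(candidates, key=lambda c: wins[c], reverse=True)
```

In a real setting `rollout` would run the agent under a candidate harness and `judge` would ask the same agent which of two trajectories it prefers. A practical judge would likely also compare each pair in both orders to reduce position bias, which this sketch omits.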