[r/LocalLLaMA]

Claude Opus 4.8 leads SWE-rebench with 56.5% success rate

July 1, 2026

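The latest SWE-rebench leaderboard update shows Claude Opus 4.8 xhigh in first place with a 56.5% success rate on the benchmark's software engineering tasks, followed by GLM-5.2 at 51.1%, a gap of 5.4 percentage points. Among models that can run locally, Qwen3.6-27B shows competitive performance, which makes it relevant for self-hosted coding agents.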

HOW THIS AFFECTS YOU

● builder: You can use these benchmark scores to select models for autonomous coding agents.

● researcher: These results provide updated performance baselines for evaluating software engineering agents.

Source: swe-rebench.com
