
Claw-SWE-Bench: 350-Instance Multilingual Coding Benchmark for General Agents

June 9, 2026

Claw-SWE-Bench standardizes evaluation of general-purpose agents on GitHub issue resolution across 8 languages and 43 repositories, using a fixed prompt, a fixed runtime budget, and a fixed patch extraction protocol. It targets harness incompatibility, which makes comparisons between heterogeneous agents on SWE-bench unreliable.
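To make the idea of a shared harness contract concrete, here is a minimal Python sketch of what "fixed prompt, runtime budget, and patch extraction" could look like. It is not the benchmark's actual code: `PROMPT_TEMPLATE`, `Task`, `Result`, `extract_patch`, and `run_task` are hypothetical names, the prompt wording is invented, and the budget is only checked after the agent returns rather than enforced by killing the process.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical fixed prompt; the benchmark's real prompt text is not shown in the summary.
PROMPT_TEMPLATE = (
    "Resolve the following GitHub issue in {repo}:\n\n{issue}\n\n"
    "Reply with a unified diff."
)


@dataclass(frozen=True)
class Task:
    repo: str
    language: str
    issue: str


@dataclass(frozen=True)
class Result:
    patch: Optional[str]
    timed_out: bool


def extract_patch(output: str) -> Optional[str]:
    """Return everything from the first line starting with 'diff --git', or None."""
    match = re.search(r"^diff --git ", output, flags=re.MULTILINE)
    if match is None:
        return None
    return output[match.start():].strip() + "\n"


def run_task(agent: Callable[[str], str], task: Task, budget_s: float) -> Result:
    """Run any agent callable under the same prompt, budget, and extraction rule."""
    prompt = PROMPT_TEMPLATE.format(repo=task.repo, issue=task.issue)
    started = time.monotonic()
    output = agent(prompt)
    elapsed = time.monotonic() - started
    # Simplification: the budget is checked after the fact, not enforced mid-run.
    if elapsed > budget_s:
        return Result(patch=None, timed_out=True)
    return Result(patch=extract_patch(output), timed_out=False)
```

Because every agent is reduced to a `prompt -> text` callable in this sketch, differences in scores would come from the agent rather than from per-agent scaffolding, which is the property the summary attributes to the standardized harness.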

HOW THIS AFFECTS YOU

● Builder: You can now benchmark general-purpose agents, such as OpenClaw-style systems, against a consistent coding eval without custom SWE-bench scaffolding.

● Researcher: The adapter protocol and standardized harness contract enable like-for-like comparison of agent architectures on multilingual coding tasks.

Read original: huggingface.co
