score: 0.33

Claude Opus 4 and GPT-5 Launch Posts Use Non-Overlapping Benchmark Sets

May 28, 2026

Anthropic's Opus 4 launch post and OpenAI's GPT-5 launch post, published four weeks earlier, report results on largely non-overlapping benchmark suites, making direct comparison difficult. Selective evaluation reporting of this kind is a recurring problem for practitioners trying to assess real-world capability differences.

HOW THIS AFFECTS YOU

● builder: Don't rely on vendor benchmark tables to choose between Opus 4 and GPT-5; run your own task-specific evals on your actual workloads.

● researcher: Non-overlapping evals between major releases are a structural problem for the field; independent third-party benchmarking is increasingly necessary for valid comparisons.
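
For builders, the advice to run your own evals can be sketched roughly as follows. This is a minimal illustration, not a recommended harness: the names `run_eval`, `exact_match`, and the stub models are hypothetical, each "model" is assumed to be a plain function from prompt to answer (in practice it would wrap a vendor API call), and exact-match grading is only one of many possible scoring rules.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any function from prompt text to answer text.
# Real usage would wrap each vendor's API client behind this interface.
Model = Callable[[str], str]


def exact_match(expected: str, actual: str) -> bool:
    """Case- and whitespace-insensitive comparison; swap in your own grader."""
    return expected.strip().lower() == actual.strip().lower()


def run_eval(models: Dict[str, Model], cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score every model on the same (prompt, expected) cases; return accuracy per model."""
    if not cases:
        raise ValueError("need at least one eval case")
    scores: Dict[str, float] = {}
    for name, model in models.items():
        correct = sum(exact_match(expected, model(prompt)) for prompt, expected in cases)
        scores[name] = correct / len(cases)
    return scores


if __name__ == "__main__":
    # Hypothetical cases; replace with prompts drawn from your actual workload.
    cases = [("2 + 2 = ?", "4"), ("Capital of France?", "Paris")]
    # Stub models standing in for real API calls.
    stub_models = {
        "model_a": lambda prompt: "4" if "2 + 2" in prompt else "paris",
        "model_b": lambda prompt: "4",
    }
    print(run_eval(stub_models, cases))
```

The point of the design is that every model sees the identical case list, which is exactly what the vendor tables do not guarantee.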

SOURCE

https://x.com/agupta/status/2060044689978335616#m
