[r/artificial] score: 0.14

GPT-5.5: 'strongest agentic coding model ever' failing spectacularly at its own game (LiveBench)

April 25, 2026

**GPT-5.5 scores 56.67 on LiveBench's agentic coding benchmark, underperforming its predecessor GPT-5.4 (70.00) and ranking 11th overall, behind Gemini 2.5 Pro, Claude 4.6, and several other models.** This is notable because OpenAI explicitly positioned GPT-5.5 as its "strongest agentic coding model to date" and built a new subscription tier and the Codex product around that claim. For practitioners evaluating models for agentic coding pipelines, the 13.33-point gap to GPT-5.4 suggests that the marketing narrative diverges from independent third-party evaluation, which warrants caution before adopting GPT-5.5 for production agentic workflows.

discussion

SOURCE

https://www.reddit.com/r/artificial/comments/1sv4l94/gpt55_strongest_agentic_coding_model_ever_failing/
