[RSS LABS] score: 0.86

Frontier LLMs Score Under 50% on Enterprise IT Agentic Benchmark

May 27, 2026

ITBench-AA, a new benchmark from Artificial Analysis and IBM, shows frontier models scoring below 50% on agentic enterprise IT tasks, establishing a concrete capability ceiling for current models in real-world IT automation.

HOW THIS AFFECTS YOU

● builder: Before deploying LLM agents for enterprise IT automation, benchmark your stack against ITBench-AA; current frontier models score below 50% on its tasks.

● researcher: ITBench-AA provides a new evaluation framework for agentic IT task performance, and the sub-50% scores reveal a measurable gap between current frontier models and enterprise-grade reliability.

● founder: The sub-50% performance ceiling signals that enterprise IT automation is still an open problem, which makes it a real product opportunity rather than a solved space.

SOURCE

https://huggingface.co/blog/ibm-research/itbench-aa
