score: 0.21
We recently tested all of the major LLMs with tic-tac-toe, modified chess, and a novel game -- even the top models all failed: illegal moves, claim…
May 23, 2026
Testing of major LLMs on tic-tac-toe, modified chess, and a novel game found consistent failures across top models, including illegal moves and false win claims. The post is a LinkedIn teaser and provides no methodology, model list, or quantitative results. It is relevant to those evaluating LLM reasoning and game-state tracking but lacks actionable technical detail.