
We recently tested all of the major LLMs with tic-tac-toe, modified chess, and a novel game -- even the top models all failed: illegal moves, claim…

May 23, 2026

Tests of major LLMs on tic-tac-toe, modified chess, and a novel game reportedly found consistent failures across top models, including illegal moves and false win claims. The post provides no methodology, model list, or quantitative results; it is only a LinkedIn teaser. Relevant to those evaluating LLM reasoning and game-state tracking, but it lacks actionable technical detail.

SOURCE

https://x.com/peterevoss/status/2058356875050009076#m
