●builder: If your product involves translation or localization pipelines, this test case reveals a class of self-referential edits that current models will silently get wrong.
●researcher: Worth watching because it isolates a specific failure mode (models cannot reconcile surface-form constraints with semantic accuracy) that is distinct from what standard hallucination or reasoning benchmarks measure.