ChLogic Benchmark Tests Logical Reasoning Consistency Across English and Chinese
June 15, 2026
ChLogic pairs each English logical reasoning problem with five Chinese surface realizations. The problems are built from 60 general and 40 difficult formal logical templates across nine template families, plus 15 Chinese-specific phenomenon types. The benchmark directly tests whether LLM logical reasoning follows a problem's latent logical structure or shifts with its language surface form.
HOW THIS AFFECTS YOU
● Researcher: ChLogic provides a controlled evaluation for separating language-surface sensitivity from logical reasoning ability in multilingual LLMs, with aligned English-Chinese pairs enabling direct cross-lingual comparison.
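
As a rough illustration of how aligned pairs could separate surface sensitivity from reasoning ability, the sketch below scores a model on the English problem, on its Chinese realizations, and on whether every Chinese answer agrees with the English one. The record layout (`AlignedItem`, `gold`, `english_pred`, `chinese_preds`), the toy predictions, and the three metrics are assumptions made for this example; they are not ChLogic's actual data format or scoring protocol.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlignedItem:
    """One hypothetical aligned item: an English problem plus its Chinese realizations."""
    gold: str                 # gold label shared by every surface form
    english_pred: str         # model answer on the English problem
    chinese_preds: List[str]  # model answers, one per Chinese surface realization


def summarize(items: List[AlignedItem]) -> Dict[str, float]:
    """Compute per-language accuracy and cross-lingual answer consistency."""
    n = len(items)
    english_accuracy = sum(it.english_pred == it.gold for it in items) / n

    zh_total = sum(len(it.chinese_preds) for it in items)
    zh_correct = sum(p == it.gold for it in items for p in it.chinese_preds)
    chinese_accuracy = zh_correct / zh_total

    # An item counts as consistent only if every Chinese answer matches the
    # English answer, whether or not that answer is correct.
    consistency = sum(
        all(p == it.english_pred for p in it.chinese_preds) for it in items
    ) / n

    return {
        "english_accuracy": english_accuracy,
        "chinese_accuracy": chinese_accuracy,
        "consistency": consistency,
    }


# Toy predictions (made up for illustration only).
items = [
    AlignedItem("True", "True", ["True"] * 5),
    AlignedItem("False", "False", ["False", "False", "True", "False", "False"]),
]
summary = summarize(items)
print(summary)
```

In this toy case the model answers both English problems correctly but flips on one Chinese realization, so consistency falls below English accuracy. A gap of that kind is what would point to surface sensitivity, as opposed to a reasoning failure that shows up in both languages.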