[HUGGINGFACE] score: 0.55

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

May 18, 2026

LLMEval-Logic is a Chinese logical reasoning benchmark that uses solver-verified formal annotations and adversarial hardening to resist saturation by frontier models. It addresses known weaknesses of template-generated benchmarks with coarse annotations, and is relevant for teams evaluating reasoning models on non-English tasks.

paper

SOURCE

https://huggingface.co/papers/2605.19597
