LLMs Show Persistent Moral Reasoning Gap in Persian Proverb Story Generation
June 12, 2026
A new dataset, PAND, pairs Persian proverbs with human-written stories to test abstraction-to-narrative generation. Evaluation with LLM-as-a-Judge and structural metrics finds that models produce fluent output but consistently fail to encode the underlying moral and causal structure faithfully, a gap that explicit chain-of-thought reasoning partially closes.
HOW THIS AFFECTS YOU
● Researcher: The decompression gap finding quantifies a specific failure mode in semantic grounding that is relevant to controlled generation and reasoning faithfulness research.