[arXiv] score: 0.09

LLMs Show Persistent Moral Reasoning Gap in Persian Proverb Story Generation

June 12, 2026

A new dataset, PAND, pairs Persian proverbs with human-written stories to test abstraction-to-narrative generation. Evaluation with LLM-as-a-Judge and structural metrics finds that models produce fluent output but consistently fail to encode the underlying moral and causal structure faithfully, a gap that explicit chain-of-thought reasoning only partially closes.

HOW THIS AFFECTS YOU

● Researcher: The decompression gap finding quantifies a specific failure mode in semantic grounding that is relevant to controlled generation and reasoning faithfulness research.

