[arXiv] score: 0.12

RL Training with chrF Reward Teaches LLMs to Use Linguistic Context for Unseen Languages

June 5, 2026

Using chrF as a reward signal, RL fine-tuning teaches LLMs to extract and apply in-context grammar information for translating completely unseen low-resource languages, outperforming both in-context learning and supervised fine-tuning on zero-shot transfer. The approach targets meta-skill acquisition rather than language memorization, improving generalization across novel languages at test time.
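To make the reward signal concrete, here is a minimal sketch of a chrF-style score used as a scalar reward for one sampled translation. It is a simplified variant (character n-grams up to order 6, whitespace removed, precision and recall averaged over orders, recall weighted with beta = 2). The function names `char_ngrams` and `chrf_reward` are hypothetical, and the exact chrF configuration and RL algorithm used in the paper are not stated in this summary, so treat the defaults below as assumptions.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (a simplifying assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF in [0, 1], usable as a scalar reward for one sampled translation.

    Assumed defaults: n-gram orders 1..6 and beta = 2 (recall weighted above precision).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # Skip orders longer than either string.
            continue
        # Clipped overlap: each n-gram counts at most as often as it occurs in both strings.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```

In an RL loop, the policy would generate a translation conditioned on the in-context grammar material, and `chrf_reward(sample, reference)` would supply the scalar reward for that sample. Because the score only compares surface character overlap, it needs no learned reward model, which matches the "lightweight" framing of the summary.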

HOW THIS AFFECTS YOU

● Researcher: Demonstrates that a lightweight, surface-level reward is sufficient for RL to elicit generalizable in-context reasoning skills, with implications for applying similar reward shaping to other structured reasoning tasks.

