RL Training with chrF Reward Teaches LLMs to Use Linguistic Context for Unseen Languages
June 5, 2026
Using chrF (a character n-gram F-score) as the reward signal, RL fine-tuning teaches LLMs to extract and apply grammar information supplied in context when translating low-resource languages they have never seen, outperforming both in-context learning and supervised fine-tuning on zero-shot transfer. The approach targets a meta-skill, using the linguistic context, rather than memorization of any one language, which improves generalization to novel languages at test time.
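To make the reward concrete, here is a minimal sketch of a chrF-style score used as a scalar reward for one sampled translation. It is a simplified illustration, not the authors' implementation and not the reference sacreBLEU one: the function names `char_ngrams` and `chrf_reward` are hypothetical, whitespace is stripped before counting, and the defaults (n-gram orders 1 to 6, beta = 2) are assumed from the commonly used chrF configuration rather than taken from the summary.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace (a simplification)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Return a chrF-style score in [0, 1] to use as an RL reward.

    Assumed defaults: orders 1..max_n, recall weighted beta times precision.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this order is longer than one of the strings
        # Clipped overlap: each n-gram counts at most as often as in both.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)
```

In an RL loop, each sampled translation would be scored against its reference with `chrf_reward(sample, reference)` and that value passed to the policy update. Because the score works on characters rather than words, it gives partial credit for near-miss word forms, which is one plausible reason a surface-level metric can still provide a useful learning signal.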
HOW THIS AFFECTS YOU
● Researcher: Demonstrates that a lightweight surface-level reward is sufficient for RL to elicit generalizable in-context reasoning skills, with implications for applying similar reward shaping to other structured reasoning tasks.