[arXiv] score: 0.09

Psy-CoT Framework Uses RL to Improve Role-Playing Agent Character Fidelity

June 26, 2026

Psy-CoT decomposes pre-response reasoning into three psychology-grounded steps (Interaction Perception, Psychological Empathy, and Logical Construction), replacing behavioral mimicry with structured internal reasoning. Reinforcement learning is then applied on top to counter reward hacking under LLM-based reward models and to improve out-of-distribution character generalization.
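
As a rough illustration of the three-step decomposition, a prompt scaffold might look like the sketch below. Only the step names come from the summary; the function name `build_reasoning_prompt`, the guiding questions, and the prompt wording are assumptions made for illustration, not the paper's actual prompts or training setup.

```python
# Hypothetical sketch: step names are from the summary above;
# the guiding questions and prompt wording are illustrative assumptions.
STEPS = [
    ("Interaction Perception", "What is the user saying and doing in this turn?"),
    ("Psychological Empathy", "How would the character feel, and what would they want?"),
    ("Logical Construction", "Given the above, what should the reply contain?"),
]


def build_reasoning_prompt(character: str, user_message: str) -> str:
    """Assemble a prompt that asks for three reasoning steps before the reply."""
    lines = [
        f"You are role-playing as {character}.",
        f"User: {user_message}",
        "",
        "Before replying, reason through these steps in order:",
    ]
    # Number the steps so the model emits them in a fixed order.
    for i, (name, question) in enumerate(STEPS, start=1):
        lines.append(f"{i}. {name}: {question}")
    lines.append("Then write the in-character reply.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_reasoning_prompt("Sherlock Holmes", "Where were you last night?"))
```

This covers only the structured-reasoning half of the approach; the reinforcement learning stage and its reward model are not sketched here.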

HOW THIS AFFECTS YOU

● builder: Worth watching if you're building character or persona agents. The reward hacking observed under LLM judges is a practical pitfall to account for in your RLHF pipeline.

● researcher: The three-step CoT decomposition combined with RL alignment offers a testable framework for studying character fidelity beyond SFT baselines.

