[arXiv] score: 0.24
Regularized Centered Emphatic Temporal Difference Learning
May 7, 2026
A new arXiv preprint introduces Regularized Emphatic Temporal-Difference Learning (RETD), which fixes a critical flaw in centered Emphatic TD (ETD): naive Bellman-error centering destroys the positive-definiteness of the ETD key matrix. RETD preserves the follow-on trace and regularizes only the auxiliary centering recursion, lifting the lower-right block from 1 to 1+c. Convergence is proven under sufficient regularization conditions. RL practitioners using off-policy TD with function approximation gain a theoretically grounded variance-reduction method that keeps its stability guarantees, a combination that neither vanilla ETD nor centered TD provides.
cs.AI
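The role of the 1+c term in the summary above can be illustrated with a small numerical sketch. This is not the paper's key matrix: the blocks `A`, `u`, `v` and the helper names `augmented_matrix`, `min_sym_eig`, and `min_regularization` are hypothetical, chosen only to show the general linear-algebra effect. A real matrix M satisfies x^T M x > 0 for all nonzero x exactly when its symmetric part is positive definite, and by the Schur complement, raising the lower-right entry of a block matrix can restore that property when the upper-left block is already positive definite.

```python
import numpy as np

# Toy blocks (assumptions for illustration, not taken from the paper).
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])   # positive-definite upper-left block
u = np.array([2.0, 0.0])     # upper-right coupling column
v = np.array([2.0, 0.0])     # lower-left coupling row


def augmented_matrix(A, u, v, c):
    """Assemble the block matrix [[A, u], [v^T, 1 + c]]."""
    top = np.hstack([A, u.reshape(-1, 1)])
    bottom = np.hstack([v, [1.0 + c]])
    return np.vstack([top, bottom])


def min_sym_eig(M):
    """Smallest eigenvalue of the symmetric part (M + M^T) / 2.

    M is positive definite in the x^T M x > 0 sense iff this is positive.
    """
    return float(np.linalg.eigvalsh((M + M.T) / 2.0).min())


def min_regularization(A, u, v):
    """Threshold on c from the Schur complement of the symmetric part.

    With A_s = (A + A^T) / 2 and w = (u + v) / 2, positive definiteness
    requires 1 + c > w^T A_s^{-1} w, i.e. c > w^T A_s^{-1} w - 1.
    """
    A_s = (A + A.T) / 2.0
    w = (u + v) / 2.0
    return float(w @ np.linalg.solve(A_s, w)) - 1.0


if __name__ == "__main__":
    for c in (0.0, 2.0):
        M = augmented_matrix(A, u, v, c)
        print(f"c = {c}: min eigenvalue of symmetric part = {min_sym_eig(M):.3f}")
    print("c must exceed", min_regularization(A, u, v))
```

In this toy case the lower-right entry 1 is too small relative to the coupling terms, so the unregularized matrix (c = 0) is not positive definite, while any c above the printed threshold restores positive definiteness without touching the upper-left block. This mirrors, in spirit only, the summary's description of regularizing the auxiliary recursion while leaving the rest unchanged.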