
Regularized Centered Emphatic Temporal Difference Learning

May 7, 2026

An arXiv preprint introduces Regularized Emphatic Temporal-Difference Learning (RETD), which fixes a critical flaw in centered Emphatic TD (ETD): naive Bellman-error centering destroys the positive-definiteness of the ETD key matrix. RETD preserves the follow-on trace and regularizes only the auxiliary centering recursion, lifting the lower-right block from 1 to 1+c. Convergence is proven under sufficient conditions on the regularization. For RL practitioners using off-policy TD with function approximation, the result is a theoretically grounded variance-reduction method that keeps its stability guarantees, a combination that neither vanilla ETD nor centered TD provides on its own.
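
To make the "1 to 1+c" idea concrete, here is a minimal numerical sketch. It is not the paper's key matrix: the block matrix, the values of A, u, and v, and the helper names are all hypothetical, chosen only to show how raising a scalar lower-right block can restore positive-definiteness of a bordered matrix.

```python
import numpy as np

def is_positive_definite(M):
    """True if x^T M x > 0 for all nonzero x.

    For a possibly non-symmetric M this holds exactly when the
    symmetric part (M + M^T) / 2 has only positive eigenvalues.
    """
    sym = 0.5 * (M + M.T)
    return bool(np.linalg.eigvalsh(sym).min() > 0)

def augmented_matrix(A, u, v, c):
    """Build the block matrix [[A, u], [v^T, 1 + c]].

    Toy stand-in (assumption): A plays the role of a positive-definite
    upper-left block, u and v are coupling terms introduced by an
    auxiliary recursion, and c >= 0 is the regularization strength.
    """
    n = A.shape[0]
    M = np.empty((n + 1, n + 1))
    M[:n, :n] = A
    M[:n, n] = u
    M[n, :n] = v
    M[n, n] = 1.0 + c
    return M

# Hypothetical numbers, picked so the unregularized matrix fails the test.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
u = np.array([3.0, 0.0])
v = np.array([1.0, 0.0])

for c in (0.0, 0.5, 1.5):
    M = augmented_matrix(A, u, v, c)
    print(f"c = {c}: positive definite = {is_positive_definite(M)}")
```

In this toy case the Schur complement of the symmetric part is (1 + c) - 2, so the matrix is positive definite only for c > 1: c = 0 and c = 0.5 fail, while c = 1.5 passes. The actual conditions in the paper depend on its own key matrix and should be taken from the preprint.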

cs.AI

SOURCE

https://arxiv.org/abs/2605.04100
