Rectifying Regression in Reinforcement Learning
Alex Ayoub, David Szepesvári, Alireza Bakhtiari, Csaba Szepesvári, Dale Schuurmans
TL;DR
This work investigates how the choice of regression objective affects value-based reinforcement learning, arguing that losses aligned with the mean absolute error (MAE), notably log-loss and a reparameterized categorical cross-entropy loss (cat-loss), match optimal decision-making more closely than the traditional squared loss, which aligns with the mean squared error (MSE). The authors provide theoretical bounds showing faster, MAE-aligned convergence under certain problem structures, along with explicit negative results that illustrate the limitations of MSE-based approaches. They introduce a reparameterized cat-loss that preserves the mean while casting prediction as multi-class classification, and they demonstrate its empirical viability in linear batch RL on an inverted pendulum experiment. The findings suggest that choosing loss functions with MAE in mind can improve policy quality and convergence, with potential implications for distributional RL methods and practical algorithm design.
Abstract
This paper investigates the impact of the loss function in value-based methods for reinforcement learning through an analysis of the underlying prediction objectives. We theoretically show that mean absolute error is a better prediction objective than the traditional mean squared error for controlling the learned policy's suboptimality gap. Furthermore, we present results showing that different loss functions align with different regression objectives: binary and categorical cross-entropy losses with the mean absolute error, and the squared loss with the mean squared error. We then provide empirical evidence that algorithms minimizing these cross-entropy losses can outperform those based on the squared loss in linear reinforcement learning.
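As a small illustration of the two loss families compared in the abstract, the sketch below evaluates the squared loss and the binary cross-entropy (log-loss) on targets in [0, 1] and finds, by grid search, the constant prediction that minimizes each. It is not the paper's algorithm or experiment: the function names, the toy targets, and the grid search are assumptions chosen for illustration. It only shows the standard fact that both losses are minimized at the target mean, so replacing the squared loss with log-loss changes how errors are penalized, not which mean is being estimated.

```python
import math

def squared_loss(p, y):
    """Squared loss between a prediction p and a target y."""
    return (p - y) ** 2

def log_loss(p, y):
    """Binary cross-entropy for a target y in [0, 1] and a prediction p in (0, 1)."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def empirical_minimizer(loss, targets, grid_size=999):
    """Grid-search the constant prediction in (0, 1) with the smallest total loss."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(grid, key=lambda p: sum(loss(p, y) for y in targets))

# Hypothetical normalized returns in [0, 1]; not data from the paper.
targets = [0.0, 0.1, 0.1, 0.2, 1.0]
target_mean = sum(targets) / len(targets)

p_sq = empirical_minimizer(squared_loss, targets)
p_log = empirical_minimizer(log_loss, targets)

print(f"target mean:            {target_mean:.3f}")
print(f"squared-loss minimizer: {p_sq:.3f}")
print(f"log-loss minimizer:     {p_log:.3f}")
```

Both minimizers land on the grid point nearest the target mean. The difference between the losses lies in their curvature: log-loss grows without bound as the prediction approaches 0 or 1 while the target does not, whereas the squared loss stays bounded on [0, 1].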
