Fitted Q-Iteration via Max-Plus-Linear Approximation
Y. Liu, M. A. S. Kolarijani
TL;DR
This work advances offline reinforcement learning by introducing max-plus-linear approximators for the Q-function within fitted Q-iteration. It proposes MP-FQI and a variational variant (v-MP-FQI) that leverage the Bellman operator’s compatibility with max-plus algebra to achieve provable linear convergence, with per-iteration complexities that scale favorably either with the number of samples ($\mathcal{O}(np)$) or with the number of test functions ($\mathcal{O}(pq)$). The algorithms exploit MP-regression structure to reduce updates to max-plus matrix-vector operations, and the variational formulation achieves sample-size independence in per-iteration costs. Numerical experiments on a DC motor control problem show improved greedy policies over standard FQI, illustrating practical benefits for offline RL with MP representations. The paper also discusses extensions, including sparse MP solutions and fast transforms, to broaden applicability and efficiency.
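For orientation, a max-plus-linear parametrization is commonly written as below. The notation here is an assumption for illustration and not necessarily the paper's: $f_i$ denotes a basis function, $\theta \in \mathbb{R}^p$ the parameter vector, $p$ is taken to be the number of basis functions, and $F \in \mathbb{R}^{n \times p}$ collects the basis values at $n$ sample points.

```latex
% Max-plus-linear approximator (assumed notation):
% "addition" is max, "multiplication" is +.
Q_\theta(s, a) \;=\; \max_{1 \le i \le p} \bigl\{ f_i(s, a) + \theta_i \bigr\}

% Evaluated at n samples, this is a max-plus matrix-vector product:
(F \otimes \theta)_k \;=\; \max_{1 \le i \le p} \bigl\{ F_{k i} + \theta_i \bigr\},
\qquad F_{k i} = f_i(s_k, a_k), \quad k = 1, \dots, n .
```

Under these assumptions, one such product touches each of the $np$ entries of $F$ once, which is consistent with the $\mathcal{O}(np)$ per-iteration cost quoted above.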
Abstract
In this study, we consider the application of max-plus-linear approximators for the Q-function in offline reinforcement learning of discounted Markov decision processes. In particular, we incorporate these approximators to propose novel fitted Q-iteration (FQI) algorithms with provable convergence. Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications. We also consider a variational implementation of the proposed algorithm, which leads to a per-iteration complexity that is independent of the number of samples.
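To make the phrase "max-plus matrix-vector multiplications" concrete, the following is a minimal NumPy sketch, not the paper's algorithm. It shows a max-plus product and the standard residuation formula for the greatest subsolution of a max-plus-linear system, which is itself a min-plus product. The function names, the matrix `A`, and the targets `y` are hypothetical and chosen only for illustration.

```python
import numpy as np


def maxplus_matvec(A, x):
    """Max-plus product: (A (x) x)_i = max_j (A[i, j] + x[j])."""
    return np.max(A + x[np.newaxis, :], axis=1)


def greatest_subsolution(A, y):
    """Largest theta satisfying A (x) theta <= y elementwise.

    Standard residuation result: theta_j = min_i (y[i] - A[i, j]).
    This is an illustrative stand-in for a max-plus regression step,
    not necessarily the update used in MP-FQI.
    """
    return np.min(y[:, np.newaxis] - A, axis=0)


# Hypothetical data: 3 samples (rows), 2 basis functions (columns).
A = np.array([[0.0, 2.0],
              [1.0, -1.0],
              [3.0, 0.0]])
y = np.array([4.0, 2.0, 5.0])

theta = greatest_subsolution(A, y)   # parameters fitted from below
fit = maxplus_matvec(A, theta)       # model values at the samples

print("theta:", theta)
print("fit:  ", fit)
```

Both functions make a single pass over the entries of `A`, so the cost of each call grows with the product of the number of samples and the number of basis functions. The fitted values never exceed the targets, which is the defining property of the greatest subsolution.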
