Fitted Q-Iteration via Max-Plus-Linear Approximation

Y. Liu; M. A. S. Kolarijani

TL;DR

This work advances offline reinforcement learning by introducing max-plus-linear approximators for the Q-function within fitted Q-iteration. It proposes MP-FQI and a variational variant (v-MP-FQI) that leverage the Bellman operator’s compatibility with max-plus algebra to achieve provable linear convergence, with per-iteration complexities that scale favorably either with the number of samples ($\mathcal{O}(np)$) or with the number of test functions ($\mathcal{O}(pq)$). The algorithms exploit MP-regression structure to reduce updates to max-plus matrix-vector operations, and the variational formulation achieves sample-size independence in per-iteration costs. Numerical experiments on a DC motor control problem show improved greedy policies over standard FQI, illustrating practical benefits for offline RL with MP representations. The paper also discusses extensions, including sparse MP solutions and fast transforms, to broaden applicability and efficiency.

Abstract

In this study, we consider the application of max-plus-linear approximators for the Q-function in offline reinforcement learning of discounted Markov decision processes. In particular, we incorporate these approximators to propose novel fitted Q-iteration (FQI) algorithms with provable convergence. Exploiting the compatibility of the Bellman operator with max-plus operations, we show that the max-plus-linear regression within each iteration of the proposed FQI algorithm reduces to simple max-plus matrix-vector multiplications. We also consider a variational implementation of the proposed algorithm, which leads to a per-iteration complexity that is independent of the number of samples.
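To make the phrase "max-plus matrix-vector multiplication" concrete, here is a minimal sketch in plain Python. In max-plus algebra, addition is replaced by `max` and multiplication by `+`, so the product of a matrix $A$ and a vector $x$ has entries $y_i = \max_j (A_{ij} + x_j)$, and a max-plus-linear function of a feature vector has the form $\max_i (f_i + \theta_i)$. The function names `mp_matvec` and `mp_linear_eval` and the numbers below are illustrative assumptions; they are not taken from the paper and do not reproduce its algorithms.

```python
from typing import List

NEG_INF = float("-inf")  # acts as the max-plus "zero" element


def mp_matvec(A: List[List[float]], x: List[float]) -> List[float]:
    """Max-plus matrix-vector product: y[i] = max_j (A[i][j] + x[j])."""
    return [max(a_ij + x_j for a_ij, x_j in zip(row, x)) for row in A]


def mp_linear_eval(features: List[float], theta: List[float]) -> float:
    """Max-plus-linear combination: max_i (features[i] + theta[i])."""
    return max(f + t for f, t in zip(features, theta))


# Hypothetical 2x2 example; NEG_INF marks an entry that never attains the max.
A = [[0.0, -1.0], [2.0, NEG_INF]]
x = [1.0, 3.0]
y = mp_matvec(A, x)  # [max(0+1, -1+3), max(2+1, -inf+3)]
```

For an $n \times p$ matrix, this product takes $np$ additions and comparisons, and each row of the result is exactly `mp_linear_eval` applied to that row with `x` as the weight vector.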

Paper Structure (31 sections, 11 theorems, 68 equations, 1 figure, 2 algorithms)

Introduction
Problem statement and preliminaries
Offline RL
Fitted Q-iteration (FQI)
MP-linear approximation
MP-linear regression
Preliminary lemmas
Max-plus FQI
Algorithm
Analysis
An alternative implementation of MP-FQI
Variational max-plus FQI
Algorithm
Analysis
Numerical experiments
...and 16 more sections

Key Result

Lemma 2.1

Consider the two functions $f,\tilde{f} \in \underline {\mathbb{R}}^{\mathsf{Z}}$ and the scalar $\alpha \in \underline {\mathbb{R}}$. Define $[\max \{ f, \tilde{f} \}](z) = \max\{f(z),\tilde{f}(z)\}$ and $[\alpha + f](z) = \alpha + f(z)$ for all $z\in\mathsf{Z}$. We have

Figures (1)

Figure 1: DC motor stabilization problem. Top-Left: Average reward of 100 instances of the problem with random initial state over $T=100$ time steps. Solid (resp. dashed) lines correspond to quadratic (resp. distance) functions in (v-)MP-FQI and RBF (resp. indicator functions) in FQI for state features. Top-Right: The running time of the algorithms. Solid lines (resp. dashed) correspond to compilation (resp. per-iteration) time. Bottom: Convergence of algorithms. Solid and dashed lines are the same as in the top-left figure.

Theorems & Definitions (13)

Lemma 2.1: MP additivity and homogeneity of $\mathbf{B}_{\mathrm{s}}$
Lemma 2.2: Non-expansiveness of MP-linear operators
Proposition 3.1: MP empirical BE
Proposition 3.3: MP-FQI regression
Theorem 3.4: Convergence of MP-FQI
Theorem 3.5: Complexity of MP-FQI
Remark 3.6: Comparison with standard FQI
Proposition 3.7: MP empirical BE II
Proposition 4.1: MP empirical variational BE
Lemma 4.3
...and 3 more
