Online reinforcement learning via sparse Gaussian mixture model Q-functions
Minh Vu, Konstantinos Slavakis
TL;DR
The paper tackles online reinforcement learning by introducing sparse Gaussian mixture model Q-functions (S-GMM-QFs), which are learned online and sparsified through an interpretable mechanism based on Hadamard overparametrization. The parameter space forms a Riemannian manifold, so a smooth Bellman-residual (BR) style objective can be optimized online from streaming data and an experience buffer, with updates performed by Riemannian Adam. The approach yields compact, expressive Q-functions that match or exceed dense DeepRL methods on standard benchmarks while using substantially fewer parameters, and it remains robust in low-parameter regimes where sparse DeepRL methods falter. This work highlights the value of structured sparsity and manifold-based optimization for efficient, interpretable online RL.
Abstract
This paper introduces a structured and interpretable online policy-iteration framework for reinforcement learning (RL), built around the novel class of sparse Gaussian mixture model Q-functions (S-GMM-QFs). Extending earlier work that trained GMM-QFs offline, the proposed framework develops an online scheme that leverages streaming data to encourage exploration. Model complexity is regulated through sparsification by Hadamard overparametrization, which mitigates overfitting while preserving expressiveness. The parameter space of S-GMM-QFs is naturally endowed with a Riemannian manifold structure, allowing for principled parameter updates via online gradient descent on a smooth objective. Numerical experiments show that S-GMM-QFs match or even outperform dense deep RL (DeepRL) methods on standard benchmarks while using significantly fewer parameters. Moreover, they maintain strong performance even in low-parameter regimes where sparsified DeepRL methods fail to generalize.
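To make the sparsification idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a Q-function written as a weighted sum of unnormalized Gaussian kernels over a state-action vector z, with the mixture weights factored as an elementwise (Hadamard) product w = u * v. All function names, the kernel form, and the toy numbers are illustrative assumptions. The sketch shows the standard property behind Hadamard overparametrization: a smooth ridge penalty on the factors (u, v) equals the sparsity-promoting l1 norm of w at a balanced factorization, and is never smaller than it otherwise.

```python
import numpy as np

def hadamard_weights(u, v):
    """Mixture weights as an elementwise (Hadamard) product, w = u * v."""
    return u * v

def smooth_penalty(u, v):
    """Differentiable ridge penalty on the factors (u, v)."""
    return 0.5 * (np.sum(u**2) + np.sum(v**2))

def gmm_q_value(z, weights, means, covs):
    """Assumed form: Q(z) = sum_k w_k * exp(-0.5 (z - m_k)^T C_k^{-1} (z - m_k))."""
    q = 0.0
    for w_k, m_k, C_k in zip(weights, means, covs):
        d = z - m_k
        # Solve C_k x = d instead of forming the inverse explicitly.
        q += w_k * np.exp(-0.5 * d @ np.linalg.solve(C_k, d))
    return q

# Hypothetical sparse weight vector: the second component is switched off.
w = np.array([1.5, 0.0, -0.5])

# Balanced factorization: |u_k| = |v_k| = sqrt(|w_k|).
u = np.sqrt(np.abs(w))
v = np.sign(w) * u

# Unbalanced factorization of the same w (larger smooth penalty).
u_unbalanced, v_unbalanced = 2.0 * u, 0.5 * v

l1_norm = np.sum(np.abs(w))
balanced_penalty = smooth_penalty(u, v)
unbalanced_penalty = smooth_penalty(u_unbalanced, v_unbalanced)

# Toy Gaussian components with positive-definite covariance matrices.
means = [np.zeros(2), np.ones(2), -np.ones(2)]
covs = [np.eye(2), 2.0 * np.eye(2), 0.5 * np.eye(2)]
z = np.zeros(2)
q = gmm_q_value(z, hadamard_weights(u, v), means, covs)
```

In this toy setup a component with zero weight contributes nothing to Q(z), so it can be dropped from the model. The positive-definite covariance matrices are one place where a manifold constraint enters the parameter space, which is why a Riemannian optimizer is a natural fit for such a parametrization.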
