Locality Sensitive Sparse Encoding for Learning World Models Online
Zichen Liu, Chao Du, Wee Sun Lee, Min Lin
TL;DR
The paper tackles online world model learning for model-based RL, where nonstationary data causes catastrophic forgetting in neural models. It introduces Losse-FTL, a linear regressor on nonlinear random features with a locality-sensitive sparse encoding. The encoding enables no-regret online updates whose per-step cost stays manageable because of sparsity, and the method has a regret bound $\text{Regret}_T(\mathbf{W}) = O(\log T)$. The approach is validated on supervised online learning benchmarks and on RL tasks under the Dyna architecture, where Losse-FTL surpasses or matches deep models trained with replay while improving computational efficiency. The contribution is a practical path toward scalable online world models that combines a high-capacity sparse feature representation with analytic, closed-form online updates.
Abstract
Acquiring an accurate world model online for model-based reinforcement learning (MBRL) is challenging due to data nonstationarity, which typically causes catastrophic forgetting for neural networks (NNs). From the online learning perspective, a Follow-The-Leader (FTL) world model is desirable, which optimally fits all previous experiences at each round. Unfortunately, NN-based models need re-training on all accumulated data at every interaction step to achieve FTL, which is computationally expensive for lifelong agents. In this paper, we revisit models that can achieve FTL with incremental updates. Specifically, our world model is a linear regression model supported by nonlinear random features. The linear part ensures efficient FTL updates, while the nonlinear random features enable the fitting of complex environments. To best trade off model capacity and computational efficiency, we introduce a locality sensitive sparse encoding, which allows us to conduct efficient sparse updates even with very high dimensional nonlinear features. We validate the representation power of our encoding and verify that it allows efficient online learning under data covariate shift. We also show, in the Dyna MBRL setting, that our world models learned online using a single pass of trajectory data either surpass or match the performance of deep world models trained with replay and other continual learning methods.
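To make the idea of "FTL with incremental updates" concrete, here is a minimal sketch in Python, not the paper's implementation. It assumes a hypothetical sparse encoder (random one-dimensional projections, each binned to a one-hot code) as a stand-in for the locality sensitive sparse encoding, and it uses a standard regularized least-squares objective updated with the Sherman-Morrison identity. The names `make_encoder`, `OnlineFTL`, and `demo`, and all hyperparameters, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_encoder(in_dim, n_proj=8, n_bins=10, low=-3.0, high=3.0):
    """Hypothetical sparse encoder (an assumption, not the paper's construction).

    Each of n_proj random projections is binned into one of n_bins cells,
    so every code has exactly n_proj nonzeros out of n_proj * n_bins entries.
    """
    P = rng.normal(size=(n_proj, in_dim)) / np.sqrt(in_dim)
    edges = np.linspace(low, high, n_bins + 1)[1:-1]  # interior bin edges

    def encode(x):
        z = P @ x
        idx = np.searchsorted(edges, z)  # bin index in [0, n_bins - 1]
        phi = np.zeros(n_proj * n_bins)
        phi[np.arange(n_proj) * n_bins + idx] = 1.0
        return phi

    return encode, n_proj * n_bins


class OnlineFTL:
    """Linear model that equals the regularized least-squares fit of all data seen so far."""

    def __init__(self, feat_dim, out_dim, reg=0.1):
        self.A_inv = np.eye(feat_dim) / reg      # inverse of (reg * I + sum phi phi^T)
        self.B = np.zeros((feat_dim, out_dim))   # running sum of phi y^T
        self.W = np.zeros((feat_dim, out_dim))

    def update(self, phi, y):
        # Sherman-Morrison rank-one update of the inverse (dense, for clarity).
        Av = self.A_inv @ phi
        self.A_inv -= np.outer(Av, Av) / (1.0 + phi @ Av)
        self.B += np.outer(phi, y)
        self.W = self.A_inv @ self.B

    def predict(self, phi):
        return self.W.T @ phi


def demo(n=300, in_dim=3, reg=0.1):
    encode, feat_dim = make_encoder(in_dim)
    model = OnlineFTL(feat_dim, out_dim=1, reg=reg)
    X = rng.normal(size=(n, in_dim))
    X = X[np.argsort(X[:, 0])]  # sorted arrival order mimics covariate shift
    Y = np.sin(X.sum(axis=1, keepdims=True))
    Phi = np.stack([encode(x) for x in X])
    for phi, y in zip(Phi, Y):  # single pass, one sample at a time
        model.update(phi, y)
    # Batch solution on all data, for comparison with the incremental one.
    W_batch = np.linalg.solve(Phi.T @ Phi + reg * np.eye(feat_dim), Phi.T @ Y)
    return model.W, W_batch, Phi


if __name__ == "__main__":
    W, W_batch, Phi = demo()
    print("max |W - W_batch|:", np.abs(W - W_batch).max())
```

After a single pass, the incremental weights should agree with the batch solution up to numerical error, even though the inputs arrive in a non-i.i.d. order. This sketch performs a dense update costing quadratic time in the feature dimension and does not exploit sparsity; the abstract's point is that sparse codes permit efficient sparse updates when the feature dimension is very high.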
