Locality Sensitive Sparse Encoding for Learning World Models Online

Zichen Liu, Chao Du, Wee Sun Lee, Min Lin

TL;DR

The paper tackles online world model learning for model-based RL, where data nonstationarity causes catastrophic forgetting in neural models. It introduces Losse-FTL, a linear regressor on nonlinear random features whose locality-sensitive sparse encoding enables no-regret online updates; because each encoding has few nonzero entries, the per-step update cost stays manageable even when the feature dimension is very high. The method has a regret bound $\text{Regret}_T(\mathbf{W}) = O(\log T)$. The approach is validated on supervised online learning benchmarks and on RL tasks under the Dyna architecture, where Losse-FTL surpasses or matches deep world models trained with replay while being more computationally efficient. The contribution points to a practical path toward scalable online world models by combining a high-capacity, sparse feature representation with analytic, closed-form online updates.

Abstract

Acquiring an accurate world model online for model-based reinforcement learning (MBRL) is challenging due to data nonstationarity, which typically causes catastrophic forgetting for neural networks (NNs). From the online learning perspective, a Follow-The-Leader (FTL) world model is desirable, which optimally fits all previous experiences at each round. Unfortunately, NN-based models need re-training on all accumulated data at every interaction step to achieve FTL, which is computationally expensive for lifelong agents. In this paper, we revisit models that can achieve FTL with incremental updates. Specifically, our world model is a linear regression model supported by nonlinear random features. The linear part ensures efficient FTL update while the nonlinear random feature empowers the fitting of complex environments. To best trade off model capacity and computation efficiency, we introduce a locality sensitive sparse encoding, which allows us to conduct efficient sparse updates even with very high dimensional nonlinear features. We validate the representation power of our encoding and verify that it allows efficient online learning under data covariate shift. We also show, in the Dyna MBRL setting, that our world models learned online using a single pass of trajectory data either surpass or match the performance of deep world models trained with replay and other continual learning methods.
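
The Follow-The-Leader idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine random feature map, the ridge term `reg`, the class name `FTLRegressor`, and the toy data stream are all assumptions made for illustration, and the update here is dense rather than the sparse update the paper proposes. It only shows that a linear model on fixed features can track the best fit to all past data by accumulating two sufficient statistics, without revisiting old samples.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_features(x, P, b):
    """Fixed (untrained) nonlinear random features; the cosine map is an assumption."""
    return np.cos(x @ P + b)


class FTLRegressor:
    """Ridge-regularized least squares kept up to date from running sums."""

    def __init__(self, feat_dim, out_dim, reg=1.0):
        self.A = reg * np.eye(feat_dim)          # reg * I + sum of phi phi^T
        self.B = np.zeros((feat_dim, out_dim))   # sum of phi y^T
        self.W = np.zeros((feat_dim, out_dim))

    def update(self, phi, y):
        # Incremental update: only the newest sample is touched.
        self.A += np.outer(phi, phi)
        self.B += np.outer(phi, y)
        # Dense solve for clarity; this step is what a sparse encoding would make cheap.
        self.W = np.linalg.solve(self.A, self.B)

    def predict(self, phi):
        return phi @ self.W


in_dim, feat_dim, out_dim, n_steps, reg = 2, 32, 1, 200, 1.0
P = rng.normal(size=(in_dim, feat_dim))
b = rng.uniform(0.0, 2.0 * np.pi, size=feat_dim)

# Toy nonstationary stream: inputs drift from low to high values over time.
X = np.sort(rng.uniform(-2.0, 2.0, size=(n_steps, in_dim)), axis=0)
Y = np.sin(X.sum(axis=1, keepdims=True))

model = FTLRegressor(feat_dim, out_dim, reg=reg)
for x, y in zip(X, Y):
    model.update(random_features(x, P, b), y)

# Reference: fit once on all data in a single batch.
Phi = random_features(X, P, b)
W_batch = np.linalg.solve(Phi.T @ Phi + reg * np.eye(feat_dim), Phi.T @ Y)
max_gap = float(np.max(np.abs(model.W - W_batch)))
```

After the single pass, `model.W` agrees with `W_batch` up to floating-point error, even though the stream was ordered so that early regions are never revisited. The cost of the dense solve grows quickly with the feature dimension, which is the trade-off between capacity and computation that the abstract says the sparse encoding is meant to address.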

Paper Structure (23 sections, 12 equations, 10 figures, 1 table, 3 algorithms)

This paper contains 23 sections, 12 equations, 10 figures, 1 table, 3 algorithms.

Figures (10)

  • Figure 1: This Gridworld environment requires the agent to navigate from the start position ("S") to the goal location ("G") along the shortest path. The tabular Q-learning agent starts from a random policy $\pi_0$ and improves to get better policies $\pi_t \cdots \pi_{t^\prime}$, so its state visitation narrows towards the optimal trajectories (the yellow regions). Due to this distributional shift, the NN-based model (top) may forget recently under-visited regions, even though it has explored them before. The red circles indicate erroneous predictions, where the Euclidean distance between the ground-truth next state and the prediction is greater than a threshold ($\delta=0.05$). In contrast, our method (bottom) learns online and at each step incrementally computes the optimal solution over all accumulated data, and is thus resilient to forgetting.
  • Figure 2: The Dyna architecture.
  • Figure 3: Locality sensitive sparse encoding. $\sigma(\cdot)$ projects input vectors into a random feature space, and $b(\cdot)$ softly bins $\sigma({\mathbf{x}}_t)$ into multiple $\rho$-dimensional grids, which are flattened and stacked into a high-dimensional sparse encoding $\phi({\mathbf{x}}_t)$.
  • Figure 4: Mean squared errors on the stream learning task at different correlation levels. Solid lines and shaded areas correspond to the means and standard errors of $30$ runs.
  • Figure 5: Learning curves showing normalized episode return. We compare our method with five baselines on (top) discrete control and (bottom) continuous control benchmarks. Solid curves depict the means of multiple runs with different random seeds, while shaded areas represent standard errors.
  • ...and 5 more figures
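
The encoding described in the Figure 3 caption can be sketched roughly as follows. This is a guess at the general shape of the pipeline, not the paper's definition: using a sigmoid of a random projection for $\sigma(\cdot)$, using multilinear interpolation as the "soft binning" $b(\cdot)$, and the names `losse_encode`, `kappa` (number of grids) and `n_bins` (bins per grid axis) are all assumptions introduced here. Only $\rho$, the grid dimensionality, comes from the caption.

```python
import itertools

import numpy as np


def losse_encode(x, P, n_bins, rho):
    """Hypothetical sketch: random projection, then soft binning into rho-dim grids."""
    # sigma(x): random projection squashed into (0, 1). The sigmoid is an assumption.
    z = 1.0 / (1.0 + np.exp(-(x @ P)))
    kappa = z.size // rho                        # number of rho-dimensional grids
    z = z.reshape(kappa, rho) * (n_bins - 1)     # continuous grid coordinates
    lo = np.clip(np.floor(z).astype(int), 0, n_bins - 2)  # lower corner of the cell
    frac = z - lo                                # position inside the cell, in [0, 1]

    phi = np.zeros((kappa, n_bins ** rho))
    rows = np.arange(kappa)
    # b(.): spread each point over the 2**rho corners of its cell (assumed scheme).
    for corner in itertools.product((0, 1), repeat=rho):
        c = np.array(corner)
        weight = np.prod(np.where(c == 1, frac, 1.0 - frac), axis=1)
        flat = np.ravel_multi_index(tuple((lo + c).T), (n_bins,) * rho)
        phi[rows, flat] += weight
    return phi.ravel()                           # grids flattened and stacked


rng = np.random.default_rng(0)
in_dim, kappa, rho, n_bins = 4, 8, 2, 10
P = rng.normal(size=(in_dim, kappa * rho))
x = rng.normal(size=in_dim)
phi = losse_encode(x, P, n_bins, rho)
n_active = int(np.count_nonzero(phi))
```

Under these assumptions the encoding has `kappa * n_bins**rho` entries (800 here), of which at most `kappa * 2**rho` (32 here) are nonzero, and nearby inputs tend to activate the same cells. That is the kind of sparsity and locality the caption points to; the exact guarantee in the paper is the subject of Remark 3.1.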

Theorems & Definitions (1)

  • Remark 3.1: Sparsity guarantee of Losse