Dynamic Sparsity: Challenging Common Sparsity Assumptions for Learning World Models in Robotic Reinforcement Learning Benchmarks

Muthukumar Pandaram, Jakob Hollenstein, David Drexel, Samuele Tosatto, Antonio Rodríguez-Sánchez, Justus Piater

TL;DR

This work critically evaluates the widely used sparsity priors in learning world-model dynamics for robotic reinforcement learning by analyzing ground-truth transitions in MuJoCo Playground. It leverages Jacobians $J_s$ and $J_a$ to characterize causal structure, state-dependence, and temporal sparsity, revealing that global sparsity is rare while local, state-dependent sparsity emerges in temporally localized blocks (e.g., during contacts). Empirical results show sparsity is not easily captured by naive MLPs, even with Jacobian-guided losses, underscoring the need for models that adapt their causal structure to the current state and time. The findings argue for grounded inductive biases that reflect real-world sparsity patterns to improve generalization and planning in model-based reinforcement learning. Overall, the work highlights the nuanced role of sparsity in world models and points toward architectures capable of dynamic, context-sensitive sparsity control.

Abstract

The use of learned dynamics models, also known as world models, can improve the sample efficiency of reinforcement learning. Recent work suggests that the underlying causal graphs of such dynamics models are sparsely connected, with each future state variable depending only on a small subset of the current state variables, and that learning may therefore benefit from sparsity priors. Similarly, temporal sparsity, i.e., sparsely and abruptly changing local dynamics, has also been proposed as a useful inductive bias. In this work, we critically examine these assumptions by analyzing ground-truth dynamics from a set of robotic reinforcement learning environments in the MuJoCo Playground benchmark suite, aiming to determine whether the proposed notions of state and temporal sparsity actually tend to hold in typical reinforcement learning tasks. We study (i) whether the causal graphs of environment dynamics are sparse, (ii) whether such sparsity is state-dependent, and (iii) whether local system dynamics change sparsely. Our results indicate that global sparsity is rare. Instead, the tasks show local, state-dependent sparsity in their dynamics, and this sparsity exhibits distinct structures: it appears in temporally localized clusters (e.g., during contact events) and affects specific subsets of state dimensions. These findings challenge common sparsity prior assumptions in dynamics learning, emphasizing the need for grounded inductive biases that reflect the state-dependent sparsity structure of real-world dynamics.

Paper Structure

This paper contains 31 sections, 13 equations, 13 figures, 13 tables.

Figures (13)

  • Figure 1: Benchmark environments from the DeepMind Control Suite (Tassa et al., 2018), implemented in MuJoCo Playground (Zakka et al., 2025): (top) BallInCup, CartpoleBalance, CheetahRun, ReacherHard; (bottom) FingerSpin, FingerTurnEasy, WalkerRun, SwimmerSwimmer6.
  • Figure 2: The simulator's ground-truth state $s_t$ is advanced by the simulator's $\operatorname{step}(s_t, a_t)$ function to produce the next state $s_{t+1}$. Each state $s_t$ non-invertibly produces a corresponding observation $o_t$, which the trained agent uses to choose an action. In this work we analyze the dynamics of the ground-truth next state $s_{t+1}$ with respect to the current state $s_t$.
  • Figure 3: For the two environments ReacherHard and FingerTurnEasy, the heatmap shows the proportion of time each element of the Jacobians $J_s = \frac{\partial}{\partial s}\operatorname{step}(s, a)$ and $J_a = \frac{\partial}{\partial a}\operatorname{step}(s, a)$ remains zero (indicating independence of the corresponding variables) during an episode rollout, expressed as a percentage of the total episode duration and averaged across rollouts and seeds. The heatmap values are rounded to the nearest integer. Most Jacobian elements remain nonzero throughout the episode; a small number stay zero for the entire duration, and the remaining elements are zero for only a fraction of the timesteps. Similar heatmaps for the remaining environments considered in the analysis are shown in \ref{sec:a:sparsity_heatmaps}.
  • Figure 4: A 2-dimensional t-SNE embedding (perplexity 50) of state, action, and next-state tuples, colored by the combined sparsity values of the state and action Jacobians across 10 episodes of FingerTurnEasy. Sparsity in the Jacobians is often related to contacts: when the finger is not moving the object, we observe higher sparsity than when the finger pushes the object. The sparsity values are given next to the images of the states.
  • Figure 5: Histograms showing the distribution of the sparsity values of the state and action Jacobians over the whole dataset, with a bin width of 0.1. The sparsity values are mostly concentrated in a small number of bins, indicating that similar sparsity patterns in the Jacobians repeat over the trajectories.
  • ...and 8 more figures
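
The quantities in the captions of Figures 3 to 5 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's setup: the toy 1-D "finger pushes object" `step` function, the constants `DT`, `EPS`, and `TOL`, and the helper names `jacobian` and `rollout_sparsity` are all hypothetical. It also approximates $J_s$ and $J_a$ with central finite differences in plain Python, where automatic differentiation of the simulator would be the more natural tool. The sketch computes, for each Jacobian element, the fraction of timesteps in which it is zero (as in Figure 3) and, for each timestep, the fraction of zero elements in $[J_s \mid J_a]$ (the kind of per-state sparsity value used in Figures 4 and 5).

```python
from typing import Callable, List, Sequence, Tuple

DT = 0.1    # integration step of the toy system (assumed)
EPS = 1e-6  # finite-difference perturbation (assumed)
TOL = 1e-8  # entries with |J_ij| <= TOL are counted as zero (assumed)


def step(s: Sequence[float], a: Sequence[float]) -> List[float]:
    """Hypothetical toy dynamics: s = (x, v, y) is finger position, finger
    velocity, and object position; a = (force,) acts on the finger."""
    x, v, y = s
    v_next = v + DT * a[0]
    x_next = x + DT * v_next
    # Contact: once the finger reaches the object, it pushes the object along.
    y_next = x_next if x_next > y else y
    return [x_next, v_next, y_next]


def jacobian(f: Callable[[List[float]], List[float]],
             z: Sequence[float]) -> List[List[float]]:
    """Central-difference Jacobian of f at z (rows: outputs, columns: inputs)."""
    cols = []
    for j in range(len(z)):
        hi, lo = list(z), list(z)
        hi[j] += EPS
        lo[j] -= EPS
        cols.append([(p - m) / (2 * EPS) for p, m in zip(f(hi), f(lo))])
    return [list(row) for row in zip(*cols)]


def rollout_sparsity(s0: Sequence[float], actions: Sequence[Sequence[float]]
                     ) -> Tuple[List[List[float]], List[float]]:
    """Return (per-element zero fraction over time, per-step sparsity)."""
    s = list(s0)
    zero_counts: List[List[int]] = []
    per_step: List[float] = []
    for a in actions:
        J_s = jacobian(lambda z: step(z, a), s)  # d step / d s at (s, a)
        J_a = jacobian(lambda z: step(s, z), a)  # d step / d a at (s, a)
        J = [rs + ra for rs, ra in zip(J_s, J_a)]  # concatenated [J_s | J_a]
        zeros = [[abs(e) <= TOL for e in row] for row in J]
        if not zero_counts:
            zero_counts = [[0] * len(row) for row in J]
        for i, row in enumerate(zeros):
            for j, is_zero in enumerate(row):
                zero_counts[i][j] += int(is_zero)
        n_entries = sum(len(row) for row in J)
        per_step.append(sum(sum(row) for row in zeros) / n_entries)
        s = step(s, a)
    zero_frac = [[c / len(actions) for c in row] for row in zero_counts]
    return zero_frac, per_step


# Constant push for 20 steps: no contact at first, contact afterwards.
zero_frac, per_step = rollout_sparsity([0.0, 0.0, 0.5], [[1.0]] * 20)
for name, row in zip(("x'", "v'", "y'"), zero_frac):
    print(name, [round(100 * f) for f in row])  # percent of steps at zero
print("sparsity before/after contact:", per_step[0], per_step[-1])
```

In this toy rollout, the row for the object position `y'` is the only one whose zero pattern changes over time, because the object depends on the finger state and the action only while in contact. The per-step sparsity is correspondingly higher before contact than during contact, which mirrors the qualitative observation in the Figure 4 caption. A simulator with smooth contact models may produce small but nonzero derivatives, so the choice of `TOL` would matter more there than it does here.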