Liquid Networks with Mixture Density Heads for Efficient Imitation Learning

Nikolaus Correll

Abstract

We compare liquid neural networks with mixture density heads against diffusion policies on Push-T, RoboMimic Can, and PointMaze under a shared-backbone comparison protocol that isolates policy-head effects by matching inputs, training budgets, and evaluation settings. Across tasks, liquid policies use roughly half the parameters (4.3M vs. 8.6M), achieve 2.4x lower offline prediction error, and run 1.8x faster at inference. In sample-efficiency experiments spanning 1% to 46.42% of the training data, liquid models remain consistently more robust than diffusion, with especially large gains in the low- and medium-data regimes. Closed-loop results on Push-T and PointMaze are directionally consistent with the offline rankings but noisier, indicating that strong offline density modeling helps deployment without fully determining closed-loop success. Overall, liquid recurrent multimodal policies provide a compact and practical alternative to iterative denoising for imitation learning.
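
To make the architecture under comparison concrete, here is a minimal PyTorch sketch of a liquid-style recurrent cell with a mixture density (MDN) head. The cell equations, layer names, and hyperparameters are illustrative assumptions for exposition, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LiquidMDNPolicy(nn.Module):
    """Illustrative liquid-style recurrent cell with a mixture density head.

    All sizes and the exact cell update are assumptions, not the paper's code.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128, n_comp: int = 5):
        super().__init__()
        self.inp = nn.Linear(obs_dim, hidden)
        # Liquid time-constant flavor: the hidden state decays toward an
        # input-dependent target with an input-dependent time constant.
        self.target = nn.Linear(2 * hidden, hidden)
        self.tau = nn.Linear(2 * hidden, hidden)
        self.n_comp, self.act_dim = n_comp, act_dim
        # Per mixture component: 1 logit + act_dim means + act_dim log-stds.
        self.head = nn.Linear(hidden, n_comp * (1 + 2 * act_dim))

    def step(self, x: torch.Tensor, h: torch.Tensor, dt: float = 0.1) -> torch.Tensor:
        z = torch.cat([self.inp(x), h], dim=-1)
        target = torch.tanh(self.target(z))
        tau = F.softplus(self.tau(z)) + 1e-2  # keep time constants positive
        # Explicit Euler step of dh/dt = (target - h) / tau.
        return h + dt * (target - h) / tau

    def mdn_params(self, h: torch.Tensor):
        p = self.head(h)
        logits = p[..., : self.n_comp]
        mu, log_std = p[..., self.n_comp :].chunk(2, dim=-1)
        mu = mu.reshape(*h.shape[:-1], self.n_comp, self.act_dim)
        std = log_std.reshape(*h.shape[:-1], self.n_comp, self.act_dim).exp().clamp(min=1e-3)
        return logits, mu, std
```

Sampling an action would draw a component from the logits and then from that component's Gaussian; training would minimize the mixture negative log-likelihood of demonstrated actions (a sketch of that objective appears after the figure list below).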

Paper Structure

This paper contains 53 sections, 2 theorems, 12 equations, 8 figures, and 2 tables.

Key Result

Theorem 1

Let the target trajectory dynamics be Lipschitz, and let a discrete generator produce outputs with first-order updates over $T$ steps. To achieve terminal-state approximation error at most $\epsilon$, the worst-case step complexity satisfies
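
For intuition, a standard explicit-Euler error bound (classical numerical-ODE background, offered here as context rather than as the paper's statement) shows why first-order updates are step-hungry. For $L$-Lipschitz dynamics integrated over a fixed horizon $t_{\mathrm{end}}$ with step size $h$,

$$\|x_T - x(t_{\mathrm{end}})\| \;\le\; \frac{C}{L}\left(e^{L t_{\mathrm{end}}} - 1\right) h, \qquad T = t_{\mathrm{end}}/h,$$

so driving the terminal error below $\epsilon$ requires $h = O(\epsilon)$ and hence a step count on the order of $T = \Omega(1/\epsilon)$.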

Figures (8)

  • Figure 1: Validation negative log-likelihood over 120 epochs for all three datasets. The liquid model (0.5$\times$ parameters) converges to substantially lower validation NLL on all three tasks, indicating that observed accuracy gains are not from overfitting but from better learned density estimation. Both models train stably without early stopping.
  • Figure 2: Sample efficiency for Push-T (left) and PointMaze (right). Solid lines show best-of-10 MSE (left axis, log scale); dotted lines show NLL (right axis). Blue = Liquid + MDN, red = Diffusion. Liquid consistently achieves lower MSE and better NLL at every training fraction, with the largest advantage in the data-scarce regime. NLL trends mirror MSE: liquid's explicit density model extracts more information per demonstration than diffusion's implicit score-matching objective.
  • Figure 3: Best-of-$K$ MSE across all three datasets (here, $K$ is the number of samples drawn for evaluation, not denoising steps). The liquid advantage persists across sample budgets: at $K=10$, liquid models outperform diffusion by $2.4$--$2.5\times$. (The best-of-$K$ metric is sketched in code after this list.)
  • Figure 4: Per-horizon error analysis across tasks. All models show increasing error toward the end of the 16-step horizon. The liquid head maintains lower per-step error on all three tasks.
  • Figure 5: Qualitative sampled trajectories across all tasks. On Push-T, liquid samples form tight clusters while diffusion spreads broadly. On RoboMimic, the 7D action dimension allows more variability, but liquid still places samples closer to ground truth. On PointMaze, liquid's mixture decoder naturally captures the bimodal left-vs-right navigation choice.
  • ...and 3 more figures
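
The two metrics referenced in Figures 1--3 (validation NLL and best-of-$K$ MSE) can be stated compactly. Below is a sketch of both, assuming a diagonal-Gaussian mixture head and a hypothetical `sample_fn` that draws one action sample from the policy; the helper names are illustrative, not the paper's evaluation code.

```python
import torch
import torch.nn.functional as F

def mdn_nll(logits, mu, std, action):
    """NLL of `action` under a diagonal-Gaussian mixture (illustrative)."""
    comp = torch.distributions.Normal(mu, std)           # shapes (..., K, D)
    log_p = comp.log_prob(action.unsqueeze(-2)).sum(-1)  # (..., K)
    return -torch.logsumexp(F.log_softmax(logits, dim=-1) + log_p, dim=-1)

def best_of_k_mse(sample_fn, action_true, k=10):
    """Best-of-K MSE: draw K samples, score only the closest one."""
    samples = torch.stack([sample_fn() for _ in range(k)])  # (K, ..., D)
    mse = ((samples - action_true) ** 2).mean(dim=-1)       # (K, ...)
    return mse.min(dim=0).values.mean()
```

Best-of-$K$ rewards a model for placing at least one sample near the demonstrated action, which is why it is the natural metric for multimodal policies such as the mixture head above.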

Theorems & Definitions (2)

  • Theorem 1: Sequential complexity lower bound for first-order iterative generators
  • Corollary 2: Linear-system instantiation