What Do Latent Action Models Actually Learn?

Chuheng Zhang; Tim Pearce; Pushi Zhang; Kaixin Wang; Xiaoyu Chen; Wei Shen; Li Zhao; Jiang Bian

TL;DR

The paper analyzes latent action models (LAMs) by introducing a tractable linear abstraction and showing that, under reasonable assumptions, LAM learning reduces to principal component analysis (PCA) on the sum of controllable changes and exogenous noise. It then explains how the data-collection policy and the noise structure affect the learned latent, and demonstrates practical remedies, namely data augmentation and auxiliary action prediction, that improve the alignment of latents with true actions. The authors validate their theoretical insights with nonlinear experiments on a small grid-world, offering actionable guidance for designing LAM datasets and training procedures so that latent semantics reflect controllable changes rather than noise. Overall, the work provides a principled lens on LAM learnability and concrete strategies for making LAMs more reliable in unsupervised pretraining for embodied AI.
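The PCA connection can be made concrete with a small simulation. This is a minimal illustration under stated assumptions, not the authors' model or code. It assumes frame differences Δ = B a + ε, where B holds hypothetical action-effect directions and ε combines weak isotropic noise with a component along one hypothetical distractor direction u orthogonal to B. It also assumes a LAM with a linear encoder, a linear decoder, and a k-dimensional latent trained to reconstruct Δ under squared error. For such a linear autoencoder the optimum spans the top-k principal subspace of Δ, so the sketch runs PCA directly and reports how well that subspace aligns with span(B), where 1 means the latents capture exactly the controllable directions. All function names and parameter values are hypothetical.

```python
import numpy as np


def simulate_deltas(action_std, noise_dir_std, n=5000, obs_dim=8, act_dim=2,
                    iso_noise_std=0.1, seed=0):
    """Toy frame differences: delta = B @ a + isotropic noise + noise along u.

    B (action-effect directions) and u (a distractor direction) are hypothetical,
    drawn as orthonormal columns so that u is orthogonal to span(B).
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(obs_dim, act_dim + 1)))
    B, u = q[:, :act_dim], q[:, act_dim]
    actions = action_std * rng.normal(size=(n, act_dim))       # behaviour policy: i.i.d. Gaussian actions
    iso = iso_noise_std * rng.normal(size=(n, obs_dim))         # unstructured exogenous noise
    distractor = noise_dir_std * rng.normal(size=(n, 1)) * u    # structured noise, independent of actions
    return actions @ B.T + iso + distractor, B


def top_k_principal_directions(x, k):
    """Columns are the top-k eigenvectors of the sample covariance (i.e. PCA)."""
    centered = x - x.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]


def subspace_alignment(U, V):
    """Mean squared cosine of the principal angles between span(U) and span(V); 1 = same subspace."""
    qu, _ = np.linalg.qr(U)
    qv, _ = np.linalg.qr(V)
    cosines = np.linalg.svd(qu.T @ qv, compute_uv=False)
    return float(np.mean(cosines ** 2))


def pca_latent_alignment(action_std, noise_dir_std, k=2):
    """Alignment between the optimal linear-LAM latent subspace (PCA on deltas) and span(B)."""
    deltas, B = simulate_deltas(action_std, noise_dir_std)
    return subspace_alignment(top_k_principal_directions(deltas, k), B)


print(pca_latent_alignment(action_std=1.0, noise_dir_std=0.0))  # near 1.0: weak isotropic noise only
print(pca_latent_alignment(action_std=1.0, noise_dir_std=3.0))  # near 0.5: one latent tracks the distractor
print(pca_latent_alignment(action_std=0.3, noise_dir_std=1.0))  # near 0.5: low-variance policy, noise dominates
print(pca_latent_alignment(action_std=3.0, noise_dir_std=1.0))  # near 1.0: high-variance policy, actions dominate
```

In this toy setting only the variance of controllable changes relative to structured noise matters. Isotropic noise shifts every eigenvalue equally and leaves the principal directions unchanged, whereas noise concentrated in one direction can push an action direction out of the top-k subspace, and a behaviour policy with low action variance makes that easier.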

Abstract

Latent action models (LAMs) aim to learn action-relevant changes from unlabeled videos by compressing the changes between frames into latents. However, differences between video frames can be caused by controllable changes as well as by exogenous noise, which raises an important concern: do the latents capture the changes caused by actions, or irrelevant noise? This paper studies this issue analytically, presenting a linear model that encapsulates the essence of LAM learning while remaining tractable. This provides several insights, including connections between LAM and principal component analysis (PCA), desiderata of the data-generating policy, and justification of strategies that encourage learning controllable changes, namely data augmentation, data cleaning, and auxiliary action prediction. We also provide illustrative results based on numerical simulation, shedding light on the specific structure of observations, actions, and noise in data that influences LAM learning.
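Auxiliary action prediction can also be sketched in a toy linear setting. This is a minimal illustration under simplifying assumptions, not the authors' procedure. It assumes action labels are available for every transition (in practice they would usually cover only a small labelled subset). It also assumes frame differences Δ = B a + ε with a strong distractor along a direction u orthogonal to the hypothetical action directions B, and a linear LAM that minimizes ||Δ - D E Δ||^2 + λ ||a - H E Δ||^2 over an encoder E and decoders D and H. That objective is a reduced-rank regression of the stacked target [Δ; √λ a] on Δ, whose optimum projects the least-squares fit onto its top-k principal directions, and λ = 0 recovers plain PCA. The names `simulate`, `linear_lam_encoder`, `lam`, and all numeric values are hypothetical.

```python
import numpy as np


def simulate(n=5000, obs_dim=8, act_dim=2, iso_noise_std=0.1, noise_dir_std=3.0, seed=0):
    """Toy data: delta = B @ a + weak isotropic noise + a strong distractor along u (u orthogonal to B)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(obs_dim, act_dim + 1)))
    B, u = q[:, :act_dim], q[:, act_dim]          # hypothetical action and distractor directions
    actions = rng.normal(size=(n, act_dim))
    deltas = (actions @ B.T
              + iso_noise_std * rng.normal(size=(n, obs_dim))
              + noise_dir_std * rng.normal(size=(n, 1)) * u)
    return deltas, actions, B


def linear_lam_encoder(deltas, actions, k, lam):
    """Rank-k encoder E minimizing mean ||d - D E d||^2 + lam * ||a - H E d||^2 (reduced-rank regression)."""
    d = deltas - deltas.mean(axis=0)
    a = actions - actions.mean(axis=0)
    cov_dd = d.T @ d / len(d)
    cov_ad = a.T @ d / len(d)
    beta = np.linalg.solve(cov_dd, cov_ad.T).T                    # least-squares map delta -> action
    m_ols = np.vstack([np.eye(d.shape[1]), np.sqrt(lam) * beta])  # least-squares map delta -> [delta; sqrt(lam) a]
    _, eigvecs = np.linalg.eigh(m_ols @ cov_dd @ m_ols.T)         # covariance of the fitted stacked target
    top_k = eigvecs[:, ::-1][:, :k]
    return top_k.T @ m_ols                                        # latent z = E @ delta, shape (k, obs_dim)


def subspace_alignment(U, V):
    """Mean squared cosine of the principal angles between span(U) and span(V); 1 = same subspace."""
    qu, _ = np.linalg.qr(U)
    qv, _ = np.linalg.qr(V)
    return float(np.mean(np.linalg.svd(qu.T @ qv, compute_uv=False) ** 2))


deltas, actions, B = simulate()
for lam in (0.0, 100.0):
    E = linear_lam_encoder(deltas, actions, k=2, lam=lam)
    print(lam, subspace_alignment(E.T, B))   # lam=0 (plain PCA): near 0.5; lam=100: near 1.0
```

With `lam = 0`, one of the two latent directions tracks the distractor. With a large weight on action prediction, both latent directions fall inside span(B), because in this toy setup the distractor carries no information about the actions.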
