Self-Predictive Representations for Combinatorial Generalization in Behavioral Cloning
Daniel Lawson, Adriana Hugessen, Charlotte Cloutier, Glen Berseth, Khimya Khetarpal
TL;DR
The paper addresses the challenge of combinatorial generalization in goal-conditioned behavioral cloning by linking temporal coherence in representations to the successor representation (SR). It introduces BYOL-$\gamma$, a self-predictive objective that samples future states at offsets $k \sim \mathrm{Geom}(1-\gamma)$ to capture long-range dynamics, and that approximates $\tilde{M}^{\pi}$ through a low-rank decomposition $\tilde{M}^{\pi} \approx \Phi \Psi \Phi^T$. The authors show theoretically that, in finite MDPs with linear features, BYOL-$\gamma$ approximates the SR, and they demonstrate empirically that it improves zero-shot combinatorial generalization on OGBench, often outperforming baseline BC and TD-based methods; ablations clarify the contribution of its key components. The work highlights the practical value of meaningful representations, used as auxiliary objectives, for scaling offline goal-conditioned policies to longer horizons and more complex environments.
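The link between geometric future sampling and the SR can be checked numerically. The sketch below is an illustration under assumptions, not the authors' code: it uses a hypothetical 4-state Markov chain `P` standing in for $P^{\pi}$ under a fixed policy, and NumPy's `Generator.geometric`, whose support is $\{1, 2, \dots\}$. Under that convention the law of $s_{t+k}$ given $s_t$ is $(1-\gamma)\, P^{\pi} (I - \gamma P^{\pi})^{-1}$, a normalized, one-step-shifted SR; with support starting at 0 it would be $(1-\gamma)(I - \gamma P^{\pi})^{-1}$. Which convention $\tilde{M}^{\pi}$ follows is an assumption left open here.

```python
import numpy as np


def successor_representation(P: np.ndarray, gamma: float) -> np.ndarray:
    """Closed-form SR of a finite chain: M = sum_{k>=0} gamma^k P^k = (I - gamma P)^{-1}."""
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)


def empirical_future_distribution(P, s0, gamma, num_samples=50_000, seed=0):
    """Monte Carlo estimate of the law of s_{t+k} given s_t = s0, with k ~ Geom(1 - gamma)."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    cdf = np.cumsum(P, axis=1)                         # row-wise CDFs for inverse-CDF sampling
    k = rng.geometric(1.0 - gamma, size=num_samples)   # NumPy's support is {1, 2, ...}
    states = np.full(num_samples, s0)
    for step in range(k.max()):
        active = k > step                              # rollouts that still need a transition
        u = rng.random(active.sum())
        nxt = (u[:, None] > cdf[states[active]]).sum(axis=1)
        states[active] = np.minimum(nxt, n - 1)        # guard against CDF round-off
    return np.bincount(states, minlength=n) / num_samples


# Hypothetical 4-state ring under a fixed policy: stay or advance, each with probability 0.5.
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.5],
])
gamma = 0.9
M = successor_representation(P, gamma)

# With k >= 1: Pr(s_{t+k} = s') = sum_k (1 - gamma) gamma^(k-1) P^k = (1 - gamma) (P M)[s0, s'].
predicted = (1.0 - gamma) * (P @ M)[0]
empirical = empirical_future_distribution(P, s0=0, gamma=gamma)
print(np.round(predicted, 3))
print(np.round(empirical, 3))
```

The two printed rows should agree up to Monte Carlo noise. Vectorizing the rollouts (advancing only samples whose $k$ has not yet been reached) keeps the check fast even though the mean offset is $1/(1-\gamma) = 10$ steps.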
Abstract
While goal-conditioned behavior cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC; if temporally correlated states are properly encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. We formalize this notion by demonstrating how encouraging long-range temporal consistency via successor representations (SR) can facilitate generalization. We then propose a simple yet effective representation learning objective, $\text{BYOL-}\gamma$ for GCBC, which theoretically approximates the successor representation in the finite MDP case through self-predictive representations, and achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
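The premise that temporally correlated states should map to similar latents can be illustrated by using exact SR rows as state representations. This is a minimal sketch under assumptions, not the paper's method: the 8-state corridor `P` is hypothetical, and the closed-form SR stands in for the features that BYOL-$\gamma$ is argued to approximate in the finite, linear case. Cosine similarity between the SR of state 0 and the SR of the state $d$ steps ahead shrinks smoothly as $d$ grows, whereas one-hot codes make every pair of distinct states orthogonal.

```python
import numpy as np


def successor_representation(P, gamma):
    """M = (I - gamma P)^{-1}; row s holds expected discounted visits to every state from s."""
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical 8-state corridor: the policy always steps right; the last state is absorbing.
n, gamma = 8, 0.9
P = np.zeros((n, n))
P[np.arange(n - 1), np.arange(1, n)] = 1.0
P[n - 1, n - 1] = 1.0

M = successor_representation(P, gamma)

# Similarity between the SR of state 0 and the SR of the state d steps ahead.
sr_sims = [cosine(M[0], M[d]) for d in range(1, n)]
# One-hot (purely tabular) codes: distinct states are always orthogonal.
onehot_sims = [cosine(np.eye(n)[0], np.eye(n)[d]) for d in range(1, n)]
print(np.round(sr_sims, 3))
print(onehot_sims)
```

In this corridor the decrease is strict: the SR row of state 0 equals a segment covering states $0, \dots, d-1$ plus $\gamma^{d}$ times the SR row of state $d$, so the shared part shrinks as $d$ grows.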
