CAIMAN: Causal Action Influence Detection for Sample-efficient Loco-manipulation

Yuanchen Yuan, Jin Cheng, Núria Armengol Urpí, Stelian Coros

TL;DR

CAIMAN tackles sample-efficient, non-prehensile loco-manipulation in legged robots by introducing an intrinsic reward based on causal action influence (CAI) within a hierarchical reinforcement learning (RL) framework. The method combines a simple kinematic prior with a learned residual to form a dynamics model used to compute CAI, guiding exploration toward controllable interactions with objects. Empirical results in simulation show improved sample efficiency and robust obstacle navigation, with successful zero-shot sim-to-real transfer to a real quadruped; pretrained dynamics further speed up learning on new tasks. Overall, CAIMAN provides a scalable approach to endowing legged robots with whole-body pushing capabilities in unstructured environments, reducing reliance on dense rewards and hand-crafted curricula.

Abstract

Enabling legged robots to perform non-prehensile loco-manipulation is crucial for enhancing their versatility. Learning behaviors such as whole-body object pushing often requires sophisticated planning strategies or extensive task-specific reward shaping, especially in unstructured environments. In this work, we present CAIMAN, a practical reinforcement learning framework that encourages the agent to gain control over other entities in the environment. CAIMAN leverages causal action influence as an intrinsic motivation objective, allowing legged robots to efficiently acquire object pushing skills even under sparse task rewards. We employ a hierarchical control strategy, combining a low-level locomotion module with a high-level policy that generates task-relevant velocity commands and is trained to maximize the intrinsic reward. To estimate causal action influence, we learn the dynamics of the environment by integrating a kinematic prior with data collected during training. We empirically demonstrate CAIMAN's superior sample efficiency and adaptability to diverse scenarios in simulation, as well as its successful transfer to real-world systems without further fine-tuning.
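
The following is a minimal sketch of how a CAI-style exploration bonus could be computed from a dynamics model built as "kinematic prior plus residual". It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the contact-radius prior, the zero-valued `residual` placeholder standing in for a learned network, the fixed Gaussian noise `SIGMA`, the uniform action sampling, and the bonus weight in `total_reward`. The score uses a common formulation of causal action influence from the literature, the conditional mutual information between the action and the object's next state, estimated by Monte Carlo; the abstract does not state which estimator CAIMAN uses.

```python
import numpy as np

DT = 0.1              # assumed high-level control step [s]
CONTACT_RADIUS = 0.5  # assumed robot-object distance at which pushing is possible [m]
SIGMA = 0.05          # assumed fixed std of the predicted object position [m]


def kinematic_prior(robot_xy, object_xy, action):
    """Hypothetical prior: the object moves with the commanded base velocity
    while the robot is within CONTACT_RADIUS, and stays put otherwise."""
    in_contact = np.linalg.norm(object_xy - robot_xy) < CONTACT_RADIUS
    return object_xy + (DT * action if in_contact else 0.0)


def residual(robot_xy, object_xy, action):
    """Placeholder for a learned residual model; returns a zero correction here."""
    return np.zeros(2)


def predict_mean(robot_xy, object_xy, action):
    """Predicted next object position: kinematic prior plus residual."""
    return kinematic_prior(robot_xy, object_xy, action) + residual(robot_xy, object_xy, action)


def log_gauss(x, mean):
    """Log-density of an isotropic Gaussian with std SIGMA (last axis = dimensions)."""
    d = x.shape[-1]
    sq = np.sum((x - mean) ** 2, axis=-1)
    return -0.5 * sq / SIGMA**2 - d * np.log(SIGMA * np.sqrt(2.0 * np.pi))


def cai_score(robot_xy, object_xy, rng, n_actions=32, n_samples=64):
    """Monte Carlo estimate of I(S'_obj; A | S = s):
    the mean over sampled actions a_i of KL(p(s' | s, a_i) || mean_k p(s' | s, a_k))."""
    actions = rng.uniform(-1.0, 1.0, size=(n_actions, 2))  # assumed velocity range
    means = np.stack([predict_mean(robot_xy, object_xy, a) for a in actions])  # (K, 2)
    noise = rng.standard_normal((n_actions, n_samples, 2))
    samples = means[:, None, :] + SIGMA * noise                                # (K, M, 2)
    log_p = log_gauss(samples, means[:, None, :])                              # (K, M)
    # Log-density of every sample under every action-conditional, shape (K, M, K).
    log_all = log_gauss(samples[:, :, None, :], means[None, None, :, :])
    # Numerically stable log of the equally weighted mixture over actions.
    peak = log_all.max(axis=-1, keepdims=True)
    log_mix = (peak + np.log(np.mean(np.exp(log_all - peak), axis=-1, keepdims=True)))[..., 0]
    return float(np.mean(log_p - log_mix))


def total_reward(sparse_task_reward, cai, weight=0.1):
    """Sparse task reward plus a weighted CAI bonus (the weight is an assumed value)."""
    return sparse_task_reward + weight * cai


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    robot = np.array([0.0, 0.0])
    print("far :", cai_score(robot, np.array([3.0, 0.0]), rng))
    print("near:", cai_score(robot, np.array([0.3, 0.0]), rng))
```

Under these assumptions, the score is essentially zero when the robot is too far away to affect the object, because every sampled action predicts the same next object position. It becomes positive once the robot is within pushing range. This is the sense in which the bonus steers exploration toward states where the agent's actions have control over the object.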

Paper Structure

This paper contains 23 sections, 7 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Illustration of the LCM (left) for two different environment situations $S=s$ (right) in the loco-manipulation task. The LCM captures the transition from $S,A$ to $S'$, factorized into state components. While the global SCM is fully connected (dashed and continuous lines), the LCM $\mathcal{G}_{S=s}$ (continuous lines) is causally minimal. We are interested in detecting the presence of continuous orange arrows in the LCM, i.e. the influence of the action $A$ on next states $S'$.
  • Figure 2: CAIMAN framework: The high-level (HL) policy generates desired base velocity commands, which are translated into joint commands by a low-level (LL) policy. We utilize a simple kinematic prior and learned residual dynamics to model the robot-object interaction in the environment while providing a CAI-based explorative bonus along with the sparse task reward.
  • Figure 3: Illustrations for the Single-object (left), Single-wall (middle), and Multi-wall (right) tasks. The yellow sphere denotes the object's target position.
  • Figure 4: Policy success rate evaluated every 100 training iterations for all methods and tasks. Results are evaluated across 800 episodes and averaged over 3 seeds; the shaded area represents the standard deviation.
  • Figure 5: Single-wall task with target positions randomly sampled within an area in front of the wall.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Definition 1: Structural Causal Model [pearl2009causality]
  • Definition 2: Local Causal Model [pitis2020counterfactual]