MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies

Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, Sergey Levine

TL;DR

MCP introduces multiplicative compositional policies that allow simultaneous activation of multiple primitive skills, enabling scalable composition of behaviors for high-DoF agents. By pre-training a shared set of Gaussian primitives to imitate diverse motions and then transferring with a learned gating function, MCP achieves strong performance on complex transfer tasks while maintaining expressive, transferable primitives. The approach outperforms additive, hierarchical, and latent-space baselines, particularly as task complexity grows, and reveals clear primitive specializations aligned with gait phases. The work highlights structured exploration and robust skill decomposition as key drivers of success, and suggests future work on temporal abstractions and unsupervised primitive discovery.

Abstract

Humans are able to perform a myriad of sophisticated tasks by drawing upon skills acquired through prior experience. For autonomous agents to have this capability, they must be able to extract reusable skills from past experience that can be recombined in new ways for subsequent tasks. Furthermore, when controlling complex high-dimensional morphologies, such as humanoid bodies, tasks often require coordination of multiple skills simultaneously. Learning discrete primitives for every combination of skills quickly becomes prohibitive. Composable primitives that can be recombined to create a large variety of behaviors can be more suitable for modeling this combinatorial explosion. In this work, we propose multiplicative compositional policies (MCP), a method for learning reusable motor skills that can be composed to produce a range of complex behaviors. Our method factorizes an agent's skills into a collection of primitives, where multiple primitives can be activated simultaneously via multiplicative composition. This flexibility allows the primitives to be transferred and recombined to elicit new behaviors as necessary for novel tasks. We demonstrate that MCP is able to extract composable skills for highly complex simulated characters from pre-training tasks, such as motion imitation, and then reuse these skills to solve challenging continuous control tasks, such as dribbling a soccer ball to a goal, and picking up an object and transporting it to a target location.

Paper Structure

This paper contains 32 sections, 18 equations, 9 figures, 11 tables, 1 algorithm.

Figures (9)

  • Figure 1: Our method is evaluated on complex 3D characters with different morphologies and large numbers of degrees-of-freedom.
  • Figure 2: The transfer tasks pose a challenging combination of locomotion and object manipulation, such as carrying an object to a target location and dribbling a ball to a goal, which requires coordination of multiple body parts and temporally extended behaviors.
  • Figure 3: Schematic illustrations of the MCP architecture. The gating function receives both $s$ and $g$ as inputs, which are first encoded by separate networks, with 512 and 256 units. The resulting features are concatenated and processed with a layer of 256 units, followed by a sigmoid output layer to produce the weights $w(s, g)$. The primitives receive only $s$ as input, which is first processed by a common network, with 512 and 256 units, before branching into separate layers of 256 units for each primitive, followed by a linear output layer that produces $\mu_i(s)$ and $\Sigma_i(s)$ for each primitive. ReLU activation is used for all hidden units.
  • Figure 4: Learning curves of the various models when applied to transfer tasks. MCP substantially improves learning speed and performance on challenging tasks (e.g. carry and dribble), and is the only method that succeeds on the most difficult task (Dribble: T-Rex).
  • Figure 5: Left: Learning curves on holdout tasks in the Ant environment. Right: Trajectories produced by models with target directions from pre-training, and target directions from the holdout set after training on transfer tasks. The latent space model is prone to overfitting to the pre-training tasks, and can struggle to adapt to the holdout tasks.
  • ...and 4 more figures
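
The Figure 3 caption describes a gating function with a sigmoid output that produces weights $w(s, g)$, and Gaussian primitives that each output $\mu_i(s)$ and $\Sigma_i(s)$. The following is a minimal sketch of how such outputs could be combined. It assumes diagonal covariances and takes "multiplicative composition" to mean a normalized product of the primitive densities, each raised to its weight. Under those assumptions the composite is again Gaussian: its precision is the weighted sum of the primitive precisions, and its mean is the precision-weighted average of the primitive means. The function names, gating logits, and numeric values are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function, standing in for the sigmoid gating output."""
    return 1.0 / (1.0 + math.exp(-x))


def compose_gaussians(weights, means, variances):
    """Weighted product of K diagonal Gaussians, one action dimension at a time.

    weights:   K non-negative gating weights w_i
    means:     K lists of per-dimension means (mu_i)
    variances: K lists of per-dimension variances (diagonal of Sigma_i)

    Returns (mean, variance) of the Gaussian proportional to
    prod_i N(a; mu_i, Sigma_i) ** w_i.
    """
    dims = len(means[0])
    comp_mean, comp_var = [], []
    for j in range(dims):
        # Composite precision is the weighted sum of primitive precisions.
        precision = sum(w / v[j] for w, v in zip(weights, variances))
        if precision <= 0.0:
            raise ValueError("at least one primitive needs a positive weight")
        # Composite mean is the precision-weighted average of primitive means.
        weighted_mean = sum(
            w * m[j] / v[j] for w, m, v in zip(weights, means, variances)
        )
        comp_mean.append(weighted_mean / precision)
        comp_var.append(1.0 / precision)
    return comp_mean, comp_var


# Hypothetical gating logits for two primitives; the sigmoid keeps each
# weight in (0, 1) without forcing the weights to sum to one.
logits = [2.0, -2.0]
weights = [sigmoid(z) for z in logits]

# Hypothetical primitive outputs for a two-dimensional action.
means = [[1.0, 0.0], [-1.0, 0.5]]
variances = [[0.5, 0.5], [0.5, 2.0]]

mean, var = compose_gaussians(weights, means, variances)
print("composite mean:", mean)
print("composite variance:", var)
```

Because the weights are independent sigmoid outputs rather than a softmax, several primitives can be strongly active at once. A primitive with zero weight drops out of the product entirely, and a primitive with low variance in some action dimension has more influence on that dimension than one with high variance.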