LAR-MoE: Latent-Aligned Routing for Mixture of Experts in Robotic Imitation Learning

Ariel Rodriguez, Chenpan Li, Lorenzo Mazza, Rayan Younis, Ortrun Hellig, Sebastian Bodenstedt, Martin Wagner, Stefanie Speidel

TL;DR

This work introduces Latent-Aligned Routing for Mixture of Experts (LAR-MoE), a two-stage framework that decouples unsupervised skill discovery from policy learning and suggests that latent-aligned routing provides a principled alternative to supervised skill decomposition.

Abstract

Imitation learning enables robots to acquire manipulation skills from demonstrations, yet deploying a policy across tasks with heterogeneous dynamics remains challenging, as models tend to average over distinct behavioral modes present in the demonstrations. Mixture-of-Experts (MoE) architectures address this by activating specialized subnetworks, but require meaningful skill decompositions for expert routing. We introduce Latent-Aligned Routing for Mixture of Experts (LAR-MoE), a two-stage framework that decouples unsupervised skill discovery from policy learning. In pre-training, we learn a joint latent representation between observations and future actions through student-teacher co-training. In a post-training stage, the expert routing is regularized to follow the structure of the learned latent space, preventing expert collapse while maintaining parameter efficiency. We evaluate LAR-MoE in simulation and on hardware. On the LIBERO benchmark, our method achieves a 95.2% average success rate with 150M parameters. On a surgical bowel grasping and retraction task, LAR-MoE matches a supervised MoE baseline without requiring any phase annotations, and transfers zero-shot to ex vivo porcine tissue. Our findings suggest that latent-aligned routing provides a principled alternative to supervised skill decomposition, enabling structured expert specialization from unlabeled demonstrations.
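
To make the idea of regularizing the router toward the latent structure more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the form of the regularizer, so the per-expert `anchors`, the `temperature`, and the KL-divergence penalty are all assumptions chosen for illustration, as are the function names.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def latent_targets(z, anchors, temperature=1.0):
    """Soft assignment of a latent vector z to hypothetical anchor points.

    ASSUMPTION: one anchor per expert stands in for "the structure of the
    learned latent space"; the paper may define this structure differently.
    """
    scores = [
        -sum((zi - ai) ** 2 for zi, ai in zip(z, anchor)) / temperature
        for anchor in anchors
    ]
    return softmax(scores)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mix_experts(gate, expert_actions):
    """Gate-weighted sum of per-expert action vectors (standard MoE mixing)."""
    dim = len(expert_actions[0])
    return [sum(g * a[d] for g, a in zip(gate, expert_actions)) for d in range(dim)]

# Toy usage: an undecided router is pulled toward the latent-derived target.
z = [0.9, 0.1]                      # latent code from a (frozen) encoder, made up here
anchors = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical per-expert anchors
gate = softmax([0.0, 0.0])          # router output before any alignment
target = latent_targets(z, anchors)
alignment_penalty = kl_divergence(target, gate)  # would be added to the policy loss
action = mix_experts(gate, [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the penalty is zero only when the router's gate matches the latent-derived assignment, so a router that sends every input to a single expert pays a cost whenever the latent codes spread across several anchors. That is one plausible reading of how latent alignment could discourage expert collapse.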

Paper Structure (13 sections, 8 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Overview of the proposed LAR-MoE framework. Unstructured demonstrations are first used to learn a task-aware latent space. The learned latent structure then guides expert routing, encouraging specialization while preventing expert collapse. This latent-aligned routing enables the emergence of structured robotic behavior through implicit task-phase understanding without requiring explicit task phase annotations.
  • Figure 2: Overview of LAR-MoE. In pre-training, our method learns a joint latent representation of observations and future actions via student–teacher co-training. In post-training, we anchor expert routing to the learned latent structure by freezing the student model and using a regularization strategy. The action experts are implemented using a simple transformer decoder architecture. The language prompts are encoded using MiniLM-L6.
  • Figure 3: Expert activation over time during an ex-vivo bowel retraction rollout. The heatmap displays the activation weights of all experts at each timestep ($\approx 33\,\mathrm{ms}$ temporal resolution). The color bar labeled Po denotes the expert with the highest activation at each timestep chosen by the router network, illustrating the temporal specialization and switching behavior of the mixture-of-experts policy. The color bar labeled Hu represents a human-annotated task phase segmentation of the same rollout.
  • Figure 4: Ablation studies on the LIBERO benchmark (Liu et al., 2023). (a) Effect of freezing the student encoder (+F) and applying latent-alignment regularization (+R) on LAR-MoE16 success rate. (b) Success rate as a function of expert count.
  • Figure 5: Illustration of the surgical bowel grasping and retraction task. The first frame shows the robot instrument awaiting the start of the task. The remaining frames correspond to individual phases, numbered in the upper-left corner. The robot-controlled instrument is labeled in white (Robot) and the surgeon-operated instrument in yellow (Surgeon). Phase 1: the surgeon indicates the target grasping region. Phase 2: the robot grasps the indicated bowel segment. Phase 3: the robot holds its position while the surgeon grasps the opposite end. Phase 4 is not shown, as it represents the transitional stretching motion between Phase 3 and Phase 5. Phase 5: the robot maintains tension throughout the remainder of the procedure.
  • ...and 1 more figure
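
The Figure 3 caption describes a color bar that marks the expert with the highest activation at each timestep. The sketch below shows one way such a bar could be derived from a matrix of activation weights and collapsed into contiguous segments for comparison with a human phase annotation. The function names and the toy activation values are hypothetical; only the roughly 33 ms step size comes from the caption.

```python
def dominant_experts(activations):
    """Return the index of the most active expert at each timestep.

    `activations` is a list of rows, one per timestep, each holding one
    weight per expert. Ties resolve to the lowest expert index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in activations]

def segment_runs(labels, dt_ms=33.0):
    """Collapse per-timestep labels into (label, start_ms, end_ms) runs.

    dt_ms defaults to the approximate temporal resolution quoted in the
    Figure 3 caption; adjust it for other recording rates.
    """
    runs = []
    start = 0
    for t in range(1, len(labels) + 1):
        # Close the current run at the end of the sequence or on a label change.
        if t == len(labels) or labels[t] != labels[start]:
            runs.append((labels[start], start * dt_ms, t * dt_ms))
            start = t
    return runs

# Toy rollout with three experts and five timesteps (made-up weights).
activations = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]
labels = dominant_experts(activations)   # [0, 0, 1, 1, 2]
segments = segment_runs(labels)          # three runs, one per expert
```

Applying the same `segment_runs` helper to human-annotated phase labels would yield two lists of segments whose boundaries can be compared directly.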