Efficient Morphology-Control Co-Design via Stackelberg Proximal Policy Optimization

Yanning Dai, Yuhui Wang, Dylan R. Ashley, Jürgen Schmidhuber

Abstract

Morphology-control co-design concerns the coupled optimization of an agent's body structure and control policy. This problem exhibits a bi-level structure, where the control dynamically adapts to the morphology to maximize performance. Existing methods typically neglect the control's adaptation dynamics by adopting a single-level formulation that treats the control policy as fixed when optimizing morphology. This can lead to inefficient optimization, as morphology updates may be misaligned with control adaptation. In this paper, we revisit the co-design problem from a game-theoretic perspective, modeling the intrinsic coupling between morphology and control as a novel variant of a Stackelberg game. We propose Stackelberg Proximal Policy Optimization (Stackelberg PPO), which explicitly incorporates the control's adaptation dynamics into morphology optimization. By modeling this intrinsic coupling, our method aligns morphology updates with control adaptation, thereby stabilizing training and improving learning efficiency. Experiments across diverse co-design tasks demonstrate that Stackelberg PPO outperforms standard PPO in both stability and final performance, opening the way for dramatically more efficient robotics designs.
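The misalignment described in the abstract can be illustrated with a toy two-variable problem. The sketch below is not the paper's algorithm: the quadratic objective `f`, the scalar "morphology" `x`, the scalar "control" `y`, and the one-step gradient follower are all assumptions chosen for illustration. It compares a single-level leader gradient, which treats the control as fixed, with a gradient that differentiates through one step of the follower's adaptation.

```python
# Toy illustration only: a shared objective f(x, y) that both players maximize.
# x stands in for the morphology (leader), y for the control (follower).
# The optimum is x = 1, y = 2. All names here are hypothetical.

def f(x, y):
    return -(y - 2.0 * x) ** 2 - (x - 1.0) ** 2

def df_dx(x, y):
    return 4.0 * (y - 2.0 * x) - 2.0 * (x - 1.0)

def df_dy(x, y):
    return -2.0 * (y - 2.0 * x)

# Mixed second derivative of f; constant because f is quadratic.
D2F_DXDY = 4.0

def follower_step(x, y, alpha):
    """One gradient-ascent step of the control for a fixed morphology."""
    return y + alpha * df_dy(x, y)

def naive_leader_grad(x, y):
    """Single-level gradient: the control is treated as fixed."""
    return df_dx(x, y)

def lookahead_leader_grad(x, y, alpha):
    """Gradient of f(x, follower_step(x, y, alpha)) with respect to x.

    Chain rule: direct term plus the sensitivity of the adapted control
    to x, which is alpha times the mixed second derivative.
    """
    y_next = follower_step(x, y, alpha)
    dy_next_dx = alpha * D2F_DXDY
    return df_dx(x, y_next) + dy_next_dx * df_dy(x, y_next)

if __name__ == "__main__":
    x, y, alpha = 0.5, 0.0, 0.5  # the control lags behind the morphology
    print("naive:", naive_leader_grad(x, y))                 # negative: moves x away from 1
    print("lookahead:", lookahead_leader_grad(x, y, alpha))  # positive: moves x toward 1
```

In this toy setting the single-level gradient points away from the optimal morphology whenever the control lags, while the lookahead gradient accounts for how the control will adapt. The paper's actual surrogate and its PPO-style clipping are not reproduced here.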

Paper Structure (38 sections, 3 theorems, 28 equations, 16 figures, 16 tables)


Key Result

Theorem 1

We define the surrogate

Figures (16)

  • Figure 1: Stackelberg PPO autonomously designs a task-specific robot for the "Pusher" task, starting from a bare-bones structure and evolving into a design with arm-like structures for pushing boxes and leg-like limbs for movement. This highlights the method’s ability to create adaptive and complex designs. In comparison, standard PPO generates simpler structures that cannot support more complex behaviors. For more examples of evolved designs and animations, as well as open-source code, visit: https://yanningdai.github.io/stackelberg-ppo-co-design.
  • Figure 2: Illustration of the phase-separated Stackelberg Markov Game for morphology–control co-design. In the leader phase (blue part), the agent incrementally edits the morphology via discrete topology-altering actions, producing a terminal morphology $s_T^L$. In the follower phase (green part), the control policy is optimized based on this morphology.
  • Figure 3: Performance curve with respect to the number of follower steps during training. Shaded regions denote standard error across seven random seeds.
  • Figure 4: (a) Evolved morphologies. Ablation studies on (b) a $\lambda$ sweep from 0 to $\infty$ and (c) the Fisher information matrix on/off, both evaluated on the Stepper-Regular task.
  • Figure 5: (a) Reward curves and (b) KL-divergence traces for different clipping thresholds $\epsilon$. (c) Performance comparison under varying leader horizons $T$. All evaluated on Stepper-Regular.
  • ...and 11 more figures

Theorems & Definitions (7)

  • Definition 1
  • Theorem 1
  • Proposition 1
  • Proof of \ref{theorem_surrogate__cross_derivative}
  • Proof of \ref{theorem_first_order}
  • Proposition 2
  • Proof of \ref{theorem_second_derivative}