ECoDe: A Sample-Efficient Method for Co-Design of Robotic Agents

Kishan R. Nagiredla, Buddhika L. Semage, Arun Kumar A., Thommen G. Karimpanal, Santu Rana

TL;DR

This work proposes a multi-fidelity-based exploration strategy in which controllers are tied across the design spaces through a universal policy learner for warm-starting subsequent controller learning problems, demonstrating the superiority of this method compared to baselines.

Abstract

Co-designing autonomous robotic agents involves simultaneously optimizing the controller and the physical design of the agent. The inherent bi-level formulation of this problem requires an outer-loop design optimization driven by an inner-loop control optimization. This is challenging when the design space is large and each design evaluation involves a data-intensive reinforcement learning process for control optimization. To improve the sample efficiency of co-design, we propose a multi-fidelity-based exploration strategy in which we tie the controllers learned across the design space through a universal policy learner that warm-starts subsequent controller learning problems. Experiments on a wide range of agent design problems demonstrate the superiority of our method compared to baselines. Additionally, analysis of the optimized designs reveals interesting design alterations, including design simplifications and non-intuitive changes.
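
The bi-level structure described in the abstract can be illustrated with a small toy sketch. This is not the authors' algorithm or code: the inner-loop "training" is a made-up one-parameter update, and the function names (`train_controller`, `co_design`), the budget schedule, the keep fraction, and the choice to visit designs in sorted order are all assumptions made for illustration. It only shows the general shape of the idea: an outer loop over designs, evaluated at increasing fidelity (training budget), with a shared policy used to warm-start each inner-loop run.

```python
import random


def train_controller(design, budget, warm_start):
    """Toy stand-in for the inner-loop RL run (hypothetical, not the paper's learner).

    The "policy" is a single float pulled toward a design-dependent target.
    A larger budget or a closer warm start leaves less control error.
    """
    target = design  # assumption: the ideal controller parameter equals the design
    policy = warm_start
    for _ in range(budget):
        policy += 0.2 * (target - policy)  # one cheap "training step"
    control_error = abs(target - policy)
    design_quality = -(design - 0.7) ** 2  # toy objective peaking at design = 0.7
    return design_quality - control_error, policy


def co_design(num_designs=16, budgets=(2, 8, 32), keep_fraction=0.5, seed=0):
    """Outer-loop design search over increasing fidelities with warm-starting."""
    rng = random.Random(seed)
    candidates = [rng.random() for _ in range(num_designs)]
    shared_policy = 0.0  # stand-in for a universal policy shared across designs
    total_steps = 0
    scored = []
    for budget in budgets:  # low fidelity first, then higher
        scored = []
        # Assumption: visiting similar designs consecutively helps the warm start.
        for design in sorted(candidates):
            score, policy = train_controller(design, budget, shared_policy)
            shared_policy = policy  # warm-start the next design's controller
            total_steps += budget
            scored.append((score, design))
        scored.sort(reverse=True)
        keep = max(1, int(len(scored) * keep_fraction))
        candidates = [d for _, d in scored[:keep]]  # only top designs advance
    best_score, best_design = scored[0]
    return best_design, best_score, total_steps


if __name__ == "__main__":
    design, score, steps = co_design()
    print(f"best design={design:.3f} score={score:.4f} training steps={steps}")
```

With the default settings the sketch spends 16×2 + 8×8 + 4×32 = 224 toy training steps, whereas training all 16 designs at the highest budget would cost 16×32 = 512. That difference is the kind of saving a multi-fidelity schedule aims for; the actual gains reported in the paper depend on its own method and benchmarks.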

Paper Structure (18 sections, 1 equation, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: ECoDe architecture, showing the multi-fidelity knowledge-propagation mechanism used to identify the best co-design (light blue blob). Blobs indicate different robot design samples, and hatched boxes indicate training time. Within each horizontal box (light green), top-performing samples (red blobs) progress from lower to higher fidelities. The thick blue arrows indicate the evaluation order, which aids effective knowledge transfer through the UPN.
  • Figure 2: OpenAI Gym environments (left to right): CartPole, Acrobot, Hopper, Walker2D, Ant, and Humanoid.
  • Figure 3: The original Acrobot (left) vs. the ECoDe-simplified Acrobot (right). The simplified version resembles a pendulum, making the control problem easier.
  • Figure 4: Ant robot with a broken front left limb (left, blue box) and the ECoDe-suggested co-design (right) with a shortened front right limb (orange box) and lengthened hind limbs (yellow and grey boxes), resembling a kangaroo rat.