Robots that redesign themselves through kinematic self-destruction

Chen Yu; Sam Kriegman

Abstract

Every robot built to date was predesigned by an external process, prior to deployment. Here we show a robot that actively participates in its own design during its lifetime. Starting from a randomly assembled body, and using only proprioceptive feedback, the robot dynamically "sculpts" itself into a new design through kinematic self-destruction: identifying redundant links within its body that inhibit its locomotion, and then thrashing those links against the surface until they break at the joint and fall off the body. It does so using a single autoregressive sequence model, a universal controller that learns in simulation when and how to simplify a robot's body through self-destruction and then adaptively controls the reduced morphology. The optimized policy successfully transfers to reality and generalizes to previously unseen kinematic trees, generating forward locomotion that is more effective than otherwise equivalent policies that randomly remove links or cannot remove any. This suggests that self-designing robots may be more successful than predesigned robots in some cases, and that kinematic self-destruction, though reductive and irreversible, could provide a general adaptive strategy for a wide range of robots.

Paper Structure (18 sections, 10 equations, 5 figures, 1 table)

Introduction
Preliminaries
Learning self-destruction
Observation space
Reward function
Transformer policy
Problem formulation
Model architecture
Training
Prompt Reset
Sim-to-real grounding via real-world rollouts
Experimental setup
Results
In-distribution performance in simulation
Out-of-distribution performance in simulation
...and 3 more sections

Figures (5)

Figure 1: Training dataset and transformer architecture. A causal transformer was trained on sensorimotor trajectories collected from eight predesigned robots (A-H). Each trajectory was flattened into a sequence of per-module states $\mathbf{s}^i_t$ and actions $\mathbf{a}_t$. These tokens are processed by the transformer to autoregressively predict the next action, conditioned on the entire state-action history. Because the transformer inherits its behavior from the expert policies, its output action sequence should include a self-destruction phase (I) and a self-movement phase (J).
Figure 2: In-distribution performance in simulation. The policy capable of kinematic self-destruction, in which the expert controller is allowed to autonomously select which module to break, is compared against an otherwise equivalent baseline in which a different module is randomly chosen for removal during expert training and rollout collection. Both sets of rollouts are used to train a transformer policy via identical training pipelines. (A-P) For each test morphology, top-down trajectories are visualized in time (from cyan to pink), and the mean distance traveled (± std) across five independent trials is displayed. (Q) Automatically chosen detachments yield better locomotion in terms of mean displacement across these eight in-distribution robots ($p = 0.033$; one-sided paired $t$-test).
Figure 3: Out-of-distribution generalization in simulation. Nine of the 100 previously unseen morphologies were randomly sampled from the test set (A-I). Each of the 100 test morphologies comprises four modules arranged in a unique configuration, posing diverse control challenges. Mean locomotion speed across all 100 test robots is compared between the policy that uses kinematic self-destruction to redesign itself before locomotion (red) and a baseline that does not self-destruct but is otherwise equivalent (green). For both methods, the reported speed corresponds to the best 10-second segment of the rollout. Self-destruction yielded a significantly higher mean locomotion speed, indicating better generalization to novel morphologies.
Figure 4: A closer look at kinematic self-destruction. An arbitrary morphology (this one was in the training set) was assembled (A). Modules were glued together to form a breakable bond (B). The policy performed a closed-loop maneuver (with proprioceptive feedback), which in this case consisted of three consecutive pushes (C-E) in which the policy lifted and swung its "tail" to exert torque on a specific glued joint. It took 12 seconds to complete this self-destruction (F), after which the same policy produced forward locomotion (not pictured) in its redesigned morphology.
Figure 5: Out-of-distribution testing in the real world. Three previously unseen morphologies were assembled and the proposed method was once again compared against the baseline policy, which lacks the ability to self-destruct. For each case, we show the motion sequence and the torso trajectory over 20 seconds, captured via OptiTrack, with color indicating time (from cyan to pink) and net displacement labeled below the trajectory. In the first morphology we tested (A-H), although the baseline policy produces more displacement (F-H), the policy with the ability to self-destruct produces a more directional trajectory (B-D); for the last two robots (I-P and Q-X), self-destruction consistently redesigns the robot into one with better locomotion performance.
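
The flattening described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes that, within each timestep, the per-module state tokens $\mathbf{s}^i_t$ precede the action token $\mathbf{a}_t$; the captions do not state this ordering. The function names `flatten_trajectory` and `context_for_action` and the tuple layout of a token are hypothetical, chosen only for illustration, and no embedding or transformer is modeled here.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical token layout: (kind, timestep, module index or None, feature vector)
Token = Tuple[str, int, Optional[int], List[float]]


def flatten_trajectory(
    states: Sequence[Sequence[Sequence[float]]],
    actions: Sequence[Sequence[float]],
) -> List[Token]:
    """Flatten a trajectory into one token sequence.

    states[t][i] is the state vector of module i at step t,
    actions[t] is the action vector at step t.
    Assumed order per step: s^1_t, ..., s^N_t, a_t.
    """
    if len(states) != len(actions):
        raise ValueError("states and actions must cover the same timesteps")
    tokens: List[Token] = []
    for t, (module_states, action) in enumerate(zip(states, actions)):
        for i, state in enumerate(module_states):
            tokens.append(("state", t, i, list(state)))
        tokens.append(("action", t, None, list(action)))
    return tokens


def context_for_action(tokens: Sequence[Token], t: int) -> List[Token]:
    """Return the tokens a causal model may see when predicting a_t:
    everything that precedes the action token of step t."""
    for idx, (kind, step, _, _) in enumerate(tokens):
        if kind == "action" and step == t:
            return list(tokens[:idx])
    raise IndexError(f"no action token for timestep {t}")
```

Under these assumptions, a trajectory with T timesteps and N modules yields T * (N + 1) tokens, and the context for predicting $\mathbf{a}_t$ contains the full state-action history up to and including the states at step t.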
