Learning Getting-Up Policies for Real-World Humanoid Robots

Xialin He, Runpei Dong, Zixuan Chen, Saurabh Gupta

TL;DR

The paper tackles humanoid fall recovery by learning two-stage getting-up and rolling-over policies with a curriculum and sim-to-real transfer. It introduces HumanUP, where a discovery stage finds motion trajectories under weak deployment constraints, followed by a deployable stage that imitates these trajectories under strong regularization and domain randomization to ensure real-world reliability. Real-world experiments on a Unitree G1 demonstrate higher success rates and smoother, safer motions than hand-designed controllers, validating the approach across supine and prone poses on varied terrains. The work highlights the importance of curriculum design, full collision modeling, posture randomization, and soft symmetry for achieving deployable, generalizable policies in contact-rich tasks. This framework advances practical autonomous fall recovery for human-sized humanoids and suggests broader applicability to other complex contact-driven behaviors.

Abstract

Automatic fall recovery is a crucial prerequisite before humanoid robots can be reliably deployed. Hand-designing controllers for getting up is difficult because of the varied configurations a humanoid can end up in after a fall and the challenging terrains humanoid robots are expected to operate on. This paper develops a learning framework to produce controllers that enable humanoid robots to get up from varying configurations on varying terrains. Unlike previous successful applications of learning to humanoid locomotion, the getting-up task involves complex contact patterns (which necessitate accurate modeling of the collision geometry) and sparser rewards. We address these challenges through a two-stage approach that induces a curriculum. The first stage focuses on discovering a good getting-up trajectory under minimal constraints on smoothness or speed / torque limits. The second stage then refines the discovered motions into deployable (i.e., smooth and slow) motions that are robust to variations in initial configuration and terrain. We find these innovations enable a real-world G1 humanoid robot to get up from two main situations that we considered: a) lying face up and b) lying face down, both tested on flat, deformable, slippery surfaces and slopes (e.g., sloppy grass and snowfield). This is one of the first successful demonstrations of learned getting-up policies for human-sized humanoid robots in the real world.
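
As a rough illustration of what "slow" can mean for the second stage, the sketch below time-stretches a discovered joint-angle trajectory by linear interpolation so that a tracking policy could follow it at a lower speed. The abstract does not say how the slowdown is implemented; the function name `slow_down_trajectory`, the per-step list-of-joint-angles layout, and the `factor` parameter are all assumptions made for this example.

```python
from typing import List, Sequence


def slow_down_trajectory(
    frames: Sequence[Sequence[float]], factor: float
) -> List[List[float]]:
    """Time-stretch a trajectory by linear interpolation (illustrative only).

    frames: one sequence of joint angles per control step (assumed layout).
    factor: slowdown factor >= 1; 2.0 roughly doubles the duration.
    """
    if factor < 1.0:
        raise ValueError("factor must be >= 1")
    if len(frames) < 2:
        return [list(f) for f in frames]

    last = len(frames) - 1
    n_out = int(round(last * factor)) + 1
    stretched = []
    for k in range(n_out):
        # Location of output step k on the original time axis.
        t = min(k / factor, float(last))
        i = min(int(t), last - 1)  # index of the left neighbor frame
        w = t - i                  # interpolation weight toward frame i + 1
        stretched.append(
            [(1.0 - w) * a + w * b for a, b in zip(frames[i], frames[i + 1])]
        )
    return stretched


# A three-step, one-joint trajectory stretched to five steps.
print(slow_down_trajectory([[0.0], [1.0], [2.0]], 2.0))
```

Linear interpolation is only one possible choice; a smoother spline would serve the same purpose, and nothing here is taken from the paper's implementation.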

Paper Structure

This paper contains 49 sections, 2 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: HumanUP system overview. Our getting-up policy is trained in simulation using two-stage RL training, after which it is directly deployed in the real world. (a) Stage I learns a discovery policy $f$ that figures out a getting-up trajectory with minimal deployment constraints. (b) Stage II converts the trajectory discovered by Stage I into a policy $\pi$ that is deployable, robust, and generalizable. This policy $\pi$ is trained by learning to track a slowed-down version of the discovered trajectory under strong control regularization on varied terrains and from varied initial poses. (c) The two-stage training induces a curriculum. Stage I targets motion discovery in easier settings (simpler collision geometry, same starting poses, weak regularization, no variations in terrain), while Stage II solves the task of making the learned motion deployable and generalizable.
  • Figure 2: Real-world results. We evaluate HumanUP (ours) in several real setups that span diverse surface properties, including both man-made and natural surfaces, and cover a wide range of roughness (rough concrete to slippery snow), bumpiness (flat concrete to tiles), ground compliance (completely firm concrete to swampy, muddy grass), and slope (flat to about $10^\circ$). We compare HumanUP with G1's manufacturer-provided controller and with HumanUP w/o posture randomization (PR). HumanUP succeeds more consistently (78.3% vs 41.7%) and can solve terrains that the manufacturer-provided controller can't.
  • Figure 3: Learning curve. (a) Termination height of the torso, indicating whether the robot can lift the body. (b) Body uprightness, computed as the projected gravity on the $z$-axis, normalized to $[0,1]$ for better comparison. The overall number of simulation sampling steps is about 5B, normalized to $[0,1]$.
  • Figure 4: Getting-up execution comparison with G1's manufacturer-provided controller. The manufacturer-provided controller uses a handcrafted motion trajectory, which can be divided into three phases, while HumanUP learns a continuous and more efficient whole-body getting-up motion. HumanUP enables the humanoid to get up within 6 seconds, about half of the manufacturer-provided controller's 11 seconds of control. (a), (b), and (c) record the corresponding mean motor temperature of the upper body, lower body, and waist, respectively. The manufacturer-provided controller's execution causes the arm motors to heat up significantly, whereas our policy makes more use of the leg motors, which are stronger (higher torque limit of 83 N·m as opposed to 25 N·m for the arm motors) and thus able to take more load.
  • Figure 5: Qualitative examples of failure modes on the grass slope and snow field. G1's manufacturer-provided controller isn't able to squat on the sloping grass and slips on the slope. The HumanUP policy can partially get up on both the slope and the snow, but falls due to unstable foot placement on the slope and slippage on the snow.
  • ...and 2 more figures
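
The body-uprightness metric in the Figure 3 caption (projected gravity on the $z$-axis, normalized to $[0,1]$) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the `(w, x, y, z)` quaternion convention, the function name `uprightness`, and the normalization $(1 - g_z)/2$ are choices made for this example, since the caption does not give the exact formula.

```python
import math
from typing import Tuple

Quaternion = Tuple[float, float, float, float]  # (w, x, y, z), assumed convention


def uprightness(base_quat: Quaternion) -> float:
    """Map the body-frame z-component of gravity to [0, 1].

    Returns 1.0 when the body z-axis points straight up, 0.5 when the body
    lies horizontally, and 0.0 when it is upside down (assumed normalization).
    """
    w, x, y, z = base_quat
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("quaternion must be non-zero")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm

    # World gravity direction is (0, 0, -1). Rotating it into the body frame
    # gives a z-component equal to minus the (z, z) entry of the rotation matrix.
    gravity_z_body = -(1.0 - 2.0 * (x * x + y * y))

    # gravity_z_body is -1 when upright and +1 when inverted; rescale to [0, 1].
    return (1.0 - gravity_z_body) / 2.0


# Identity orientation (standing upright) versus a 90-degree roll (lying on a side).
print(uprightness((1.0, 0.0, 0.0, 0.0)))
print(uprightness((math.cos(math.pi / 4), math.sin(math.pi / 4), 0.0, 0.0)))
```

Under this convention, the curve in panel (b) would rise toward 1 as the policy learns to bring the torso upright.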