Learning Arm-Assisted Fall Damage Reduction and Recovery for Legged Mobile Manipulators

Yuntao Ma, Farbod Farshidian, Marco Hutter

TL;DR

The paper tackles fall damage and recovery for legged mobile manipulators by learning an arm-assisted recovery policy. It introduces an asymmetric actor-critic framework with time-varying task rewards within a finite-horizon MDP, enabling a time-invariant policy that uses the arm to reduce impact and aid self-righting. In simulation and on ALMA hardware, the approach reduces base impulse, base acceleration, and peak joint forces during falls, and achieves a high fall-recovery success rate (98.9%) with notable leg-torque savings compared to arm-tugged baselines. The method demonstrates robustness to reward scaling, adaptability to additional tasks (resting, self-righting), and practical sim-to-real transfer, advancing the deployability of legged mobile manipulators with payloads.

Abstract

Adaptive falling and recovery skills greatly extend the applicability of robot deployments. In the case of legged mobile manipulators, the robot arm could adaptively stop the fall and assist the recovery. Prior works on falling and recovery strategies for legged mobile manipulators usually rely on assumptions such as inelastic collisions and falling in defined directions to enable real-time computation. This paper presents a learning-based approach to fall damage reduction and recovery. An asymmetric actor-critic training structure is used to train a time-invariant policy with time-varying reward functions. In simulated experiments, the policy recovers from 98.9% of initial falling configurations. It reduces base contact impulse, peak joint internal forces, and base acceleration during the fall compared to the baseline methods. The trained control policy is deployed and extensively tested on the ALMA robot hardware. A video summarizing the proposed method and the hardware tests is available at https://youtu.be/avwg2HqGi8s.
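
The combination of a time-invariant policy with time-varying rewards can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear weight schedule, the 5-second horizon, the observation keys, and the choice to give the critic the elapsed time. The sketch only shows the general idea that the actor sees time-free observations while the critic receives extra (privileged) inputs.

```python
# Illustrative sketch only; schedule, horizon, and observation keys are hypothetical.

HORIZON_S = 5.0  # assumed episode length in seconds

ACTOR_KEYS = ["joint_pos", "joint_vel", "gravity_dir"]   # assumed minimal observations
PRIVILEGED_KEYS = ["contact_forces", "base_lin_vel"]     # assumed simulator-only signals


def task_weight(t, horizon=HORIZON_S):
    """Assumed linear schedule: 0 at the start of the fall, 1 at the end of the episode."""
    return min(max(t / horizon, 0.0), 1.0)


def reward(t, damage_cost, pose_error):
    """Time-varying reward: early steps penalize damage, late steps penalize pose error."""
    w = task_weight(t)
    return -(1.0 - w) * damage_cost - w * pose_error


def actor_observation(state):
    """The actor input contains no time, so the resulting policy is time-invariant."""
    obs = []
    for key in ACTOR_KEYS:
        obs.extend(state[key])
    return obs


def critic_observation(state, t):
    """The critic input adds privileged signals and (by assumption) the elapsed time."""
    obs = actor_observation(state)
    for key in PRIVILEGED_KEYS:
        obs.extend(state[key])
    obs.append(t / HORIZON_S)
    return obs


if __name__ == "__main__":
    state = {
        "joint_pos": [0.1, -0.2],
        "joint_vel": [0.0, 0.3],
        "gravity_dir": [0.0, 0.0, -1.0],
        "contact_forces": [12.0],
        "base_lin_vel": [0.5, 0.0, -1.2],
    }
    print(len(actor_observation(state)), len(critic_observation(state, 1.0)))
    print(reward(0.0, 2.0, 3.0), reward(2.5, 2.0, 3.0), reward(5.0, 2.0, 3.0))
```

Because the value of a state depends on how much of the episode remains when the reward changes over time, giving time only to the critic is one plausible way to keep value estimates well-defined without making the deployed policy depend on a clock.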

Paper Structure

This paper contains 23 sections, 1 equation, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: ALMA robot recovering from a fall.
  • Figure 2: Overview of the training pipeline. The actor observes only the minimal observations required for completing the fall recovery, while the critic has access to privileged information that improves the value function estimation. The actor's output is converted to joint position targets for the robot's joint PD controller.
  • Figure 3: (a)-(e) Possible initial conditions for the policy training. Our pipeline expects the task controllers to detect and report the fall, and the policy is trained to reduce the damage after the controller switch. (f) The default joint configuration for the policy training.
  • Figure 4: ALMA robot adapting the fall recovery strategy after an unsuccessful quick recovery attempt.
  • Figure 5: Left: Distribution of contact impulse on the base over all time steps. Time steps with base contact impulse below 0.05 N·s are not included. Right: Base acceleration during the fall.
  • ...and 3 more figures
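
The Figure 2 caption states that the actor's output is converted to joint position targets for a joint PD controller. A minimal sketch of such a conversion follows. The offset-plus-scale mapping around a default joint configuration, the scale factor, and the gains are assumptions chosen for illustration; the paper's actual mapping and gains are not given in this summary.

```python
# Illustrative sketch only; the action mapping, scale, and PD gains are hypothetical.


def action_to_targets(action, q_default, scale=0.5):
    """Assumed mapping: target = default joint position + scale * policy action."""
    return [q0 + scale * a for q0, a in zip(q_default, action)]


def pd_torques(q_target, q, dq, kp=80.0, kd=2.0):
    """Standard joint PD law: tau = kp * (q_target - q) - kd * dq."""
    return [kp * (qt - qi) - kd * dqi for qt, qi, dqi in zip(q_target, q, dq)]


if __name__ == "__main__":
    q_default = [0.0, 0.5]   # assumed default joint configuration (rad)
    action = [1.0, -1.0]     # example policy output
    targets = action_to_targets(action, q_default)
    torques = pd_torques(targets, q=[0.0, 0.0], dq=[0.0, 1.0])
    print(targets, torques)
```

Commanding position targets rather than raw torques lets the low-level PD loop run at a higher rate than the policy, which is a common design choice in learned legged-robot controllers.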