Learning Bipedal Walking for Humanoid Robots in Challenging Environments with Obstacle Avoidance

Marwan Hamze, Mitsuharu Morisawa, Eiichi Yoshida

TL;DR

This paper aims to achieve bipedal locomotion in environments with obstacles using policy-based reinforcement learning, by adding simple distance reward terms to a state-of-the-art reward function that already achieves basic bipedal locomotion.

Abstract

Deep reinforcement learning has been successfully applied to humanoid robots to achieve dynamic walking. However, these implementations have so far succeeded only in simple environments devoid of obstacles. In this paper, we aim to achieve bipedal locomotion in an environment where obstacles are present using policy-based reinforcement learning. By adding simple distance reward terms to a state-of-the-art reward function that can achieve basic bipedal locomotion, the trained policy succeeds in navigating the robot towards the desired destination without colliding with the obstacles along the way.
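
The abstract does not give the form of the distance reward terms, so the following is only a minimal sketch of how such terms might be added to an existing locomotion reward. The function names, the exponential and linear shaping, and the weights `w_goal`, `w_obs`, and `safe_radius` are all illustrative assumptions, not the authors' formulation.

```python
import math

def goal_reward(robot_xy, goal_xy, scale=1.0):
    """Hypothetical term in (0, 1]: larger as the robot nears the destination."""
    d = math.dist(robot_xy, goal_xy)
    return math.exp(-scale * d)

def obstacle_penalty(robot_xy, obstacles_xy, safe_radius=0.5):
    """Hypothetical term in [0, 1]: nonzero only within safe_radius of an obstacle."""
    if not obstacles_xy:
        return 0.0
    d_min = min(math.dist(robot_xy, o) for o in obstacles_xy)
    return max(0.0, 1.0 - d_min / safe_radius)

def total_reward(base_reward, robot_xy, goal_xy, obstacles_xy,
                 w_goal=0.2, w_obs=0.2):
    """Base walking reward plus the two distance terms (weights are assumed)."""
    return (base_reward
            + w_goal * goal_reward(robot_xy, goal_xy)
            - w_obs * obstacle_penalty(robot_xy, obstacles_xy))

if __name__ == "__main__":
    # Robot 1 m from the goal, with one obstacle 0.25 m away.
    print(total_reward(1.0, (0.0, 0.0), (1.0, 0.0), [(0.0, 0.25)]))
```

Here `base_reward` stands in for the existing walking reward. Keeping the distance terms additive means the base reward can be reused unchanged, which matches the abstract's description of adding terms to an existing reward function.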

Paper Structure

This paper contains 7 sections, 2 equations, and 3 figures.

Figures (3)

  • Figure 1: Overall control structure. The environment information consists of the obstacle and destination reference positions.
  • Figure 2: The simulated environment, highlighting the trajectory taken by the robot to reach the destination in red.
  • Figure 3: Episode Lengths and Returns during the Learning Process