Upgrading Pepper Robot's Social Interaction with Advanced Hardware and Perception Enhancements

Paolo Magri, Javad Amirian, Mohamed Chetouani

TL;DR

This work addresses Pepper's limited real-time perception for social interaction by integrating an on-board GPU and a depth sensor. It introduces an end-to-end hardware stack with an NVIDIA Jetson Orin Nano and a RealSense D435i mounted on Pepper, running a ROS-based perception pipeline for human detection, FOV estimation, body orientation, and gaze. The authors develop a YOLOv8-pose–driven perception module with 3D keypoint localization via depth fusion, and validate it using a MoCap-based dataset. They provide CAD designs, firmware integration steps, and a cost estimate of about €800, indicating that deployment for improved HRI in real-world settings is feasible.

Abstract

In this paper, we propose hardware and software enhancements for the Pepper robot to improve its human-robot interaction capabilities. These include the integration of an NVIDIA Jetson GPU to increase computational capacity and run real-time algorithms, a RealSense D435i camera to capture depth images, and computer vision algorithms to detect and localize the humans around the robot and estimate their body orientation and gaze direction. The new stack is implemented on ROS and runs on the extended Pepper hardware; communication with the robot's firmware is done through the NAOqi ROS driver API. We have also collected a MoCap dataset of human activities in a controlled environment, together with the corresponding RGB-D data, to validate the proposed perception algorithms.

Paper Structure (10 sections, 5 equations, 4 figures, 1 table)

This paper contains 10 sections, 5 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: 1) Custom box housing the GPU and the battery, separated by a partition. Air vents provide the cooling needed to maintain the GPU's performance and longevity. 2) The camera mount is sturdy and compact; it is kept thick and short to minimize vibrations. Because Pepper's head already provides the needed degrees of freedom, no more complex mechanism is required, and the fixed geometry of the mount makes the camera easy to calibrate. 3) Back view of the Pepper robot with the new hardware installed. 4) Front view of the Pepper robot with the new hardware installed.
  • Figure 2: ROS stack for human detection and FOV estimation
  • Figure 3: Left) Subject facing the camera. Center) Subject with back to the camera. Right) Side view of the 3D skeleton. Legend: Red: $\mathbf{P}_\text{Shoulders}$, Yellow: $\mathbf{P}_\text{Hips}$, Orange: $\mathbf{P}_\text{Pelvis}$, Pink: $\mathbf{P}_\text{Neck}$. Red arrow: $\mathbf{q}_\text{gaze}$, Green arrow: $\mathbf{q}_\text{torso}$.
  • Figure 4: Left) Skeleton generated from MoCap. Middle) RGB image from the RealSense with the YOLOv8 pose. Right) Skeleton with FOV (red) and body direction (green) generated by the algorithm.
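
The captions of Figures 3 and 4 refer to 3D keypoints such as $\mathbf{P}_\text{Shoulders}$ and $\mathbf{P}_\text{Hips}$ and to a torso direction derived from them. The paper's own equations are not reproduced in this summary, so the following is only a minimal sketch of how such quantities could be computed, not the authors' formulation. It assumes a pinhole camera model with known intrinsics, a depth value in metres at each 2D keypoint, and a camera frame with x to the right, y down, and z forward. The function names `backproject` and `torso_direction` are hypothetical, and the sketch returns a unit direction vector rather than the orientation $\mathbf{q}_\text{torso}$ shown in Figure 3.

```python
import numpy as np


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth in metres to a 3D point in the camera frame.

    Standard pinhole back-projection; fx, fy, cx, cy are the camera intrinsics.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m], dtype=float)


def torso_direction(l_shoulder, r_shoulder, l_hip, r_hip):
    """Unit vector pointing out of the subject's chest (assumed geometry).

    The torso plane is spanned by the right-to-left shoulder vector and the
    hips-to-shoulders vector; their cross product is the facing direction.
    """
    p_shoulders = 0.5 * (l_shoulder + r_shoulder)  # midpoint of shoulders
    p_hips = 0.5 * (l_hip + r_hip)                 # midpoint of hips
    across = l_shoulder - r_shoulder               # points to the subject's left
    up = p_shoulders - p_hips                      # points along the spine
    normal = np.cross(across, up)
    norm = np.linalg.norm(normal)
    if norm < 1e-9:
        raise ValueError("degenerate keypoints: torso plane is undefined")
    return normal / norm


# Illustrative values only: a subject 2 m away, facing the camera.
# The subject's left shoulder appears on the right side of the image.
l_sh = np.array([0.2, -0.3, 2.0])
r_sh = np.array([-0.2, -0.3, 2.0])
l_hp = np.array([0.15, 0.2, 2.0])
r_hp = np.array([-0.15, 0.2, 2.0])
facing = torso_direction(l_sh, r_sh, l_hp, r_hp)
```

For a subject facing the camera, the resulting vector points back along the negative z axis, toward the sensor. In practice the depth at a keypoint is noisy or missing, so a real pipeline would likely filter or median-sample the depth around each keypoint before back-projecting it.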