Upgrading Pepper Robot's Social Interaction with Advanced Hardware and Perception Enhancements
Paolo Magri, Javad Amirian, Mohamed Chetouani
TL;DR
This work addresses Pepper's limited real-time perception for social interaction by adding on-board GPU computing and depth sensing. It introduces an end-to-end hardware stack, with an NVIDIA Jetson Orin Nano and a RealSense D435i mounted on Pepper, running a ROS-based perception pipeline for human detection, field-of-view (FOV) estimation, body orientation, and gaze. The authors develop a YOLOv8-pose-based perception module that localizes keypoints in 3D by fusing them with depth, and validate it on a motion-capture (MoCap) dataset. They provide CAD designs, firmware integration steps, and a practical cost estimate of about €800, showing that the upgrade can feasibly be deployed to improve HRI in real-world settings.
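The step "3D keypoint localization via depth fusion" can be illustrated with a short sketch. It assumes that the depth image is already aligned to the colour image and converted to metres, that lens distortion can be ignored, and that a 2D keypoint (u, v) comes from a pose detector such as YOLOv8-pose. The names `Intrinsics` and `deproject_keypoint`, and the median-over-a-window depth sampling, are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Tuple


@dataclass
class Intrinsics:
    """Pinhole intrinsics of the colour-aligned depth stream, in pixels (assumed)."""
    fx: float
    fy: float
    cx: float
    cy: float


def deproject_keypoint(
    u: float,
    v: float,
    depth_m: List[List[float]],
    intr: Intrinsics,
    window: int = 2,
) -> Optional[Tuple[float, float, float]]:
    """Lift a 2D keypoint (u, v) to a 3D point (X, Y, Z) in the camera frame.

    depth_m is a row-major depth image in metres; 0 means "no measurement".
    Depth is the median of valid pixels in a (2*window+1)^2 patch around the
    keypoint, which tolerates small holes at body edges (hypothetical choice).
    """
    h, w = len(depth_m), len(depth_m[0])
    ui, vi = int(round(u)), int(round(v))
    if not (0 <= ui < w and 0 <= vi < h):
        return None  # keypoint lies outside the image
    samples = [
        depth_m[r][c]
        for r in range(max(vi - window, 0), min(vi + window + 1, h))
        for c in range(max(ui - window, 0), min(ui + window + 1, w))
        if depth_m[r][c] > 0
    ]
    if not samples:
        return None  # no valid depth near this keypoint
    z = median(samples)
    # Inverse pinhole projection (lens distortion ignored).
    x = (u - intr.cx) * z / intr.fx
    y = (v - intr.cy) * z / intr.fy
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical intrinsics and a flat wall 2 m away.
    intr = Intrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    depth = [[2.0] * 640 for _ in range(480)]
    print(deproject_keypoint(620, 240, depth, intr))  # (1.0, 0.0, 2.0)
```

Sampling a small patch instead of a single pixel is one common way to cope with missing depth on thin limbs. A real system would take the intrinsics from the camera driver rather than hard-coding them.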
Abstract
In this paper, we propose hardware and software enhancements for the Pepper robot to improve its human-robot interaction capabilities. These include an NVIDIA Jetson GPU that provides the computing power needed to run real-time algorithms, a RealSense D435i camera that captures depth images, and computer vision algorithms that detect and localize the humans around the robot and estimate their body orientation and gaze direction. The new stack is implemented in ROS and runs on the extended Pepper hardware; it communicates with the robot's firmware through the NAOqi ROS driver API. We have also collected a MoCap dataset of human activities in a controlled environment, together with the corresponding RGB-D data, to validate the proposed perception algorithms.
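One way body orientation and a field-of-view check could be derived from 3D keypoints is sketched below. It assumes that the shoulder keypoints have already been transformed into a z-up robot frame (x forward, y left, following the ROS REP 103 convention), that the robot sits at `robot_xy`, and that a half-FOV of 60° is used. The functions `body_yaw` and `robot_in_fov` and the threshold are hypothetical and stand in for whatever orientation and FOV estimators the paper actually uses.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def body_yaw(left_shoulder: Vec3, right_shoulder: Vec3) -> float:
    """Yaw (rad) of the torso's facing direction in a z-up frame.

    The right-to-left shoulder vector, rotated by -90 degrees about z,
    points in the direction the person's torso is facing.
    """
    sx = left_shoulder[0] - right_shoulder[0]
    sy = left_shoulder[1] - right_shoulder[1]
    fx, fy = sy, -sx  # rotate (sx, sy) by -90 degrees
    return math.atan2(fy, fx)


def robot_in_fov(
    left_shoulder: Vec3,
    right_shoulder: Vec3,
    robot_xy: Tuple[float, float] = (0.0, 0.0),
    half_fov_deg: float = 60.0,  # hypothetical threshold
) -> bool:
    """True if the robot lies within +/- half_fov_deg of the torso direction."""
    cx = (left_shoulder[0] + right_shoulder[0]) / 2.0
    cy = (left_shoulder[1] + right_shoulder[1]) / 2.0
    yaw = body_yaw(left_shoulder, right_shoulder)
    bearing = math.atan2(robot_xy[1] - cy, robot_xy[0] - cx)
    # Wrap the angular difference into [-pi, pi] before comparing.
    diff = math.atan2(math.sin(bearing - yaw), math.cos(bearing - yaw))
    return abs(diff) <= math.radians(half_fov_deg)


if __name__ == "__main__":
    # A person 2 m in front of the robot, facing it (their left is -y).
    print(robot_in_fov((2.0, -0.2, 1.4), (2.0, 0.2, 1.4)))  # True
    # The same person turned away.
    print(robot_in_fov((2.0, 0.2, 1.4), (2.0, -0.2, 1.4)))  # False
```

Shoulders give only a torso direction; gaze would need head keypoints or a dedicated estimator, which this sketch does not attempt.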
