CARPE-ID: Continuously Adaptable Re-identification for Personalized Robot Assistance
Federico Rollo, Andrea Zunino, Nikolaos Tsagarakis, Enrico Mingo Hoffman, Arash Ajoudani
TL;DR
The paper tackles persistent, personalized target re-identification in crowded human-robot interaction settings, where appearance changes and occlusions challenge standard multi-object tracking (MOT) and single-object tracking (SOT) methods. It proposes CARPE-ID, a continual adaptation framework that couples MOT-based detections with an online re-identification module, a damped exponential moving average (EMA) update of the ideal target representation and of the re-identification threshold, and a distractor blacklist for robustness. Key contributions include online updates of the target representation, dual-damping control that balances adaptation against stability, and validation on both a laboratory dataset and a real FollowMe HRI scenario, which demonstrates robustness to occlusions and outfit changes with limited re-identification delays. The approach is relevant to persistent, person-specific assistance in real-world robotics because it reduces the need for re-initialization and increases tracking reliability in dynamic environments.
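The update loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `TargetModel`, the cosine distance, the parameters `alpha`, `damping`, and `margin`, and the rule that blacklists a track after a single rejected sample are all assumptions chosen for illustration only.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


class TargetModel:
    """Hypothetical target model with damped EMA updates and a blacklist."""

    def __init__(self, feature, threshold=0.3, alpha=0.1, damping=0.5, margin=0.2):
        self.feature = np.asarray(feature, dtype=float)  # current target representation
        self.threshold = threshold  # maximum distance accepted as a match
        self.alpha = alpha          # base EMA rate (assumed value)
        self.damping = damping      # how strongly weak matches are slowed down (assumed)
        self.margin = margin        # slack added when adapting the threshold (assumed)
        self.blacklist = set()      # MOT track ids rejected as distractors

    def update(self, track_id, feature):
        """Return True if the detection is accepted as the target."""
        feature = np.asarray(feature, dtype=float)
        if track_id in self.blacklist:
            return False
        d = cosine_distance(self.feature, feature)
        if d > self.threshold:
            # Assumed rule: one rejected sample blacklists the whole track.
            self.blacklist.add(track_id)
            return False
        # Damped EMA: matches close to the threshold move the model less.
        rate = self.alpha * (1.0 - self.damping * d / self.threshold)
        self.feature = (1.0 - rate) * self.feature + rate * feature
        self.threshold = (1.0 - rate) * self.threshold + rate * (d + self.margin)
        return True


model = TargetModel(feature=[1.0, 0.0])
accepted = model.update(track_id=1, feature=[1.0, 0.1])   # similar appearance
rejected = model.update(track_id=2, feature=[0.0, 1.0])   # dissimilar appearance
blocked = model.update(track_id=2, feature=[1.0, 0.0])    # same track, now blacklisted
```

In this sketch a single damping term scales the EMA rate for both the representation and the threshold; the dual-damping control mentioned above may use separate terms for each.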
Abstract
Current Human-Robot Interaction (HRI) scenarios tend to assume either that the robot should cooperate with the closest individual or that the scene contains only a single human actor. In realistic settings such as shop floor operations, this assumption may not hold, and the robot must recognize its specific target in a crowded environment. To meet this requirement, we propose a person re-identification module based on continual visual adaptation techniques, which ensures that the robot keeps cooperating with the appropriate individual even under varying visual appearance or partial or complete occlusion. We test the framework both in isolation, using videos recorded in a laboratory environment, and in an HRI scenario, i.e., a person-following task performed by a mobile robot. The targets are asked to change their appearance during tracking and to leave the camera field of view, in order to test the challenging cases of occlusion and outfit variation. We compare our framework with a state-of-the-art Multi-Object Tracking (MOT) method. The results show that CARPE-ID accurately tracks each selected target throughout the experiments in all cases except two limit cases, whereas the state-of-the-art MOT method averages 4 tracking errors per video.
