CARPE-ID: Continuously Adaptable Re-identification for Personalized Robot Assistance

Federico Rollo, Andrea Zunino, Nikolaos Tsagarakis, Enrico Mingo Hoffman, Arash Ajoudani

TL;DR

The paper tackles the problem of persistent, personalized target re-identification in crowded human-robot interaction settings, where appearance changes and occlusions challenge standard MOT/SOT methods. It proposes CARPE-ID, a continual adaptation framework that couples MOT-based detections with an online re-identification module and a damped EMA–driven update of the ideal target representation and threshold, plus a distractor blacklist for robustness. Key contributions include online target representation updates, dual-damping control for adaptation and stability, and validation in both a lab dataset and a real FollowMe HRI scenario, demonstrating robustness to occlusions and outfit changes with limited re-identification delays. The approach promises practical impact for persistent, person-specific assistance in real-world robotics by reducing re-initialization needs and increasing tracking reliability in dynamic environments.

Abstract

In today's Human-Robot Interaction (HRI) scenarios, it is commonly assumed that the robot cooperates with the closest individual or that the scene contains a single human actor. However, in realistic scenarios, such as shop floor operations, this assumption may not hold, and the robot must recognize its personalized target in crowded environments. To fulfil this requirement, we propose a person re-identification module based on continual visual adaptation techniques that ensures the robot's seamless cooperation with the appropriate individual, even under varying visual appearance or partial or complete occlusion. We test the framework both on its own, using videos recorded in a laboratory environment, and in an HRI scenario, i.e., a person-following task performed by a mobile robot. The targets are asked to change their appearance during tracking and to disappear from the camera field of view, to test the challenging cases of occlusion and outfit variation. We compare our framework with a state-of-the-art Multi-Object Tracking (MOT) method. The results show that CARPE-ID accurately tracks each selected target throughout the experiments in all cases except two limit cases, whereas the state-of-the-art MOT method averages 4 tracking errors per video.

Paper Structure (10 sections, 5 equations, 4 figures, 1 algorithm)


Figures (4)

  • Figure 1: The framework pipeline, from the image input to the target re-identification output. The first module is the MOT, where a neural network provides an initial coarse tracking of the objects in the image. The detections obtained from the MOT module are passed to a feature extractor, which outputs the feature vectors $\boldsymbol{x_i}$, where $i$ ranges from zero to the number of input detections. The feature vectors $\boldsymbol{x_i}$ are used to compute the statistical distance $d_{\boldsymbol{\mu},\boldsymbol{\sigma}}(\mathbf{x})$ (see Eq. \ref{eq:feat_dist}), which is then used by the re-identification module. If the MOT tracks the target correctly, its output is used directly; otherwise, if the MOT finds no target, the re-identification module searches for a correspondence between the ideal target and the detections in the image. If the target keeps the same ID or is re-identified, the target statistical distance $d_{\boldsymbol{\mu},\boldsymbol{\sigma}}(\mathbf{x})$ and the target features $\boldsymbol{x_i}$ are used to update, respectively, the re-identifier threshold $\lambda_d$ and the ideal target representation $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ (dashed lines). Otherwise, if no target is re-identified, the framework gives no output and skips to the next RGB frame.
  • Figure 2: Comparison between the effects of DEMA and EMA on threshold filtering and on the ideal target representation. In subfigure (a), the mean, variance, and threshold of the distances (in dark blue, green, and red) closely follow the behaviour of the distance plot (light blue line) with a small delay. In subfigure (b), the damping factor ensures that the mean, variance, and threshold do not follow the distance during large peaks and instead wait for the ideal representation to adapt to the new appearance of the target.
  • Figure 3: Statistical evaluation of the obtained results.
  • Figure 4: A FollowMe experiment sample: the target person has to roughly follow an ideal path (red dashed line) while the robot (blue line) has to follow him/her. The target positions computed using the CARPE-ID framework tracking are represented with green plus signs. The robot is initially placed in the green start position and has to follow the target until the red finish position is reached.
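
The decision logic in the Figure 1 caption and the EMA versus damped-EMA behaviour in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper's equations are not reproduced on this page, so the form of the statistical distance (mean absolute deviation scaled by $\boldsymbol{\sigma}$), the damping rule, the blacklist handling, and all function and parameter names below are assumptions made only for illustration.

```python
import math

EPS = 1e-9  # guards against division by zero; illustrative value


def statistical_distance(x, mu, sigma):
    """Assumed distance: mean absolute deviation of x from mu, scaled by sigma."""
    return sum(abs(xi - m) / (s + EPS) for xi, m, s in zip(x, mu, sigma)) / len(x)


def ema_update(mean, var, value, alpha):
    """Standard exponentially weighted update of a scalar mean and variance."""
    delta = value - mean
    new_mean = mean + alpha * delta
    new_var = (1.0 - alpha) * (var + alpha * delta * delta)
    return new_mean, new_var


def damped_ema_update(mean, var, value, alpha, damping):
    """Hypothetical damping rule: shrink the EMA step for large outliers."""
    z = abs(value - mean) / (math.sqrt(var) + EPS)
    return ema_update(mean, var, value, alpha / (1.0 + damping * z))


def select_target(features_by_id, target_id, mu, sigma, threshold,
                  blacklist=frozenset()):
    """Return the ID to treat as the target in this frame, or None.

    features_by_id maps MOT track IDs to feature vectors (lists of floats).
    """
    # Case 1: the MOT kept the target ID, so its output is used directly.
    if target_id in features_by_id:
        return target_id
    # Case 2: the ID was lost; search for the closest detection whose
    # distance to the ideal target stays below the threshold.
    best_id, best_distance = None, threshold
    for det_id, x in features_by_id.items():
        if det_id in blacklist:  # assumed handling of known distractors
            continue
        d = statistical_distance(x, mu, sigma)
        if d < best_distance:
            best_id, best_distance = det_id, d
    # None means no re-identification: skip to the next frame.
    return best_id
```

In this sketch, setting `damping` to 0 makes `damped_ema_update` identical to `ema_update`, while larger values make the running statistics react less to sudden peaks in the distance, which is the qualitative behaviour the Figure 2 caption attributes to the damping factor. When `select_target` returns an ID, a caller would feed the corresponding distance into the threshold statistics and the features into the target representation; when it returns `None`, nothing is updated.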