Person Re-Identification for Robot Person Following with Online Continual Learning
Hanjing Ye, Jieting Zhao, Yu Zhan, Weinan Chen, Li He, Hong Zhang
TL;DR
This work tackles robot person following (RPF) under occlusion by reframing target re-identification (ReID) as an online continual learning (OCL) problem. A ReID module uses a memory manager to fuse short-term and long-term experiences, continually updating a ResNet-based feature extractor and a ridge-regression classifier to maintain discriminative appearance representations. Memory-guided replay and loss-based keyframe selection mitigate domain shift, enabling reliable re-identification of the target despite appearance changes and visually similar distractors, while remaining feasible for onboard, real-time operation. Experiments on public and custom datasets show state-of-the-art ReID performance in RPF and demonstrate the practical benefits of online learning and memory consolidation for robust, persistent following.
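To make the ridge-regression classifier mentioned above concrete, here is a minimal sketch of a closed-form ridge fit on appearance features, where the target is labeled +1 and other people are labeled -1. The function names (`fit_ridge`, `score`), the two-dimensional toy features, the label encoding, and the value of `lam` are all assumptions for illustration. They are not taken from the paper, which may train and apply its classifier differently.

```python
def fit_ridge(features, labels, lam=1.0):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    d = len(features[0])
    # Regularized Gram matrix A = X^T X + lam * I and right-hand side b = X^T y.
    A = [[sum(x[i] * x[j] for x in features) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(x[i] * y for x, y in zip(features, labels)) for i in range(d)]
    # Gaussian elimination with partial pivoting (A is positive definite for lam > 0).
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, d):
            factor = A[row][col] / A[col][col]
            for k in range(col, d):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in range(d - 1, -1, -1):
        s = sum(A[i][k] * w[k] for k in range(i + 1, d))
        w[i] = (b[i] - s) / A[i][i]
    return w


def score(w, feature):
    """Linear score; positive means 'looks like the target' in this toy setup."""
    return sum(wi * xi for wi, xi in zip(w, feature))


# Hypothetical 2-D appearance features (a real extractor would output far more dimensions).
target_feats = [[1.0, 0.1], [0.9, 0.2]]
other_feats = [[0.1, 1.0], [0.2, 0.8]]
w = fit_ridge(target_feats + other_feats, [1, 1, -1, -1], lam=0.1)

target_score = score(w, [0.95, 0.15])      # unseen sample near the target
distractor_score = score(w, [0.15, 0.90])  # unseen sample near a distractor
```

Because the fit is a single linear solve, it can be repeated cheaply whenever the features change, which is one plausible reason to pair such a classifier with a feature extractor that is updated online.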
Abstract
Robot person following (RPF) is a crucial capability in human-robot interaction (HRI) applications, allowing a robot to persistently follow a designated person. In practical RPF scenarios, the person can often be occluded by other objects or people. Consequently, it is necessary to re-identify the person when they reappear within the robot's field of view. Previous person re-identification (ReID) approaches to person following rely on a fixed feature extractor. Such an approach often fails to generalize to the different viewpoints and lighting conditions found in practical RPF environments. In other words, it suffers from the so-called domain shift problem: it cannot re-identify the person when their appearance on return falls outside the domain modeled by the fixed feature extractor. To mitigate this problem, we propose a ReID framework for RPF in which the feature extractor is optimized online with both short-term and long-term experiences (i.e., recently and previously observed samples during RPF) using the online continual learning (OCL) framework. The long-term experiences are maintained by a memory manager to enable OCL to update the feature extractor. Our experiments demonstrate that even in the presence of severe appearance changes and distractions from visually similar people, the proposed method can still re-identify the person more accurately than the state-of-the-art methods.
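The interplay of short-term and long-term experiences can be sketched as a small memory manager. This is only an illustration under stated assumptions: the class name `MemoryManager`, the capacities, the FIFO short-term buffer, and the rule that long-term memory keeps the highest-loss samples are all hypothetical choices. The paper's actual consolidation and keyframe-selection policy may differ.

```python
import heapq
import random
from collections import deque


class MemoryManager:
    """Toy two-tier memory: recent samples plus a bounded long-term store."""

    def __init__(self, short_capacity=4, long_capacity=8):
        self.short_term = deque(maxlen=short_capacity)  # FIFO of (sample, loss)
        self.long_term = []  # min-heap of (loss, insertion_order, sample)
        self.long_capacity = long_capacity
        self._counter = 0  # tie-breaker so samples themselves are never compared

    def observe(self, sample, loss):
        # Before the oldest short-term sample is evicted, offer it to long-term memory.
        if len(self.short_term) == self.short_term.maxlen:
            old_sample, old_loss = self.short_term[0]
            self._consolidate(old_sample, old_loss)
        self.short_term.append((sample, loss))

    def _consolidate(self, sample, loss):
        # Assumption: samples with higher loss are more informative keyframes,
        # so the lowest-loss entry is the one replaced when memory is full.
        entry = (loss, self._counter, sample)
        self._counter += 1
        if len(self.long_term) < self.long_capacity:
            heapq.heappush(self.long_term, entry)
        elif loss > self.long_term[0][0]:
            heapq.heapreplace(self.long_term, entry)

    def replay_batch(self, k, rng=random):
        # Training batch = all recent samples + up to k replayed long-term samples.
        recent = [s for s, _ in self.short_term]
        stored = [s for _, _, s in self.long_term]
        return recent + rng.sample(stored, min(k, len(stored)))


# Usage: frame ids stand in for image crops; losses are made-up numbers.
memory = MemoryManager(short_capacity=2, long_capacity=2)
for frame_id, loss in enumerate([0.9, 0.1, 0.5, 0.7, 0.2]):
    memory.observe(frame_id, loss)
batch = memory.replay_batch(k=2)
```

In this run the short-term buffer ends up holding frames 3 and 4, while long-term memory retains frames 0 and 2, the two highest-loss frames among those evicted. Mixing both tiers in each batch is what lets the online update see older appearances alongside the most recent ones.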
