EXOT: Exit-aware Object Tracker for Safe Robotic Manipulation of Moving Object
Hyunseo Kim, Hye Jung Yoon, Minji Kim, Dong-Sig Han, Byoung-Tak Zhang
TL;DR
EXOT addresses safe robotic manipulation from a hand-mounted camera by coupling a long-term transformer-based tracker with an out-of-distribution (OOD) classifier that detects when the target is absent and triggers conservative actions. The method builds on STARK with three heads (bounding box prediction, template update score, and OOD score), uses Generalized ODIN principles to compute $p(y, d_{in}|x)$ and $p(d_{in}|x)$, and applies time-smoothed thresholds for exit decisions. The paper introduces a new dataset, RMOT-223, and reports experiments on TREK-150, ablations, and a real UR5e conveyor-belt sushi task, showing up to 38% improvement in exit-awareness over STARK. The work demonstrates practical safety benefits for first-person robotics in dynamic environments and offers a framework adaptable to other robotic domains, although thresholds may need to be chosen per dataset.
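To make the two ideas above concrete, here is a minimal sketch of (a) the Generalized ODIN decomposition $p(y|d_{in}, x) = p(y, d_{in}|x) / p(d_{in}|x)$ and (b) a time-smoothed exit decision. It is an illustration under stated assumptions, not the EXOT implementation: the names `class_given_in_dist` and `SmoothedExitDetector`, the moving-average smoothing rule, the window size, and the threshold value are all hypothetical, and the sketch assumes that a higher $p(d_{in}|x)$ means the target is more likely present.

```python
from collections import deque


def class_given_in_dist(joint: float, in_dist: float, eps: float = 1e-8) -> float:
    """Generalized ODIN decomposition: p(y | d_in, x) = p(y, d_in | x) / p(d_in | x)."""
    # eps guards against division by zero; it is an assumption of this sketch.
    return joint / max(in_dist, eps)


class SmoothedExitDetector:
    """Flags an exit when the moving average of p(d_in | x) drops below a threshold.

    The moving average, window size, and threshold are illustrative assumptions;
    the paper's actual smoothing rule may differ.
    """

    def __init__(self, threshold: float = 0.5, window: int = 3) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # keeps only the most recent scores

    def update(self, in_dist_score: float) -> bool:
        """Add one per-frame score and return True if the target is judged absent."""
        self.scores.append(in_dist_score)
        mean_score = sum(self.scores) / len(self.scores)
        return mean_score < self.threshold


# Hypothetical per-frame in-distribution scores: the target leaves the view mid-sequence.
detector = SmoothedExitDetector(threshold=0.5, window=3)
frame_scores = [0.9, 0.8, 0.2, 0.1, 0.1]
exit_flags = [detector.update(s) for s in frame_scores]
print(exit_flags)  # [False, False, False, True, True]
```

Averaging over a short window means a single low-scoring frame (the third one here) does not trigger an exit by itself, which is the motivation for smoothing; the cost is a delay of roughly one window before a genuine exit is reported.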
Abstract
Current robotic hand manipulation is largely limited to objects in predictable positions within constrained environments. Thus, when the location of the target object deviates severely from the expected location, a robot sometimes responds in an unexpected way, especially when it operates alongside a human. For safe robot operation, we propose the EXit-aware Object Tracker (EXOT), which runs on a robot hand camera and recognizes an object's absence during manipulation. The robot decides whether to proceed by examining the tracker's bounding box output, which should contain the target object. Because trackers can mistrack the background as the target object, we adopt an out-of-distribution classifier for more accurate object recognition. To the best of our knowledge, our method is the first to apply an out-of-distribution classification technique to tracker output. We evaluate our method on the first-person video benchmark dataset TREK-150 and on a custom dataset, RMOT-223, that we collected with a UR5e robot. We then test our tracker on the UR5e robot in real time with a conveyor-belt sushi task, to examine its ability to track target dishes and to determine the exit status. Our tracker shows 38% higher exit-aware performance than a baseline method. The dataset and the code will be released at https://github.com/hskAlena/EXOT.
