AI-Driven Relocation Tracking in Dynamic Kitchen Environments
Arash Nasr Esfahani, Hamed Hosseini, Mehdi Tale Masouleh, Ahmad Kalhor, Hedieh Sajedi
TL;DR
The paper addresses the problem of tracking object relocations in dynamic kitchen environments by combining a frame-scoring relocation framework with YOLOv5s detection in the AI2-THOR simulator. It develops a complete pipeline: synthetic data generation and annotation in AI2-THOR, training of a robust detector, and a best-associated frame selection algorithm, guided by a Visibility Score, that identifies changes between pre-change and post-change scenes along a fixed route. The dataset comprises over 9,000 images across 69 object classes (including 29 state variants). YOLOv5s, trained for 300 epochs, achieves a refined recall of $75.5\%$ and an mAP50 of $81.6\%$, while precision decreases slightly to $84.8\%$. The relocation tracker reaches a precision of $95.8\%$, a recall of $96.8\%$, and an accuracy of $97.7\%$, enabling robust change detection in kitchen environments. The authors suggest generalizing the approach to entire homes as future work.
Abstract
As smart homes become more prevalent in daily life, the ability to understand dynamic environments is essential, and it increasingly depends on AI systems. This study focuses on developing an intelligent algorithm that navigates a robot through a kitchen, recognizes objects, and tracks their relocation. The kitchen was chosen as the testing ground because of its dynamic nature: objects are frequently moved, rearranged, and replaced. Techniques such as SLAM feature-based tracking and deep learning-based object detection (e.g., Faster R-CNN) are commonly used for object tracking, and methods such as optical flow analysis and 3D reconstruction have also been applied to track object relocation. These approaches often struggle with lighting variations and partial occlusions, where parts of an object are hidden in some frames but visible in others. The proposed method uses the YOLOv5 architecture, initialized with pre-trained weights and then fine-tuned on a custom dataset. A novel frame-scoring algorithm calculates a score for each object based on its location and features across all frames. The score is used to determine the best-associated frame for each object, and changes are identified by comparing these results between scenes. This overcomes limitations seen in other methods while keeping the design simple. In the experiments, the algorithm achieves an accuracy of 97.72%, a precision of 95.83%, and a recall of 96.84%, which demonstrates the efficacy of the model in detecting spatial changes.
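The best-associated frame idea described in the abstract can be illustrated with a minimal Python sketch. The scoring formula here (detector confidence times box area times a centrality term), the `Detection` fields, and the tolerances `frame_tol` and `centre_tol` are all assumptions made for illustration; the abstract does not define how the score is computed, so this should not be read as the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    frame: int    # index of the frame along the fixed route
    label: str    # object class name
    conf: float   # detector confidence in [0, 1]
    box: tuple    # (x1, y1, x2, y2) in normalized image coordinates


def visibility_score(det):
    """Hypothetical score: favours confident, large, well-centred boxes."""
    x1, y1, x2, y2 = det.box
    area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # 1.0 at the image centre, falling to 0.0 at a corner (assumed weighting)
    centrality = 1.0 - (abs(cx - 0.5) + abs(cy - 0.5))
    return det.conf * area * centrality


def best_frames(detections):
    """Keep, for each label, the detection with the highest score."""
    best = {}
    for det in detections:
        current = best.get(det.label)
        if current is None or visibility_score(det) > visibility_score(current):
            best[det.label] = det
    return best


def relocated(pre, post, frame_tol=2, centre_tol=0.15):
    """Flag a label as changed if its best frame or box centre shifts
    beyond the (assumed) tolerances, or if it is missing from one scene."""
    pre_best, post_best = best_frames(pre), best_frames(post)
    changes = {}
    for label in sorted(set(pre_best) | set(post_best)):
        a, b = pre_best.get(label), post_best.get(label)
        if a is None or b is None:
            changes[label] = True
            continue
        frame_shift = abs(a.frame - b.frame)
        centre_shift = max(
            abs((a.box[0] + a.box[2]) - (b.box[0] + b.box[2])) / 2,
            abs((a.box[1] + a.box[3]) - (b.box[1] + b.box[3])) / 2,
        )
        changes[label] = frame_shift > frame_tol or centre_shift > centre_tol
    return changes
```

Because the route is fixed, the index of the best frame acts as a rough proxy for where along the route an object is most visible, which is why this sketch compares frame indices before comparing box centres.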
