AI-Driven Relocation Tracking in Dynamic Kitchen Environments

Arash Nasr Esfahani, Hamed Hosseini, Mehdi Tale Masouleh, Ahmad Kalhor, Hedieh Sajedi

TL;DR

The paper addresses the problem of tracking object relocations in dynamic kitchen environments by combining a frame-scoring relocation framework with YOLOv5s detection in the AI2-THOR simulation. It develops a comprehensive pipeline: synthetic data generation and annotation in AI2-THOR, training a robust detector, and a best-associated frame selection algorithm guided by a Visibility Score to identify changes between pre-change and post-change scenes along a fixed route. The dataset comprises over 9,000 images across 69 object classes (including 29 state variants), with 300-epoch training of YOLOv5s achieving a refined recall of $75.5\%$ and mAP50 of $81.6\%$, while precision slightly decreased to $84.8\%$. The relocation tracker demonstrates high performance, with a precision of $95.8\%$, a recall of $96.8\%$, and an accuracy of $97.7\%$, enabling robust change detection in kitchen environments and suggesting future work to generalize to entire homes.

Abstract

As smart homes become more prevalent in daily life, the ability to understand dynamic environments becomes essential, and it increasingly depends on AI systems. This study develops an intelligent algorithm that navigates a robot through a kitchen, recognizes objects, and tracks their relocation. The kitchen was chosen as the testing ground because of its dynamic nature: objects are frequently moved, rearranged, and replaced. Various techniques, such as SLAM feature-based tracking and deep learning-based object detection (e.g., Faster R-CNN), are commonly used for object tracking. Methods such as optical flow analysis and 3D reconstruction have also been used to track the relocation of objects. These approaches often struggle with lighting variations and partial occlusions, where parts of an object are hidden in some frames but visible in others. The proposed method uses the YOLOv5 architecture, initialized with pre-trained weights and subsequently fine-tuned on a custom dataset. It introduces a novel frame-scoring algorithm that calculates a score for each object based on its location and features across all frames. This scoring approach identifies changes by determining the best-associated frame for each object and comparing the results between scenes, overcoming limitations seen in other methods while keeping the design simple. The experimental results show an accuracy of 97.72%, a precision of 95.83%, and a recall of 96.84% for this algorithm, which highlights its efficacy in detecting spatial changes.
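
The best-associated frame idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything specific here is an assumption made for illustration: the `Detection` type, the `visibility_score` function (detector confidence times box area times closeness to the image center), and the rule that flags an object as relocated when its best frame index shifts by more than `frame_tolerance` along the fixed route. It is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One detector output (hypothetical structure for this sketch)."""
    label: str                               # object class name
    confidence: float                        # detector confidence in [0, 1]
    box: Tuple[float, float, float, float]   # normalized (x_center, y_center, width, height)


def visibility_score(det: Detection) -> float:
    """Assumed score: larger, more central, more confident boxes score higher."""
    x, y, w, h = det.box
    area = w * h
    # Distance from the image center: 0 at the center, sqrt(0.5) at a corner.
    offset = math.hypot(x - 0.5, y - 0.5)
    centrality = max(0.0, 1.0 - offset / math.sqrt(0.5))
    return det.confidence * area * centrality


def best_associated_frames(frames: List[List[Detection]]) -> Dict[str, Tuple[int, float]]:
    """Map each label to (frame index, score) of its highest-scoring detection."""
    best: Dict[str, Tuple[int, float]] = {}
    for idx, detections in enumerate(frames):
        for det in detections:
            score = visibility_score(det)
            if det.label not in best or score > best[det.label][1]:
                best[det.label] = (idx, score)
    return best


def relocated_objects(pre: List[List[Detection]],
                      post: List[List[Detection]],
                      frame_tolerance: int = 1) -> List[str]:
    """Labels seen in only one pass, or whose best frame index shifts by more than the tolerance."""
    best_pre = best_associated_frames(pre)
    best_post = best_associated_frames(post)
    changed = []
    for label in sorted(set(best_pre) | set(best_post)):
        if label not in best_pre or label not in best_post:
            changed.append(label)
        elif abs(best_pre[label][0] - best_post[label][0]) > frame_tolerance:
            changed.append(label)
    return changed


# Toy data: three frames per pass along the same route.
pre = [
    [Detection("mug", 0.9, (0.5, 0.5, 0.2, 0.2))],
    [Detection("mug", 0.6, (0.9, 0.5, 0.1, 0.1)),
     Detection("kettle", 0.8, (0.5, 0.5, 0.3, 0.3))],
    [Detection("kettle", 0.5, (0.1, 0.5, 0.1, 0.1))],
]
post = [
    [],
    [Detection("kettle", 0.8, (0.5, 0.5, 0.3, 0.3))],
    [Detection("mug", 0.9, (0.5, 0.5, 0.2, 0.2))],
]

print(relocated_objects(pre, post))  # the mug's best frame moves from index 0 to index 2
```

Comparing best-frame indices only works because the agent follows the same route in both passes, so a frame index stands in for a viewpoint. A fuller version would presumably also compare the object's position within the best-associated frame.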

Paper Structure

This paper contains 12 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Top view of the kitchen environment, showing the agent's route and the positions of various objects. (Note: This angle was not used during the training or evaluation phases of the project.)
  • Figure 2: Variability of object properties in the dataset collected using the AI2-THOR simulator.
  • Figure 3: Schematic overview of the proposed methodology. The system computes a score for each object within a frame to identify the best-associated frame. In the relocation tracking phase, green squares indicate no change in object position, black squares indicate a change in position, and red squares indicate a change in object position, highlighting the current frame as the best-associated frame for the relocated object.
  • Figure 4: Best-associated frame selection algorithm results of objects in 5 different randomized scenes. The object detection and relocation tracking results are presented separately. Check marks (✓), hyphens (-), and cross marks (✗) represent successful detections, missed detections, and unsuccessful detections, respectively.