Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor

Yuning Huang, Mohamed Abul Hassan, Jiangpeng He, Janine Higgins, Megan McCrory, Heather Eicher-Miller, Graham Thomas, Edward O Sazonov, Fengqing Maggie Zhu

TL;DR

This work addresses automatic ingestion-environment recognition using the AIM-2 egocentric wearable camera. It introduces a two-stage drop-then-maintain training framework that combines transfer learning and fine-tuning, leveraging a semantically filtered Place365-ours dataset to combat data imbalance and perceptual aliasing in egocentric scenes. Evaluated on the UA Free Living Study, the method achieves an overall accuracy of $96.63\%$, with clear gains for minority classes and across multiple backbone architectures, demonstrating robustness to domain shift and limited data. The approach offers a practical pathway for dietary assessment research by enabling scalable, automatic interpretation of ingestion contexts in naturalistic settings.

Abstract

Detecting the ingestion environment is an important aspect of monitoring dietary intake, as it provides insightful information for dietary assessment. However, it is a challenging problem: human-based review can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing. To address these issues, we propose a neural network-based method with a two-stage training framework that combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called "UA Free Living Study", which uses an egocentric wearable camera, the AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones and combined with approaches from the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.

Paper Structure (18 sections, 4 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Compilation of images showcasing various environments where food may be consumed. The montage is created using free-living data collected from AIM-2.
  • Figure 2: AIM-2, an egocentric wearable camera that monitors ingestion behavior.
  • Figure 3: Description of the proposed method for automatic ingestion environment recognition.
  • Figure 4: Two-stage drop-then-maintain training framework. After ImageNet pretraining, the feature classifier $g_1$ is dropped and replaced by $g_2$ (since the number of classes changes from 1,000 to 4). $g_2$ can then be kept from stage 1 to stage 2 because of the semantic-based class filtering and merging we performed on the Place365 dataset.
  • Figure 5: Semantic-based dataset filtering and merging.
  • ...and 1 more figures
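
The classifier handling described in the Figure 4 caption can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the `Model` class, `new_head`, `train`, and `FEATURE_DIM` are hypothetical names, the training loop is a placeholder, and the assignment of Place365-ours to stage 1 and the UA Free Living Study data to stage 2 is an assumption read from the caption and the abstract.

```python
import random

FEATURE_DIM = 8  # hypothetical feature size, chosen small for illustration


def new_head(num_classes, feature_dim=FEATURE_DIM, seed=0):
    """Create a freshly initialised linear classifier: one weight row per class."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(feature_dim)]
            for _ in range(num_classes)]


class Model:
    """A backbone (opaque placeholder here) followed by a linear classifier head."""

    def __init__(self, num_classes):
        self.backbone = {"history": ["ImageNet"]}  # stands in for pretrained weights
        self.head = new_head(num_classes)

    def drop_head(self, num_classes):
        """Discard the current classifier and attach a new one for a new label space."""
        self.head = new_head(num_classes, seed=1)

    @property
    def num_classes(self):
        return len(self.head)


def train(model, dataset_name):
    """Placeholder for a training loop: only records which dataset was used."""
    model.backbone["history"].append(dataset_name)


# After ImageNet pretraining, the classifier g_1 has 1,000 outputs.
model = Model(num_classes=1000)
g1 = model.head

# Stage 1 (drop): the label space changes from 1,000 to 4 classes,
# so g_1 is discarded and a new 4-way classifier g_2 is attached.
model.drop_head(num_classes=4)
g2 = model.head
train(model, "Place365-ours")  # assumed stage 1 data

# Stage 2 (maintain): the filtered and merged dataset shares the same 4 classes
# as the target data, so g_2 is kept as-is and training simply continues.
train(model, "UA Free Living Study")  # assumed stage 2 data
```

The point of the sketch is that the head is replaced exactly once: the stage 1 classifier object is the same one that enters stage 2, which is only possible because both stages share one label space.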