Long-Tailed Continual Learning For Visual Food Recognition
Jiangpeng He, Xiaoyan Zhang, Luotao Lin, Jack Ma, Heather A. Eicher-Miller, Fengqing Zhu
TL;DR
This work tackles long-tailed (LT) continual learning for visual food recognition by introducing a unified end-to-end framework that combines feature-based knowledge distillation with a learnable prediction head and CAM-guided CutMix augmentation. It also contributes VFN186, a new dataset of 186 food items, and three population-specific long-tailed benchmarks (VFN186-LT, VFN186-INSULIN, VFN186-T2D) that reflect real-world dietary patterns. Experiments show significant gains over existing continual learning methods across multiple LT food datasets. Further analyses examine the contribution of each component and the method's practicality in terms of training efficiency and memory usage. The findings bear directly on deploying robust, privacy-aware food recognition systems across diverse populations, and they point future work toward exemplar-free, scalable frameworks.
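The TL;DR names feature-based knowledge distillation through a learnable prediction head but does not give its architecture or loss. As a hedged illustration only, the NumPy sketch below assumes a linear predictor `W` that maps features of the current (student) model into the feature space of the frozen previous-step (teacher) model and is trained with a mean squared error. The function names `kd_loss_and_grad` and `train_predictor`, the linear head, the MSE loss, and the learning-rate defaults are all hypothetical choices, not the authors' method. In a full continual learning pipeline this term would be added to the classification loss, and its gradient would also update the student backbone; here only the head is trained.

```python
import numpy as np


def kd_loss_and_grad(student_feats, teacher_feats, W):
    """Feature-distillation loss through a learnable linear predictor head.

    Hypothetical sketch: the head W and the MSE objective are assumptions.
    student_feats: (N, d_s) features from the current model.
    teacher_feats: (N, d_t) features from the frozen previous-step model.
    W:             (d_s, d_t) predictor mapping student features to teacher space.
    Returns the mean (over samples) squared L2 distance and its gradient w.r.t. W.
    """
    n = student_feats.shape[0]
    pred = student_feats @ W                   # project student features into teacher space
    diff = pred - teacher_feats                # alignment residual per sample
    loss = np.sum(diff ** 2) / n               # mean squared L2 distance
    grad_W = 2.0 * student_feats.T @ diff / n  # analytic gradient of the loss w.r.t. W
    return loss, grad_W


def train_predictor(student_feats, teacher_feats, lr=0.05, steps=200, seed=0):
    """Fit only the predictor head with plain gradient descent (backbone held fixed)."""
    rng = np.random.default_rng(seed)
    d_s, d_t = student_feats.shape[1], teacher_feats.shape[1]
    W = rng.normal(scale=0.01, size=(d_s, d_t))  # small random initialization
    history = []
    for _ in range(steps):
        loss, grad = kd_loss_and_grad(student_feats, teacher_feats, W)
        history.append(loss)                    # loss recorded before each update
        W -= lr * grad
    return W, history
```

With synthetic data such as student features `S` of shape (64, 16) and teacher features `T = S @ A` for a random (16, 8) matrix `A`, the recorded loss shrinks steadily, because a linear head can absorb a linear mismatch between the two feature spaces. One plausible reading of the design is that the head lets the new model's features drift, as long as the old features remain recoverable from them.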
Abstract
Deep learning-based food recognition has made significant progress in predicting food types from eating occasion images. However, two key challenges hinder real-world deployment: (1) continuously learning new food classes without forgetting previously learned ones, and (2) handling the long-tailed distribution of food images, in which a few classes are common and many more are rare. Addressing both challenges requires food recognition methods designed for long-tailed continual learning. In this work, we introduce a dataset that encompasses 186 American foods along with comprehensive annotations. We also introduce three new benchmark datasets, VFN186-LT, VFN186-INSULIN, and VFN186-T2D, which reflect real-world food consumption for healthy populations, insulin users, and individuals with type 2 diabetes who do not take insulin. We propose a novel end-to-end framework that uses a knowledge distillation-based predictor to avoid representation misalignment during continual learning, thereby improving generalization on instance-rare food classes. Additionally, we introduce an augmentation technique that integrates class activation maps (CAM) with CutMix to further improve generalization on instance-rare food classes. Evaluated on Food101-LT, VFN-LT, VFN186-LT, VFN186-INSULIN, and VFN186-T2D, our method shows significant improvements over existing methods. An ablation study further analyzes the contribution of each component, demonstrating the method's potential for real-world food recognition applications.
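The abstract states that CAM and CutMix are integrated but not how. The NumPy sketch below shows one plausible variant under stated assumptions: the CAM is computed on the source image for its own class as the classifier-weighted sum of the last convolutional feature maps (the standard CAM formulation), the pasted box is the bounding box of the upsampled CAM above a hypothetical `threshold`, and labels are mixed by pasted area as in standard CutMix. The function names are hypothetical, and the choice of source images (for example, images from rare classes) is likewise an assumption; the paper may select boxes, sources, or mixing ratios differently.

```python
import numpy as np


def class_activation_map(feature_map, class_weights):
    """Standard CAM for one class, min-max normalized to [0, 1].

    feature_map:   (K, h, w) activations before global average pooling.
    class_weights: (K,) classifier weights of the class of interest.
    """
    cam = np.tensordot(class_weights, feature_map, axes=1)  # weighted sum over channels -> (h, w)
    cam = cam - cam.min()
    peak = cam.max()
    if peak == 0:
        # A flat map carries no localization, so treat every location as salient.
        return np.ones_like(cam)
    return cam / peak


def cam_box(cam, image_hw, threshold=0.5):
    """Bounding box (y0, y1, x0, x1) of the upsampled CAM region >= threshold.

    Assumes the image height and width are integer multiples of the CAM size,
    so nearest-neighbour upsampling can be done with np.kron.
    """
    H, W = image_hw
    h, w = cam.shape
    up = np.kron(cam, np.ones((H // h, W // w)))
    ys, xs = np.nonzero(up >= threshold)  # non-empty: the normalized peak equals 1
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1


def cam_cutmix(target_img, target_label, source_img, source_label, source_cam,
               num_classes, threshold=0.5):
    """Paste the CAM-salient box of source_img into target_img; mix labels by area."""
    y0, y1, x0, x1 = cam_box(source_cam, target_img.shape[:2], threshold)
    mixed = target_img.copy()
    mixed[y0:y1, x0:x1] = source_img[y0:y1, x0:x1]  # same location, as in CutMix
    lam = (y1 - y0) * (x1 - x0) / (target_img.shape[0] * target_img.shape[1])
    label = np.zeros(num_classes)
    label[target_label] += 1.0 - lam  # target keeps the unpasted fraction
    label[source_label] += lam        # source gets the pasted fraction
    return mixed, label
```

For example, if the source CAM fires only in the central 2x2 cells of a 4x4 map and the images are 32x32, the pasted box covers pixels 8 to 23 in each direction, so the mixed label assigns 0.75 to the target class and 0.25 to the source class. Anchoring the box on the CAM rather than sampling it at random is meant to ensure that the pasted patch actually contains discriminative food content.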
