From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios
Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang
TL;DR
The paper tackles the challenge of transferring food-recognition models trained on canteen-style datasets to daily-life images, introducing DailyFood-172 and DailyFood-16 as realistic benchmarks. It presents Multi-Cluster Reference Learning (MCRL), a simple yet effective baseline that aligns each target sample with multiple source clusters selected by its top-K pseudo labels, in hard and soft selection variants. Empirical results show that integrating MCRL with state-of-the-art unsupervised domain adaptation (UDA) methods yields consistent improvements across target datasets and backbones, and ablations confirm the value of multi-cluster and weighted alignment. The work shifts the focus of food recognition toward cross-domain generalization, providing benchmarks and a technique for bridging the gap between curated datasets and real-world usage.
Abstract
The precise recognition of food categories plays a pivotal role in intelligent health management and has attracted significant research attention in recent years. Prominent benchmarks, such as Food-101 and VIREO Food-172, provide abundant food image resources that have driven research in this field. Nevertheless, these datasets are well-curated from canteen scenarios and thus deviate from how food appears in daily life. This discrepancy makes it difficult to transfer classifiers trained on these canteen datasets to the broader range of daily-life scenarios. To this end, we present two new benchmarks, DailyFood-172 and DailyFood-16, which collect food images from everyday meals. These two datasets are used to evaluate how well approaches transfer from the well-curated food image domain to the everyday-life food image domain. In addition, we propose a simple yet effective baseline method named Multi-Cluster Reference Learning (MCRL) to tackle this domain gap. MCRL is motivated by the observation that food images in daily-life scenarios exhibit greater intra-class appearance variance than those in well-curated benchmarks. Notably, MCRL can be seamlessly coupled with existing approaches, yielding non-trivial performance improvements. We hope our new benchmarks will inspire the community to explore the transferability of food recognition models trained on well-curated datasets toward practical real-life applications.
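The multi-cluster idea summarized in the TL;DR can be illustrated with a rough sketch. The code below is not the authors' implementation: it assumes one mean-feature centroid per source class as the "cluster", a squared Euclidean distance as the alignment cost, and uniform (hard) or renormalized-probability (soft) weights over the top-K pseudo labels. The function names `top_k_reference_weights` and `multi_cluster_alignment_loss` are hypothetical, and the paper's actual formulation may differ.

```python
import numpy as np


def top_k_reference_weights(probs, k, soft=True):
    """Pick the top-k pseudo labels per target sample and weight them.

    probs: (n, c) array of predicted class probabilities for target samples.
    Returns (indices, weights), both of shape (n, k).
    Assumption: "hard" means uniform weights, "soft" means probabilities
    renormalized over the selected k classes.
    """
    top_idx = np.argsort(-probs, axis=1)[:, :k]
    top_p = np.take_along_axis(probs, top_idx, axis=1)
    if soft:
        weights = top_p / top_p.sum(axis=1, keepdims=True)
    else:
        weights = np.full_like(top_p, 1.0 / k)
    return top_idx, weights


def multi_cluster_alignment_loss(target_feats, centroids, probs, k=3, soft=True):
    """Weighted squared distance from each target feature to k source centroids.

    target_feats: (n, d) target features.
    centroids:    (c, d) source class centroids (assumed: one per class).
    """
    top_idx, weights = top_k_reference_weights(probs, k, soft)
    refs = centroids[top_idx]  # (n, k, d) reference centroids per sample
    sq_dist = ((target_feats[:, None, :] - refs) ** 2).sum(axis=2)  # (n, k)
    return float((weights * sq_dist).sum(axis=1).mean())
```

Under these assumptions, a target sample whose prediction is split between several classes is pulled toward all of the corresponding source centroids rather than only the most likely one, which is one way to accommodate the larger intra-class variance of daily-life images.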
