From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios

Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang

TL;DR

The paper tackles the challenge of transferring food-recognition models trained on canteen-style datasets to daily-life images, and introduces DailyFood-172 and DailyFood-16 as realistic benchmarks for this setting. It presents Multi-Cluster Reference Learning (MCRL), a simple yet effective baseline that aligns each target sample with multiple source clusters via its top-K pseudo labels, in hard and soft selection variants. Empirical results show that integrating MCRL with state-of-the-art unsupervised domain adaptation (UDA) methods yields consistent improvements across target datasets and backbones, and ablations confirm the value of multi-cluster and weighted alignment. The work advances practical food recognition by emphasizing cross-domain generalization and by providing benchmarks and techniques that bridge the gap between curated datasets and real-world usage.

Abstract

The precise recognition of food categories plays a pivotal role in intelligent health management and has attracted significant research attention in recent years. Prominent benchmarks, such as Food-101 and VIREO Food-172, provide abundant food image resources that have catalyzed research in this field. Nevertheless, these datasets are well-curated from canteen scenarios and thus deviate from food appearances in daily life. This discrepancy poses great challenges in effectively transferring classifiers trained on these canteen datasets to the broader daily-life scenarios that people encounter. To this end, we present two new benchmarks, namely DailyFood-172 and DailyFood-16, specifically designed to curate food images from everyday meals. These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain. In addition, we propose a simple yet effective baseline method named Multi-Cluster Reference Learning (MCRL) to tackle the aforementioned domain gap. MCRL is motivated by the observation that food images in daily-life scenarios exhibit greater intra-class appearance variance than those in well-curated benchmarks. Notably, MCRL can be seamlessly coupled with existing approaches, yielding non-trivial performance enhancements. We hope our new benchmarks can inspire the community to explore the transferability of food recognition models trained on well-curated datasets toward practical real-life applications.

Paper Structure (20 sections, 9 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Variations in visual appearance of "Braised Tofu" from VIREO Food-172 and daily meals. The first row shows three examples of dishes in VIREO Food-172, followed by examples from daily meals in the second row.
  • Figure 2: Examples of food categories in DailyFood-172.
  • Figure 3: Examples of food categories in DailyFood-16.
  • Figure 4: Pie chart of the category distribution in DailyFood-16.
  • Figure 6: The architecture of the proposed Multi-Cluster Reference Learning. It consists of a feature extractor $g_\theta$, a classifier $f_\theta$, as well as two multi-cluster reference learning objectives. Within the two learning objectives, the distribution gap between target samples and multiple categories of source clusters is narrowed for pursuing better generalization ability.
  • ...and 4 more figures
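
To make the idea behind Figure 6 and the TL;DR more concrete, the following is a minimal sketch of aligning one target sample with several source clusters chosen by its top-K pseudo labels. It is an illustration under stated assumptions, not the paper's objective: the summary above does not give the loss, so the use of one mean feature per source class as a "cluster", the squared Euclidean distance, and the reading of "hard" as uniform weights and "soft" as probability-based weights are all assumptions. The names `class_centroids` and `multi_cluster_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_centroids(features, labels, num_classes):
    """Assumption: each source 'cluster' is the mean feature of one class."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, lab in zip(features, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += feat[d]
    return [[s / max(c, 1) for s in row] for row, c in zip(sums, counts)]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of discrepancy)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_cluster_loss(target_feat, target_logits, centroids, k=2, soft=False):
    """Pull a target feature toward the centroids of its top-k pseudo labels.

    soft=False: the k selected clusters get equal weight (assumed 'hard' variant).
    soft=True:  weights are the renormalized class probabilities (assumed 'soft' variant).
    """
    probs = softmax(target_logits)
    top_k = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
    if soft:
        norm = sum(probs[c] for c in top_k)
        weights = [probs[c] / norm for c in top_k]
    else:
        weights = [1.0 / len(top_k)] * len(top_k)
    return sum(w * sq_dist(target_feat, centroids[c]) for w, c in zip(weights, top_k))


# Toy data: three source classes in a 2-D feature space (made up for illustration).
src_feats = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0], [10.0, 10.0]]
src_labels = [0, 0, 1, 1, 2]
centroids = class_centroids(src_feats, src_labels, num_classes=3)

target_feat = [1.0, 1.0]
target_logits = [2.0, 1.0, -3.0]  # classifier output for the target sample
hard_loss = multi_cluster_loss(target_feat, target_logits, centroids, k=2, soft=False)
soft_loss = multi_cluster_loss(target_feat, target_logits, centroids, k=2, soft=True)
print(hard_loss, soft_loss)
```

In this toy case the soft variant yields a smaller value than the hard one, because it puts more weight on the closer, more confidently predicted cluster. In a real pipeline the features and logits would come from the feature extractor $g_\theta$ and classifier $f_\theta$, and the term would be added to an existing UDA training loss.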