
Benchmarking Post-Hoc Unknown-Category Detection in Food Recognition

Lubnaa Abdur Rahman, Ioannis Papathanail, Lorenzo Brigato, Stavroula Mougiakakou

TL;DR

The paper tackles unknown-category detection in fine-grained food recognition for automatic dietary assessment, where misclassifying unseen foods as in-distribution (ID) categories can cascade into downstream errors. It evaluates a broad set of post-hoc out-of-distribution (OOD) detection methods, including MSP, ODIN, OpenMax, KL Matching, Mahalanobis, MaxLogit, Energy, ViM, ReAct, and DICE, across CNN and transformer architectures trained on Food-101 and tested on diverse food and non-food OOD datasets after careful overlap removal. ViM emerges as the strongest general approach, leveraging both class logits and feature-space information, while transformer models consistently outperform CNNs in OOD detection. The results show that models with higher ID accuracy achieve better OOD performance and that non-food OOD data are generally easier to separate, informing deployment considerations for real-world dietary assessment systems.
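Several of the methods listed above (MSP, MaxLogit, Energy) differ only in how they turn a classifier's logits into a confidence score. The sketch below uses their common textbook definitions, not the authors' code. The function names are hypothetical, and higher scores are taken to mean "more likely in-distribution". ODIN, ReAct, and DICE additionally alter the input or the network before scoring and are not shown.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def msp_score(logits: np.ndarray) -> np.ndarray:
    # Maximum softmax probability (MSP): confidence of the predicted class.
    return softmax(logits).max(axis=1)

def maxlogit_score(logits: np.ndarray) -> np.ndarray:
    # MaxLogit: the largest raw logit, without softmax normalization.
    return logits.max(axis=1)

def energy_score(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    # Negative free energy, T * logsumexp(logits / T), computed stably.
    z = logits / temperature
    m = z.max(axis=1, keepdims=True)
    return temperature * (m.squeeze(1) + np.log(np.exp(z - m).sum(axis=1)))
```

A sample is flagged as unknown when its score falls below a threshold chosen on ID validation data. MSP saturates near 1 for confident predictions, whereas MaxLogit and Energy keep the logit magnitude, which is one reason they can separate ID from OOD samples differently.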

Abstract

Food recognition models often struggle to distinguish between seen and unseen samples, frequently misclassifying samples from unseen categories by assigning them an in-distribution (ID) label. This misclassification presents significant challenges when deploying these models in real-world applications, particularly within automatic dietary assessment systems, where incorrect labels can lead to cascading errors throughout the system. Ideally, such models should prompt the user when an unknown sample is encountered, allowing for corrective action. Given that no prior research has explored food recognition in real-world settings, in this work we conduct an empirical analysis of various post-hoc out-of-distribution (OOD) detection methods for fine-grained food recognition. Our findings indicate that virtual logit matching (ViM) performed best overall, likely owing to its combination of logits and feature-space representations. Additionally, our work reinforces prior findings in the OOD domain: models with higher ID accuracy performed better across the evaluated OOD detection methods. Furthermore, transformer-based architectures consistently outperformed convolution-based models in detecting OOD samples across various methods.
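The abstract attributes ViM's strength to combining logits with feature-space information. The sketch below illustrates that idea under assumptions that follow our reading of the published ViM recipe rather than the authors' implementation:
- The origin is shifted to u = -pinv(W) b.
- A principal subspace of size `dim` is fitted on ID training features.
- The feature residual outside that subspace becomes a "virtual logit", scaled by alpha so that its average matches the average maximum logit.

`fit_vim`, `vim_score`, and all parameter names are hypothetical.

```python
import numpy as np

def fit_vim(train_feats: np.ndarray, W: np.ndarray, b: np.ndarray, dim: int):
    """Fit ViM statistics on ID training features (hypothetical helper).

    train_feats: (N, D) penultimate features; W: (C, D) and b: (C,) form the
    linear classifier head; dim: size of the principal (ID) subspace.
    """
    # Shift the origin to u; logits W u + b are zero there when W has full row rank.
    u = -np.linalg.pinv(W) @ b
    centered = train_feats - u
    # Eigendecompose the second-moment matrix of the shifted features.
    second_moment = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(second_moment)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    # Columns spanning the residual space orthogonal to the top-`dim` directions.
    residual_basis = eigvecs[:, order[dim:]]
    # alpha rescales residual norms to the typical magnitude of the max logit.
    train_logits = train_feats @ W.T + b
    residual_norm = np.linalg.norm(centered @ residual_basis, axis=1)
    alpha = train_logits.max(axis=1).mean() / residual_norm.mean()
    return u, residual_basis, alpha

def vim_score(feats, W, b, u, residual_basis, alpha):
    """Return a ViM-style score where higher means more in-distribution."""
    logits = feats @ W.T + b
    # Virtual logit: scaled norm of the feature component outside the ID subspace.
    virtual_logit = alpha * np.linalg.norm((feats - u) @ residual_basis, axis=1)
    # Numerically stable log-sum-exp of the class logits (the energy term).
    m = logits.max(axis=1)
    energy = m + np.log(np.exp(logits - m[:, None]).sum(axis=1))
    return energy - virtual_logit
```

Subtracting the virtual logit from the energy equals the negative log-odds of the virtual logit's softmax probability. Thresholding this score therefore ranks samples the same way as thresholding that probability. A large residual, meaning feature-space evidence the classifier head ignores, pushes a sample towards "unknown" even when its class logits look confident.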

Paper Structure

This paper contains 27 sections, 8 equations, 2 figures, 5 tables, 1 algorithm.

Figures (2)

  • Figure 1: Example of images present in ID and OOD datasets
  • Figure 2: Average results of the different OOD detection methods for each model on food and non-food OOD data. The color-coded metrics are AUROC for foods, AUROC for non-foods, FPR95 for foods, and FPR95 for non-foods.
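Figure 2 reports AUROC and FPR95, the two standard OOD metrics. The sketch below shows their usual definitions, assuming that ID samples are the positive class and that a higher score means "more in-distribution". The helper names are hypothetical. Codebases differ slightly in how they pick the 95% TPR threshold.

```python
import numpy as np

def auroc(id_scores, ood_scores) -> float:
    """AUROC with ID as the positive class (higher score = more ID).

    Equals the probability that a random ID sample outscores a random OOD
    sample, counting ties as one half (the Mann-Whitney U statistic).
    """
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    wins = (id_s > ood_s).mean()
    ties = (id_s == ood_s).mean()
    return float(wins + 0.5 * ties)

def fpr_at_95_tpr(id_scores, ood_scores) -> float:
    """Fraction of OOD samples accepted when ~95% of ID samples are accepted."""
    # The 5th percentile of ID scores leaves roughly 95% of ID samples at or above it.
    threshold = np.percentile(np.asarray(id_scores, dtype=float), 5)
    return float(np.mean(np.asarray(ood_scores, dtype=float) >= threshold))
```

The pairwise AUROC uses O(n·m) memory, which is fine for a sketch. For large test sets, a rank-based computation or `sklearn.metrics.roc_auc_score` is the practical choice. Lower FPR95 and higher AUROC are better.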