Dual-Imbalance Continual Learning for Real-World Food Recognition

Xiaoyan Zhang, Jiangpeng He

Abstract

Visual food recognition in real-world dietary logging scenarios naturally exhibits severe data imbalance: a small number of food categories appear frequently while many others occur rarely, resulting in long-tailed class distributions. In practice, food recognition systems often operate in a continual learning setting, where new categories are introduced sequentially over time. However, existing studies typically assume that each incremental step introduces a similar number of new food classes, an assumption that rarely holds in the real world, where the number of newly observed categories can vary significantly across steps, leading to highly uneven learning dynamics. As a result, continual food recognition exhibits a dual imbalance: imbalanced sample counts within each food class and imbalanced numbers of new food classes at each incremental learning step. In this work, we introduce DIME, a Dual-Imbalance-aware Adapter Merging framework for continual food recognition. DIME learns lightweight adapters for each task using parameter-efficient fine-tuning and progressively integrates them through a class-count guided spectral merging strategy. A rank-wise threshold modulation mechanism further stabilizes the merging process by preserving dominant knowledge while allowing adaptive updates. The resulting model maintains a single merged adapter for inference, enabling efficient deployment without accumulating task-specific modules. Experiments on realistic long-tailed food benchmarks under our step-imbalanced setup show that the proposed method consistently outperforms the strongest existing continual learning baselines by more than 3%. Code is available at https://github.com/xiaoyanzhang1/DIME.
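To make the merging idea concrete, below is a minimal NumPy sketch of one plausible reading of class-count guided spectral merging with rank-wise threshold modulation. The function name `merge_adapters`, the proportional weighting `alpha`, and the single threshold `tau` are our own illustrative assumptions; the paper's actual formulation may differ.

```python
import numpy as np

def merge_adapters(W_base, W_new, n_base, n_new, tau=0.5):
    """Illustrative sketch (not the paper's exact method) of merging an
    accumulated adapter with a newly learned one.

    W_base, W_new : adapter weight matrices (accumulated vs. new task).
    n_base, n_new : number of classes seen so far vs. in the new task.
    tau           : threshold on normalized singular values (rank-wise gate).
    """
    # Class-count guided weighting: steps that cover more classes
    # contribute proportionally more to the blended update.
    alpha = n_base / (n_base + n_new)
    W_mix = alpha * W_base + (1.0 - alpha) * W_new

    # Spectral alignment: decompose the accumulated adapter via SVD.
    U, S, Vt = np.linalg.svd(W_base, full_matrices=False)
    S_norm = S / S.max()

    # Rank-wise threshold modulation: ranks with dominant singular values
    # preserve the base knowledge; weaker ranks adopt the blended update.
    keep = (S_norm >= tau).astype(W_base.dtype)
    E_base = U.T @ W_base @ Vt.T   # base adapter expressed in rank space
    E_mix = U.T @ W_mix @ Vt.T     # blended adapter in the same basis
    E_merged = keep[:, None] * E_base + (1.0 - keep)[:, None] * E_mix
    return U @ E_merged @ Vt
```

At `tau = 0` every rank is preserved and the base adapter is returned unchanged; as `tau` grows, more ranks open up to the class-count weighted update, which matches the stated goal of preserving dominant knowledge while allowing adaptive updates.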

Paper Structure

This paper contains 29 sections, 11 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Illustration of the dual-imbalanced continual food recognition scenario. Tasks contain different numbers of classes (step imbalance), and each class has varying numbers of samples (long-tailed class distribution [liu2022longtailedclassincrementallearning]).
  • Figure 2: Overview of DIME. (a) Sequential food datasets are learned with lightweight adapters on a frozen backbone using Balanced Softmax. After each step, the new adapter is merged with the accumulated base adapter to produce a unified model. (b) The merging module performs spectral alignment via SVD, class-count guided weighting, and rank-wise threshold modulation.
  • Figure 3: Performance comparison under different step-imbalance ratios $\rho$ on VFN186-LT and Food101-LT with $T=10$ incremental steps. Our method consistently achieves higher and more stable performance across tasks under all imbalance settings.
  • Figure 4: Sensitivity analysis of rank-wise threshold parameters.
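For context, the step-imbalanced protocol referenced in Figure 3 (a step-imbalance ratio $\rho$ over $T=10$ incremental steps) could be simulated as sketched below. The geometric allocation and the function name `step_imbalanced_split` are our own illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def step_imbalanced_split(num_classes, T, rho):
    """Illustrative sketch: allocate classes to T incremental steps so that
    the largest step introduces roughly rho times as many classes as the
    smallest (assumes T >= 2 and a geometric decay across steps)."""
    t = np.arange(T)
    w = rho ** (-t / (T - 1))                  # w[0] / w[-1] == rho
    counts = np.maximum(1, np.floor(num_classes * w / w.sum()).astype(int))
    counts[0] += num_classes - counts.sum()    # assign rounding remainder
    return counts
```

With `num_classes=186` (matching VFN186-LT), `T=10`, and `rho=10`, this yields a front-loaded schedule where early steps introduce tens of classes and later steps only a handful, mirroring the uneven learning dynamics the paper targets.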