Dual-LoRA and Quality-Enhanced Pseudo Replay for Multimodal Continual Food Learning
Xinlan Wu, Bin Zhu, Feng Han, Pengkun Jiao, Jingjing Chen
TL;DR
The paper tackles catastrophic forgetting in multimodal food analysis models with Dual-LoRA, a dual-adapter framework that pairs a task-specific specialized LoRA with a shared, knowledge-consolidating cooperative LoRA and adds orthogonal regularization. It combines this with Quality-Enhanced Pseudo Replay, which uses self-consistency and semantic similarity to generate reliable past-task data for rehearsal and to reduce the hallucinations typical of generative models. The method is evaluated on the Uni-Food dataset and reports superior forgetting mitigation across ingredient recognition, recipe generation, and nutrition estimation, with extensive ablations showing the contribution of each component. The approach reduces retraining costs and offers a scalable paradigm for lifelong learning in complex multimodal food analytics, with implications for personalized nutrition and health applications. Key mathematical components include the orthogonality constraint $L_o(A_t) = \sum_{i,j} \| O_t[i,j] \|^2$ with $O_t = A_{t,\text{specialized}} \cdot A_{t-1,\text{cooperative}}$, and the supervised objective $\sum_{(x,y)\in \mathcal{D}_t} \log p_{\Theta}(y|x) + \lambda_o L_o(A_t)$. Pseudo replay uses $n$ generated samples per task (e.g., $n=5$), followed by quality-enhancement steps that improve the reliability of the past-task data used to train the cooperative LoRA.
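To make the orthogonality constraint concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It rests on several assumptions that are not stated in the text above: each LoRA matrix $A$ is stored as $r$ rows of length $d$, the product in $O_t$ is taken as $A_{t,\text{specialized}} A_{t-1,\text{cooperative}}^{\top}$ so that the shapes agree, the objective is written in minimization form (negative log-likelihood plus penalty), and the function names and the default `lambda_o` value are hypothetical.

```python
import math


def matmul_transpose(a, b):
    """Return a @ b^T for lists of rows: entry [i][j] is the dot product <a[i], b[j]>."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def orthogonality_penalty(a_specialized, a_prev_cooperative):
    """L_o = sum over i, j of O[i][j]^2.

    Assumption: O = A_specialized @ A_prev_cooperative^T (the transpose is our
    choice so that two r x d matrices can be multiplied).
    """
    o = matmul_transpose(a_specialized, a_prev_cooperative)
    return sum(v * v for row in o for v in row)


def training_loss(log_probs, a_specialized, a_prev_cooperative, lambda_o=0.5):
    """Assumed minimization form: negative log-likelihood + lambda_o * L_o.

    log_probs holds log p(y|x) for each (x, y) in the current task's data.
    lambda_o=0.5 is an illustrative value, not one taken from the paper.
    """
    nll = -sum(log_probs)
    return nll + lambda_o * orthogonality_penalty(a_specialized, a_prev_cooperative)


# Toy rank-2 adapters in a 4-dimensional input space.
A_COOP = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
A_ORTHOGONAL = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # disjoint subspace
A_OVERLAPPING = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # shares one direction

penalty_orthogonal = orthogonality_penalty(A_ORTHOGONAL, A_COOP)
penalty_overlapping = orthogonality_penalty(A_OVERLAPPING, A_COOP)
loss = training_loss([math.log(0.5), math.log(0.25)], A_OVERLAPPING, A_COOP)
```

In this toy setting the penalty vanishes when the specialized rows span a subspace orthogonal to the previous cooperative rows, and grows with every shared direction, which is the behavior the regularizer is meant to encourage.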
Abstract
Food analysis has become increasingly critical for health-related tasks such as personalized nutrition and chronic disease prevention. However, existing large multimodal models (LMMs) in food analysis suffer from catastrophic forgetting when learning new tasks, requiring costly retraining from scratch. To address this, we propose a novel continual learning framework for multimodal food learning, integrating a Dual-LoRA architecture with Quality-Enhanced Pseudo Replay. We introduce two complementary low-rank adapters for each task: a specialized LoRA that learns task-specific knowledge with orthogonal constraints to previous tasks' subspaces, and a cooperative LoRA that consolidates shared knowledge across tasks via pseudo replay. To improve the reliability of replay data, our Quality-Enhanced Pseudo Replay strategy leverages self-consistency and semantic similarity to reduce hallucinations in generated samples. Experiments on the comprehensive Uni-Food dataset show superior performance in mitigating forgetting, representing the first effective continual learning approach for complex food tasks.
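The replay filtering idea can be illustrated with a small sketch. This is one plausible reading of "self-consistency and semantic similarity", not the procedure from the paper: for a pseudo input, $n$ candidate answers are generated, each candidate is scored by its average similarity to the others, and the most consistent candidate is kept only if its score reaches a threshold. The token-level Jaccard similarity is a stand-in for a real semantic similarity model, and the names `jaccard`, `select_consistent`, and the `threshold` value are all hypothetical.

```python
def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for an embedding-based measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_consistent(candidates, similarity=jaccard, threshold=0.5):
    """Pick the candidate that agrees most with the others, or None if agreement is weak.

    Assumption: agreement is the mean pairwise similarity to the remaining
    candidates; threshold=0.5 is an illustrative value.
    """
    if len(candidates) < 2:
        return None  # self-consistency needs at least two samples
    best, best_score = None, -1.0
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        score = sum(similarity(cand, other) for other in others) / len(others)
        if score > best_score:  # ties keep the earliest candidate
            best, best_score = cand, score
    if best_score < threshold:
        return None  # discard the pseudo sample as unreliable
    return best, best_score


# n = 5 hypothetical ingredient predictions for one pseudo input.
candidates = [
    "tomato basil mozzarella",
    "tomato basil mozzarella",
    "tomato basil cheese",
    "tomato basil mozzarella olive",
    "chocolate cake",
]
kept = select_consistent(candidates)
rejected = select_consistent(["rice beans", "beef stew", "apple pie"])
```

Here the outlier answer lowers every score but does not win, and a set of mutually unrelated answers is dropped entirely, so only pseudo samples with internal agreement would reach the cooperative LoRA's replay set.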
