Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation

Qian Chen; Lei Zhu; Hangzhou He; Xinliang Zhang; Shuang Zeng; Qiushi Ren; Yanye Lu

TL;DR

The paper proposes a network that adds a data-specific Mixture of Experts (MoE) structure to handle new tasks or categories, leaving the network parameters of previous tasks unaffected or only minimally impacted.

Abstract

The primary goal of continual learning (CL) in medical image segmentation is to solve the "catastrophic forgetting" problem, in which the model totally forgets previously learned features when it is extended to new categories (class-level) or tasks (task-level). Because of privacy protection, the labels of historical data are inaccessible. Prevalent continual learning methods therefore focus on generating pseudo-labels for old datasets to force the model to retain the learned features. However, incorrect pseudo-labels may corrupt the learned features and create a new problem: the better the model is trained on the old tasks, the worse it performs on the new tasks. To avoid this problem, we propose a network that introduces a data-specific Mixture of Experts (MoE) structure to handle new tasks or categories, ensuring that the network parameters of previous tasks are unaffected or only minimally impacted. To overcome the large memory cost of these additional structures, we further propose a Low-Rank strategy that significantly reduces memory cost. We validate our method on both class-level and task-level continual learning challenges. Extensive experiments on multiple datasets show that our model outperforms all compared methods.

Paper Structure (17 sections, 6 equations, 3 figures, 2 tables)

Introduction
Methodology
Low-Rank Mixture of Experts
Low-Rank Mixture of Experts Layers
Low-Rank MoE Attention
Continual Learning Gating Strategy
Task-level Gating:
Class-level Gating:
Experimental Setup & Result
Dataset
Implementation Details
Task-level Continual Learning
Class-level Continual Learning
Results
Task-level Continual Learning Results.
...and 2 more sections

Figures (3)

Figure 1: An overview of the Low-Rank MoE architecture. MSA denotes the multi-head self-attention module.
Figure 2: Illustration of the proposed task-level gating pipeline.
Figure 3: Illustration of the proposed class-level gating pipeline. Emb means Embeddings and GW means Gating Weights. $E_1$ and $E_2$ indicate expert 1 and expert 2.
