BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation

Daeun Lee; Jaehong Yoon; Sung Ju Hwang

TL;DR

BECoTTA addresses continual test-time adaptation (CTTA) with Mixture-of-Domain Low-rank Experts (MoDE), which combine domain-adaptive routing with a Domain-Expert Synergy Loss so that updates are input-dependent and sparse and past knowledge is preserved. It also introduces the Continual Gradual Shifts (CGS) benchmark to evaluate adaptation under gradual domain changes. Empirically, BECoTTA and its SDA-enhanced variant BECoTTA+ outperform strong CTTA baselines across disjoint and gradual shifts while requiring about 98% fewer trainable parameters, and they perform well on segmentation, classification, and zero-shot domain generalization. Its modularity, efficiency, and domain-aware specialization make the approach well suited to edge devices and real-world deployment.

Abstract

Continual Test-Time Adaptation (CTTA) is required to adapt efficiently to continuous unseen domains while retaining previously learned knowledge. However, despite the progress of CTTA, it is still challenging to deploy a model with an improved forgetting-adaptation trade-off and efficiency. In addition, current CTTA scenarios assume only the disjoint situation, even though real-world domains change seamlessly. To address these challenges, this paper proposes BECoTTA, an input-dependent and efficient modular framework for CTTA. We propose Mixture-of-Domain Low-rank Experts (MoDE), which contains two core components: (i) Domain-Adaptive Routing, which helps to selectively capture domain-adaptive knowledge with multiple domain routers, and (ii) Domain-Expert Synergy Loss, which maximizes the dependency between each domain and expert. We validate that our method outperforms existing methods across multiple CTTA scenarios, including disjoint and gradual domain shifts, while requiring ~98% fewer trainable parameters. We also provide analyses of our method, including the construction of experts, the effect of domain-adaptive experts, and visualizations.
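
The routing idea in the abstract can be illustrated with a minimal sketch. It assumes a layer in which each expert is a low-rank down/up projection pair, each domain has its own linear router, and the top-k experts are blended with softmax weights on a residual path. The class name `MoDELayer`, the zero initialization, the ReLU, and all sizes are illustrative assumptions, not details taken from this page or from the authors' implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class MoDELayer:
    """Hypothetical mixture of low-rank experts with one router per domain."""

    def __init__(self, dim, rank, num_experts, num_domains, top_k, seed=0):
        rng = np.random.default_rng(seed)
        # Each expert is a low-rank pair: down (dim -> rank), up (rank -> dim).
        self.down = rng.normal(0.0, 0.02, (num_experts, dim, rank))
        # Assumption: zero-initialized up projections, so the layer starts as identity.
        self.up = np.zeros((num_experts, rank, dim))
        # One routing matrix per domain (dim -> num_experts).
        self.routers = rng.normal(0.0, 0.02, (num_domains, dim, num_experts))
        self.top_k = top_k

    def forward(self, x, domain):
        # The router of the (pseudo-)domain scores every expert for this input.
        logits = x @ self.routers[domain]
        chosen = np.argsort(logits)[-self.top_k:]   # indices of the top-k experts
        weights = softmax(logits[chosen])           # blending weights, sum to 1
        out = x.copy()
        for w, e in zip(weights, chosen):
            hidden = np.maximum(x @ self.down[e], 0.0)  # low-rank bottleneck + ReLU
            out = out + w * (hidden @ self.up[e])       # residual blend
        return out, chosen, weights

    def trainable_params(self):
        # Only experts and routers would be updated at test time in this sketch.
        return self.down.size + self.up.size + self.routers.size

layer = MoDELayer(dim=64, rank=4, num_experts=6, num_domains=3, top_k=2)
x = np.ones(64)
h, chosen, weights = layer.forward(x, domain=1)
```

Because only the selected experts contribute to the output, a gradient step on one input would touch only those experts and the active router, which is one way to read the "sparse, input-dependent" updates described above.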

Paper Structure (34 sections, 10 equations, 11 figures, 21 tables, 1 algorithm)


Introduction
Related Works
Continual Test-Time Adaptation.
Mixture-of-Experts.
Blurry Scenario in Continual Learning.
Input-dependent Online Blending of Experts for Continual Test-time Adaptation
Problem Statement
Domain-Augmented Initialization.
Mixture-of-Domain Low-rank Experts (MoDE)
Continual Test-time Adaptation Process
Experiments
Datasets
Experimental Setting
Main Results
Analyses and Ablations
...and 19 more sections

Figures (11)

Figure 1: BECoTTA and BECoTTA+ achieve superior 10-round average IoU and parameter/memory efficiency against strong CTTA baselines on the CDS-hard scenario.
Figure 2: Comparison of the TTA process with other SoTA models. We compare the existing models (TENT, CoTTA, EcoTTA) and mark activated modules in yellow during the CTTA process. In particular, CoTTA adopts the mean-teacher architecture and updates the entire model. TENT and EcoTTA update only a few parameter-efficient modules in the model. However, they achieve suboptimal performance with forgetting. Meanwhile, our BECoTTA updates only MoDE layers for efficient and rapid adaptation while preserving previous knowledge.
Figure 3: Overview of BECoTTA. We propose a novel CTTA framework for dynamic real-world scenarios, including disjoint and gradual shifts of domains. When the model receives a target domain input $\bm{x_t}$ at timestep $t$, the Domain Discriminator (DD) first estimates a pseudo-domain label $d$. Based on the estimated pseudo-label, the domain router $G_d$ sends the input to specific experts containing domain-specific information, and this routing is trained by minimizing the Domain-Expert Synergy Loss $\Theta(D;A)$. Finally, we obtain a domain-adaptive representation $h_d(\bm{x})$ for addressing downstream tasks at test time.
Figure 4: Pseudo label Visualization. Our BECoTTA generates more fine-grained and accurate labels than baselines.
Figure 6: Expert Analysis. Left: We visualize the frequency of ten expert selections for each domain during CTTA. Our frequency map shows co-selected and isolated experts in different domains. Right: We interpret the similarity between target domains by visualizing the assignment weights from each domain-adaptive router.
...and 6 more figures
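
The Figure 3 caption writes the Domain-Expert Synergy Loss as $\Theta(D;A)$, and the abstract describes it as maximizing the dependency between each domain and expert. One standard way to quantify such a dependency is the mutual information between the domain variable D and the expert assignment A. The sketch below assumes that reading; the exact form of the loss is not given on this page, and the function name and the two example tables are illustrative only.

```python
import math

def mutual_information(joint):
    """Return I(D;A) in nats for a joint table joint[d][a] whose entries sum to 1."""
    p_d = [sum(row) for row in joint]          # marginal over domains
    p_a = [sum(col) for col in zip(*joint)]    # marginal over experts
    mi = 0.0
    for d, row in enumerate(joint):
        for a, p in enumerate(row):
            if p > 0:                          # treat 0 * log 0 as 0
                mi += p * math.log(p / (p_d[d] * p_a[a]))
    return mi

# Experts chosen independently of the domain: no dependency.
independent = [[0.25, 0.25],
               [0.25, 0.25]]
# Each domain always routed to its own expert: maximal dependency for two domains.
specialized = [[0.5, 0.0],
               [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_specialized = mutual_information(specialized)
```

Under this assumption, a loss that rewards higher mutual information (for example, its negative) would push each domain router toward a distinct subset of experts, which matches the co-selected and isolated experts described in the Figure 6 caption.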
