Domain-Conditioned Transformer for Fully Test-time Adaptation

Yushun Tang; Shuoshuo Chen; Jiyuan Jia; Yi Zhang; Zhihai He

TL;DR

This work proposes a new structure for the transformer's self-attention modules that incorporates three domain-conditioning vectors, called domain conditioners, into the query, key, and value components. During fully test-time adaptation, these domain conditioners gradually remove the impact of domain shift and largely recover the original self-attention profile.

Abstract

Fully test-time adaptation aims to adapt a network model online based on sequential analysis of input samples during the inference stage. We observe that, when a transformer network model is applied to a new domain, the self-attention profiles of image samples in the target domain deviate significantly from those in the source domain, which results in large performance degradation during domain changes. To address this important issue, we propose a new structure for the self-attention modules in the transformer. Specifically, we incorporate three domain-conditioning vectors, called domain conditioners, into the query, key, and value components of the self-attention module. We learn a network to generate these three domain conditioners from the class token at each transformer network layer. We find that, during fully online test-time adaptation, these domain conditioners at each transformer network layer are able to gradually remove the impact of domain shift and largely recover the original self-attention profile. Our extensive experimental results demonstrate that the proposed domain-conditioned transformer significantly improves online fully test-time adaptation performance and outperforms existing state-of-the-art methods by large margins.
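The conditioning idea described in the abstract can be illustrated with a minimal single-head NumPy sketch. It is not the authors' implementation: the abstract does not say how the conditioners are incorporated, so the sketch assumes they are simply added to the query, key, and value projections. The `ConditionerGenerator` class, its two-layer form, the zero initialization of its output weights, and all dimensions are hypothetical choices made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ConditionerGenerator:
    """Hypothetical generator: maps the class token to three conditioners."""

    def __init__(self, dim, hidden, rng):
        self.w1 = rng.normal(0.0, 0.1, (dim, hidden))
        # Assumption: zero-initialized output weights, so the conditioners
        # start at zero and the layer initially behaves like plain attention.
        self.w2 = np.zeros((hidden, 3 * dim))

    def __call__(self, cls_token):
        h = np.tanh(cls_token @ self.w1)
        c_q, c_k, c_v = np.split(h @ self.w2, 3)
        return c_q, c_k, c_v


def conditioned_attention(tokens, wq, wk, wv, generator):
    """Single-head self-attention; tokens[0] is taken to be the class token."""
    c_q, c_k, c_v = generator(tokens[0])
    # Assumption: conditioners are added to every token's projection.
    q = tokens @ wq + c_q
    k = tokens @ wk + c_k
    v = tokens @ wv + c_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return attn @ v, attn


rng = np.random.default_rng(0)
dim, n_tokens = 8, 5
tokens = rng.normal(size=(n_tokens, dim))
wq, wk, wv = (rng.normal(0.0, 0.1, (dim, dim)) for _ in range(3))
gen = ConditionerGenerator(dim, hidden=4, rng=rng)
out, attn = conditioned_attention(tokens, wq, wk, wv, gen)
```

Under these assumptions, only the generator's weights would need to change during adaptation for the attention profile to shift, while the pretrained projections `wq`, `wk`, and `wv` stay fixed. Whether the paper freezes or updates other parameters is not stated in the abstract.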
