Hierarchical Adaptive networks with Task vectors for Test-Time Adaptation

Sameer Ambekar, Marta Hasny, Laura Daza, Daniel M. Lang, Julia A. Schnabel

Abstract

Test-time adaptation allows pretrained models to adjust to incoming data streams, addressing distribution shifts between source and target domains. However, standard methods rely on single-dimensional linear classification layers, which often fail to handle diverse and complex shifts. We propose Hierarchical Adaptive Networks with Task Vectors (Hi-Vec), which leverages multiple layers of increasing size for dynamic test-time adaptation. By decomposing the encoder's representation space into such hierarchically organized layers, Hi-Vec, in a plug-and-play manner, allows existing methods to adapt to shifts of varying complexity. Our contributions are threefold: First, we propose dynamic layer selection for automatic identification of the optimal layer for adaptation to each test batch. Second, we propose a mechanism that merges weights from the dynamic layer to other layers, ensuring all layers receive target information. Third, we propose linear layer agreement that acts as a gating function, preventing erroneous fine-tuning by adaptation on noisy batches. We rigorously evaluate the performance of Hi-Vec in challenging scenarios and on multiple target datasets, proving its strong capability to advance state-of-the-art methods. Our results show that Hi-Vec improves robustness, addresses uncertainty, and handles limited batch sizes and increased outlier rates.
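The weight-merging contribution builds on task vectors. As a rough illustration of the general idea only, a task vector is commonly defined as the element-wise difference between adapted and pretrained weights, and it can be added (optionally scaled) to another set of weights. The sketch below assumes flat weight lists of equal length; the function names and the `scale` parameter are hypothetical, and how Hi-Vec transfers a task vector between layers of different sizes is not specified in this summary, so that step is not modeled here.

```python
def task_vector(pretrained, adapted):
    """Element-wise difference: adapted weights minus pretrained weights."""
    if len(pretrained) != len(adapted):
        raise ValueError("weight lists must have the same length")
    return [a - p for p, a in zip(pretrained, adapted)]


def apply_task_vector(weights, vector, scale=1.0):
    """Add a scaled task vector to a same-sized set of weights.

    `scale` is a hypothetical knob for illustration; it is not taken
    from the paper.
    """
    if len(weights) != len(vector):
        raise ValueError("weights and task vector must have the same length")
    return [w + scale * v for w, v in zip(weights, vector)]


# Toy example: one layer was adapted on a target batch...
pretrained = [1.0, 2.0, 3.0]
adapted = [1.5, 2.0, 2.0]
tau = task_vector(pretrained, adapted)

# ...and its update is shared with another (same-sized) layer.
other_layer = [0.0, 1.0, 1.0]
merged = apply_task_vector(other_layer, tau, scale=0.5)
```

Keeping the update as a separate vector means the target information learned by one layer can be reused elsewhere without re-running adaptation on that batch.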

Paper Structure

This paper contains 39 sections, 11 equations, 7 figures, 10 tables, 2 algorithms.

Figures (7)

  • Figure 1: Standard methods and Hi-Vec alongside examples of shifts. (a) Standard test-time adaptation methods adjust the same set of parameters and rely on the representation of a single-dimensional linear layer, which is forced to handle all kinds of domain shifts at the same time. (b) Hi-Vec introduces hierarchical linear layers featuring coarse-to-fine representations, and uses dynamic selection, which allows for individual handling by identification of the layer most suitable to address a specific domain shift (detailed architecture in Figure 2).
  • Figure 2: Illustration of Hi-Vec. Our framework introduces (i) Dynamic selection of hierarchical linear layers to find the optimal linear layer for the specific type of distribution shift in each batch. Next, through (ii) Hierarchical layer agreement, it evaluates logit consistency across multiple representations to decide if adaptation is needed. Finally, if the layers agree, (iii) Target information sharing via task vectors and adaptation are performed; otherwise, it proceeds directly to inference, minimizing computational overhead and erroneous fine-tuning.
  • Figure 3: Hi-Vec offers additional benefits: (a) Handles small batch sizes, (b) Addresses uncertainty, (c) Robust to increased outliers. With small batch sizes, Hi-Vec improves on all the common adaptation methods, helping them handle scenarios reflective of real-world conditions. Hi-Vec also reduces uncertainty (lower is better) and remains robust to a higher proportion of outliers in test batches.
  • Figure 4: Mitigates Catastrophic Forgetting. Results reported for CIFAR-10-C using ResNet-18. We evaluate the model on the source domain after adapting it to every target domain. Hi-Vec preserves the source domain knowledge and prevents forgetting on the dataset at test-time.
  • Figure 5: Grad-CAM Visualizations and Layer selection insights on CIFAR-10-C with ResNet-18 by Stamp + Hi-Vec. For a random batch of the CIFAR-10-C dataset, we provide histograms of the outputs of the hierarchical linear layers, together with the dimension of the layer used for the prediction and a histogram of the selected dimensions (layer 0 has 8 dimensions, layer 1 has 16, and layer $n$ has $2^{n+3}$ dimensions).
  • ...and 2 more figures
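
The control flow described in the Figure 2 caption (select a layer, check agreement, then either adapt or go straight to inference) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the selection criterion (lowest mean prediction entropy), the agreement rule (share of matching predicted classes), the `min_agreement` threshold, and all function names are hypothetical. Layer widths start at 8 and double per layer, matching the 8 and 16 dimensions stated for layers 0 and 1 in the Figure 5 caption.

```python
import math


def layer_dims(num_layers):
    """Widths of the hierarchical layers: 8, 16, 32, ... (doubling)."""
    return [2 ** (n + 3) for n in range(num_layers)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def select_layer(logits_per_layer):
    """ASSUMPTION: pick the layer with the lowest mean prediction entropy."""
    mean_entropy = [
        sum(entropy(softmax(sample)) for sample in layer) / len(layer)
        for layer in logits_per_layer
    ]
    return min(range(len(mean_entropy)), key=mean_entropy.__getitem__)


def layers_agree(logits_per_layer, selected, min_agreement=0.5):
    """ASSUMPTION: gate on how often the other layers predict the same
    class as the selected layer for each sample in the batch."""
    reference = [argmax(sample) for sample in logits_per_layer[selected]]
    matches, total = 0, 0
    for i, layer in enumerate(logits_per_layer):
        if i == selected:
            continue
        for sample, ref in zip(layer, reference):
            matches += int(argmax(sample) == ref)
            total += 1
    return total > 0 and matches / total >= min_agreement


def step(logits_per_layer):
    """Return (selected layer index, action) for one test batch."""
    selected = select_layer(logits_per_layer)
    action = "adapt" if layers_agree(logits_per_layer, selected) else "infer_only"
    return selected, action


# Two layers, two samples, three classes (toy logits).
consistent = [
    [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]],   # layer 0: confident
    [[1.0, 0.9, 0.8], [0.1, 1.0, 0.2]],   # layer 1: same classes, less sure
]
conflicting = [
    [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]],
    [[0.0, 1.0, 0.9], [1.0, 0.0, 0.2]],   # layer 1 predicts other classes
]
```

Calling `step(consistent)` selects layer 0 and adapts, while `step(conflicting)` skips adaptation and only runs inference, which mirrors the gating role the caption assigns to layer agreement.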