Frequency Composition for Compressed and Domain-Adaptive Neural Networks

Yoojin Kwon, Hongjun Suh, Wooseok Lee, Taesik Gong, Songyi Han, Hyung-Sin Kim

TL;DR

The paper tackles the dual challenge of resource-constrained on-device inference and dynamic domain shifts by introducing CoDA, a frequency-aware framework that unifies compression and domain adaptation. It trains quantized models on low-frequency components (LFC) via quantization-aware training (LFC QAT) and refines them at test time with Frequency-Aware BN (FABN), which uses full-frequency features to adapt to target domains. The approach yields substantial gains on CIFAR10-C and ImageNet-C across architectures and bitwidths while achieving 4–16x model compression, and it remains compatible with standard QAT and test-time adaptation (TTA) methods. The results indicate that separating learning and normalization by frequency component improves robustness to domain shifts and enables effective continual adaptation for on-device systems. Overall, CoDA demonstrates a practical pathway to deploying highly compressed, domain-resilient neural networks in real-world, dynamic environments.
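The 4–16x range is consistent with simple bitwidth arithmetic. The following is a minimal sketch, assuming a 32-bit floating-point baseline and counting weight storage only (both are assumptions, not stated above); the 8-, 4-, and 2-bit settings are the ones named in the Figure 1 caption, and `compression_ratio` is a hypothetical helper name.

```python
FP32_BITS = 32  # assumed full-precision baseline width (not stated in the source)


def compression_ratio(bitwidth: int, baseline_bits: int = FP32_BITS) -> float:
    """Weight-storage compression ratio relative to the baseline precision."""
    if bitwidth <= 0:
        raise ValueError("bitwidth must be positive")
    return baseline_bits / bitwidth


# Bitwidths taken from the Figure 1 caption.
for bits in (8, 4, 2):
    print(f"{bits}-bit weights: {compression_ratio(bits):.0f}x smaller than FP32")
```

This ignores per-layer quantizer parameters (such as scale factors) and any layers kept at higher precision, so real checkpoints may compress slightly less.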

Abstract

Modern on-device neural network applications must operate under resource constraints while adapting to unpredictable domain shifts. However, this combined challenge (model compression and domain adaptation) remains largely unaddressed, as prior work has tackled each issue in isolation: compressed networks prioritize efficiency within a fixed domain, whereas large, capable models focus on handling domain shifts. In this work, we propose CoDA, a frequency composition-based framework that unifies compression and domain adaptation. During training, CoDA employs quantization-aware training (QAT) with low-frequency components (LFC), enabling a compressed model to selectively learn robust, generalizable features. At test time, it refines the compact model in a source-free manner (i.e., test-time adaptation, TTA), leveraging the full-frequency information from incoming data to adapt to target domains while treating high-frequency components (HFC) as domain-specific cues. LFC are aligned with the trained distribution, while HFC unique to the target distribution are used solely for batch normalization. CoDA can be integrated synergistically into existing QAT and TTA methods. CoDA is evaluated on widely used domain-shift benchmarks, including CIFAR10-C and ImageNet-C, across various model architectures. With significant compression, it achieves accuracy improvements of 7.96%p on CIFAR10-C and 5.37%p on ImageNet-C over the full-precision TTA baseline.
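The split into low- and high-frequency components can be illustrated with a short NumPy sketch. It assumes a circular low-pass mask applied to the centered 2D FFT spectrum of a single-channel image; the figure captions mention the Fast Fourier Transform and a low-pass filter of radius 8, but the exact mask shape and the per-channel handling of color images are assumptions here, and `split_frequencies` is a hypothetical name.

```python
import numpy as np


def split_frequencies(image: np.ndarray, radius: float):
    """Split a 2D array into low- and high-frequency parts with a circular FFT mask."""
    h, w = image.shape
    # Shift the spectrum so the zero-frequency (DC) term sits at (h // 2, w // 2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius  # assumed circular low-pass region
    # Invert each masked spectrum; imaginary parts are only rounding error.
    lfc = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    hfc = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return lfc, hfc


rng = np.random.default_rng(0)
img = rng.random((32, 32))          # stand-in for one CIFAR-sized channel
lfc, hfc = split_frequencies(img, radius=8)
print(np.allclose(lfc + hfc, img))  # the two parts sum back to the input
```

Because the two masks are complementary and the FFT is linear, the LFC and HFC images always add up to the original, which is what allows the full-frequency input to be treated as a composition of the two.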

Paper Structure

This paper contains 45 sections, 5 equations, 8 figures, 16 tables.

Figures (8)

  • Figure 1: Effectiveness of CoDA when applied to various models (ResNet18 and ResNet50), TTA methods (NORM [nado2020evaluating, schneider2020improving] and TENT [wang2020tent]), and a QAT method (LSQ [esser2019learned]) using three bitwidths (2, 4, and 8 bits). We train on ImageNet and evaluate on ImageNet-C.
  • Figure 2: An illustration of the proposed CoDA. Left: Using the Fast Fourier Transform, an image can be decomposed into HFC, consisting of fast-changing patterns (e.g., edges or stripes), and LFC, consisting of slow-changing patterns (e.g., smooth shapes). During training, CoDA focuses on learning generalizable features from LFC rather than irregular patterns in HFC (LFC QAT). Right: At test time, under domain shift, CoDA uses the full frequency range of the target data and adapts BN layers with our frequency-aware BN (FABN). For the lower frequencies of intermediate activations, we use their running statistics to initialize and gradually update the BN statistics. For the higher frequencies, we keep the activations' original distribution without composing it into the source distribution. Finally, the low- and high-frequency statistics are combined and used to normalize the test batch.
  • Figure 3: Inter-domain distance matrices of LFC and HFC frequency-domain images. The values in the LFC (low-pass) matrix appear significantly smaller than those in the HFC (high-pass) matrix, indicating that LFC is more domain-invariant than HFC. Details of the distance matrix calculation are provided in the supplementary materials.
  • Figure 4: The loss landscapes of quantized ResNet26 on CIFAR10 (Clean) and CIFAR10-C (Corrupt.) trained on different frequency ranges. The quantization method is LSQ [esser2019learned] and the quantization level is 2-bit. FFC refers to the full frequency range of the test data; LFC refers to the lower frequency range passed by a low-pass filter of radius 8. Sharpness and concave regions generally indicate less robustness. We use visualization methods following previous works [li2018visualizing, foret2020sharpness].
  • Figure 5: SSE comparison between LSQ [esser2019learned] with and without CoDA. SSE is measured between $\hat{\mu}_{s}$ and $\hat{\mu}_{\text{lfc}, t}$ for the LFC-trained model (CoDA), and between $\hat{\mu}_{s}$ and $\hat{\mu}_{t}$ for the FFC-trained model. SSE is gathered over all layers. Quantization level is 2-bit.
  • ...and 3 more figures
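
The SSE comparison described in the Figure 5 caption can be sketched as follows. This assumes SSE is the plain sum of squared differences between per-channel BN means, summed over layers; the caption says only that SSE is "gathered over all layers", so the aggregation is an assumption. The function names and the numeric values are hypothetical and are not results from the paper.

```python
from typing import Sequence


def sse(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_sse(source_means: Sequence[Sequence[float]],
              target_means: Sequence[Sequence[float]]) -> float:
    """Accumulate SSE over layers; each entry holds one BN layer's per-channel means."""
    if len(source_means) != len(target_means):
        raise ValueError("layer counts must match")
    return sum(sse(s, t) for s, t in zip(source_means, target_means))


# Hypothetical two-layer example: stored source means vs. means estimated on a target batch.
mu_source = [[0.0, 1.0], [0.5, 0.5, 0.5]]
mu_target = [[0.1, 0.9], [0.5, 1.0, 0.0]]
print(total_sse(mu_source, mu_target))
```

A smaller total indicates that the target-side statistics sit closer to the stored source statistics, which is the quantity the caption compares between the LFC-trained and FFC-trained models.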