
Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation

Yeongtak Oh, Jonghyun Lee, Jooyoung Choi, Dahuin Jung, Uiwon Hwang, Sungroh Yoon

TL;DR

This work addresses test-time adaptation under unforeseen distribution shifts by introducing Decorruptor, a corruption-editing framework built on latent diffusion models. A novel corruption modeling scheme is used to fine-tune an instruction-conditioned diffusion model that edits corrupted inputs back into clean images, and Decorruptor-CM, a distilled consistency model, further accelerates this process. The approach runs roughly 100× faster than diffusion-based baselines while achieving state-of-the-art or near-state-of-the-art accuracy on ImageNet-C and ImageNet-$\bar{\mathrm{C}}$ across multiple architectures, and it extends to video with strong runtime efficiency. The methods are robust to unseen corruptions, generalize well out of distribution, and can ensemble multiple decorrupted views for improved predictions, making test-time corruption editing practical for real-world image and video applications.
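The TL;DR mentions ensembling multiple decorrupted views. Below is a minimal sketch of one common way to do this. It assumes the classifier's logits for each view (for example, the corrupted input plus several decorrupted edits) are already available, and that views are combined by averaging softmax probabilities. This averaging rule and the helper names `softmax` and `ensemble_predict` are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(view_logits: Sequence[Sequence[float]]) -> int:
    """Average class probabilities over all views and return the argmax class.

    view_logits: one logit vector per view. Averaging softmax outputs is an
    assumed combination rule chosen for illustration.
    """
    if not view_logits:
        raise ValueError("need at least one view")
    probs = [softmax(v) for v in view_logits]
    num_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: mean[c])


# Three views of one input: the raw corrupted image favors class 0,
# while two hypothetical decorrupted edits favor class 1.
views = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.2], [0.3, 2.0, 0.1]]
print(ensemble_predict(views))  # -> 1
```

Averaging probabilities rather than hard votes lets a confident view outweigh an uncertain one, which is one reason this rule is a common default for test-time ensembles.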

Abstract

Test-time adaptation (TTA) addresses unforeseen distribution shifts that occur at test time. In TTA, performance, memory consumption, and time consumption are crucial considerations. A recent diffusion-based TTA approach restores corrupted images through image-level updates. However, diffusion in pixel space requires substantially more resources than conventional model-updating TTA approaches, which limits its practicality as a TTA method. To address this, we propose a novel TTA method that leverages an image editing model based on a latent diffusion model (LDM) and fine-tunes it with our newly introduced corruption modeling scheme. This scheme enhances the robustness of the diffusion model against distribution shifts by creating (clean, corrupted) image pairs and fine-tuning the model to edit corrupted images into clean ones. Moreover, we introduce a distilled variant that performs corruption editing with only 4 network function evaluations (NFEs). We extensively validate our method across various architectures and datasets in both the image and video domains. Our model achieves the best performance while running 100 times faster than a diffusion-based baseline. Furthermore, it is three times faster than the previous model-updating TTA method that uses data augmentation, making an image-level updating approach more feasible.
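The abstract describes fine-tuning on (clean, corrupted) pairs so that the editor learns to turn corrupted inputs into clean ones under an instruction. Below is a minimal sketch of how such training triplets could be assembled. It assumes toy grayscale images stored as nested lists with pixels in [0, 1], and three hand-picked corruptions. The corruption functions, their parameters, and the instruction strings in the hypothetical `CORRUPTIONS` table are stand-ins, not the paper's actual augmentation set or prompts.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def clip(x: float) -> float:
    return min(1.0, max(0.0, x))


def gaussian_noise(img: Image, rng: random.Random, sigma: float = 0.1) -> Image:
    # Add i.i.d. Gaussian noise to every pixel, then clip to the valid range.
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def low_contrast(img: Image, rng: random.Random, factor: float = 0.4) -> Image:
    # Pull every pixel toward the image mean (rng unused; kept for a uniform signature).
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return [[clip(mean + factor * (p - mean)) for p in row] for row in img]


def brighten(img: Image, rng: random.Random, shift: float = 0.3) -> Image:
    # Shift all pixels upward (rng unused; kept for a uniform signature).
    return [[clip(p + shift) for p in row] for row in img]


# Hypothetical table mapping a corruption name to (corruption fn, editing instruction).
CORRUPTIONS: Dict[str, Tuple[Callable[[Image, random.Random], Image], str]] = {
    "noise": (gaussian_noise, "remove the noise"),
    "contrast": (low_contrast, "restore the contrast"),
    "brightness": (brighten, "reduce the brightness"),
}


def make_pair(clean: Image, rng: random.Random) -> Tuple[Image, Image, str]:
    """Return (corrupted input, clean target, instruction) for one training example."""
    name = rng.choice(sorted(CORRUPTIONS))
    corrupt_fn, instruction = CORRUPTIONS[name]
    return corrupt_fn(clean, rng), clean, instruction
```

In an instruction-based editor, the corrupted image and the instruction would serve as conditioning and the clean image as the reconstruction target; keeping the corruption and its instruction in one table keeps the two aligned as corruptions are added.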

Paper Structure (49 sections, 11 equations, 15 figures, 13 tables, 2 algorithms)

Figures (15)

  • Figure 1: Visualization of instruction-guided image editing for an unseen corrupted image at test time. Compared to the baseline IP2P method, our proposed Decorruptor-DPM ($20$-step) and Decorruptor-CM ($4$-step) models edit effectively without altering the original semantics of the input corrupted image.
  • Figure 2: Representations of (a) the instance-wise connection map for corruption-like augmentations, and corruption crafting results from (b) clean images to (c) corrupted images. In (a), we show how the various corruption-like augmentations are composed. Here, sensitivity denotes the granularity of the corruption, the crafting phase denotes how the corrupted images are created, and the learning phase denotes how editing is learned.
  • Figure 3: Schematic of the overall training pipeline for the two proposed model variants: (a) Decorruptor-DPM, (b) Decorruptor-CM.
  • Figure 4: Corruption editing results for various corruptions at severity $5$. These results show that our Decorruptor-DPM and Decorruptor-CM generally edit test-time corruptions effectively.
  • Figure 5: LPIPS scores with clean and corrupted images.
  • ...and 10 more figures
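Figure 1 contrasts the 20-step Decorruptor-DPM with the 4-step Decorruptor-CM. The sketch below shows how a distilled consistency model can produce an output in a fixed, small number of NFEs. It uses the generic multistep sampling loop from the original consistency-models work, swapped in for illustration. Decorruptor-CM's actual sampler, noise schedule, and conditioning on the corrupted latent and instruction may differ. The consistency function `f` is a hypothetical placeholder for the distilled network, and the timestep values are illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
ConsistencyFn = Callable[[Vector, float], Vector]  # maps (x_t, t) to an estimate of x_0


def multistep_consistency_sample(
    f: ConsistencyFn,
    dim: int,
    timesteps: Sequence[float],
    eps: float,
    rng: random.Random,
) -> Tuple[Vector, int]:
    """Multistep consistency sampling; returns (sample, number of calls to f).

    timesteps must be decreasing, with timesteps[0] = T (the largest noise level).
    Exactly one network evaluation is spent per timestep.
    """
    T = timesteps[0]
    x = [rng.gauss(0.0, T) for _ in range(dim)]  # x_T ~ N(0, T^2 I)
    x0 = f(x, T)  # single-step estimate of the clean sample
    nfe = 1
    for t in timesteps[1:]:
        if t < eps:
            raise ValueError("every timestep must be >= eps")
        # Re-noise the current clean estimate to level t, then denoise in one call.
        scale = math.sqrt(t * t - eps * eps)
        x = [v + scale * rng.gauss(0.0, 1.0) for v in x0]
        x0 = f(x, t)
        nfe += 1
    return x0, nfe
```

With a four-entry schedule such as `[80.0, 24.0, 5.0, 0.5]`, the loop calls `f` exactly four times, matching a 4-NFE budget. In a real editor, each call would be a conditioned network forward pass in latent space.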