Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation

Yeongtak Oh; Jonghyun Lee; Jooyoung Choi; Dahuin Jung; Uiwon Hwang; Sungroh Yoon

TL;DR

This work addresses test-time adaptation under unforeseen distribution shifts by introducing Decorruptor, a corruption-editing framework built on latent diffusion models. It uses a novel corruption modeling scheme to fine-tune a diffusion model that, given an instruction, edits corrupted inputs back into clean images. It then accelerates this process with Decorruptor-CM, a consistency model obtained by distillation. The proposed approach runs roughly 100× faster than diffusion-based baselines while achieving state-of-the-art or near-state-of-the-art accuracy on ImageNet-C and ImageNet-$\bar{\mathrm{C}}$ across multiple architectures, and it extends to video with strong runtime efficiency. The methods are robust to unseen corruptions, generalize well out of distribution, and can ensemble multiple decorrupted views to improve predictions, which makes test-time corruption editing practical for real-world image and video applications.
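The TL;DR mentions ensembling multiple decorrupted views. The exact combination rule is not stated here, so the following is only a minimal sketch that assumes a plain average of softmax probabilities over the corrupted input and its decorrupted views. `softmax` and `ensemble_views` are hypothetical helper names, and the toy logits stand in for the outputs of any classifier.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_views(view_logits: list) -> np.ndarray:
    """Average class probabilities across views and return predicted labels.

    view_logits: list of (batch, num_classes) logit arrays, one per view,
    e.g. the corrupted input plus each decorrupted version (assumption:
    simple probability averaging, not necessarily the paper's rule).
    """
    probs = np.stack([softmax(v) for v in view_logits], axis=0)  # (views, batch, classes)
    return probs.mean(axis=0).argmax(axis=-1)                     # (batch,)


# Toy example: the corrupted view favors class 0, both decorrupted views favor class 1.
corrupted = np.array([[2.0, 1.0, 0.0]])
decorrupted_a = np.array([[0.0, 3.0, 0.0]])
decorrupted_b = np.array([[0.0, 2.5, 0.5]])
print(ensemble_views([corrupted, decorrupted_a, decorrupted_b]))  # → [1]
```

Averaging probabilities rather than taking hard votes lets confident decorrupted views outweigh an uncertain corrupted one; a confidence-weighted rule would be an equally plausible alternative.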

Abstract

Test-time adaptation (TTA) addresses unforeseen distribution shifts that occur at test time. In TTA, performance, memory consumption, and time consumption are crucial considerations. A recent diffusion-based TTA approach restores corrupted images through image-level updates. However, using pixel-space diffusion significantly increases resource requirements compared to conventional TTA approaches that update the model, which limits its practicality as a TTA method. To address this, we propose a novel TTA method that leverages an image editing model based on a latent diffusion model (LDM) and fine-tunes it using our newly introduced corruption modeling scheme. This scheme enhances the robustness of the diffusion model against distribution shifts by creating (clean, corrupted) image pairs and fine-tuning the model to edit corrupted images into clean ones. Moreover, we introduce a distilled variant that accelerates corruption editing to only 4 network function evaluations (NFEs). We extensively validated our method across various architectures and datasets, including the image and video domains. Our model achieves the best performance with a runtime 100 times faster than that of a diffusion-based baseline. Furthermore, it is three times faster than the previous model-updating TTA method that utilizes data augmentation, making an image-level updating approach more feasible.
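The corruption modeling scheme builds (clean, corrupted) image pairs and fine-tunes the editor to map each corrupted image back to its clean counterpart. The sketch below illustrates only the pair-construction step, under stated assumptions: the three augmentations (`gaussian_noise`, `box_blur`, `low_contrast`), their parameters, and the instruction string are hypothetical stand-ins, not the corruption-like augmentations actually used in the paper. Images are assumed to be float arrays in [0, 1] with shape (H, W, C).

```python
import numpy as np

# Hypothetical corruption-like augmentations, for illustration only.
# Images are float arrays in [0, 1] with shape (H, W, C).


def gaussian_noise(img: np.ndarray, rng, sigma: float = 0.08) -> np.ndarray:
    """Additive Gaussian noise, clipped back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, size=img.shape), 0.0, 1.0)


def box_blur(img: np.ndarray, rng, k: int = 5) -> np.ndarray:
    """Average over a k x k neighborhood with edge padding (rng unused)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def low_contrast(img: np.ndarray, rng, factor: float = 0.4) -> np.ndarray:
    """Pull pixels toward the per-channel mean (rng unused).

    The result is a convex combination of each pixel and the mean,
    so it stays inside [0, 1] without clipping.
    """
    mean = img.mean(axis=(0, 1), keepdims=True)
    return mean + factor * (img - mean)


AUGMENTATIONS = [gaussian_noise, box_blur, low_contrast]
INSTRUCTION = "remove the corruption"  # hypothetical editing prompt


def make_pair(clean: np.ndarray, rng):
    """Return (clean, corrupted, instruction).

    The editor would be fine-tuned to map the corrupted conditioning image
    plus the instruction back to `clean`.
    """
    augment = AUGMENTATIONS[rng.integers(len(AUGMENTATIONS))]
    return clean, augment(clean, rng), INSTRUCTION


rng = np.random.default_rng(0)
clean = rng.random((32, 32, 3))
target, corrupted, prompt = make_pair(clean, rng)
print(corrupted.shape, prompt)  # → (32, 32, 3) remove the corruption
```

In a fine-tuning loop, `corrupted` would serve as the conditioning image, `prompt` as the text instruction, and `target` as the editing target.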

Paper Structure

This paper contains 49 sections, 11 equations, 15 figures, 13 tables, and 2 algorithms.

Introduction
Related Works
Latent Diffusion Models
Image Restoration
Test-Time Adaptation
Preliminaries
Diffusion Models
Classifier Free Guidance
Consistency Models
Proposed Method
Corruption Modeling Scheme
Decorruptor-DPM: Instruction-Based Corruption Editing
Fine-Tuning U-Net with Corruption-Like Augmentations
Scheduling Image Guidance Scale
Decorruptor-CM: Accelerate DPM to CM
...and 34 more sections
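The outline lists Classifier Free Guidance and Scheduling Image Guidance Scale. For an InstructPix2Pix-style editor, which conditions on both an input image $c_I$ and an instruction $c_T$, the guided noise estimate combines three U-Net predictions: $\epsilon_\theta(z_t, \varnothing, \varnothing) + s_I\,(\epsilon_\theta(z_t, c_I, \varnothing) - \epsilon_\theta(z_t, \varnothing, \varnothing)) + s_T\,(\epsilon_\theta(z_t, c_I, c_T) - \epsilon_\theta(z_t, c_I, \varnothing))$. The sketch below implements that combination and pairs it with a linear schedule for the image guidance scale $s_I$. The linear form, its default endpoints, and the text scale of 7.5 are illustrative assumptions, since the schedule Decorruptor actually uses is not given here; `dual_cfg` and `image_scale_schedule` are hypothetical names.

```python
import numpy as np


def dual_cfg(eps_uncond, eps_img, eps_full, s_img, s_txt):
    """InstructPix2Pix-style classifier-free guidance.

    eps_uncond: eps(z_t, empty, empty), no image and no instruction
    eps_img:    eps(z_t, c_I, empty), image only
    eps_full:   eps(z_t, c_I, c_T), image and instruction
    """
    return (eps_uncond
            + s_img * (eps_img - eps_uncond)
            + s_txt * (eps_full - eps_img))


def image_scale_schedule(step, num_steps, s_start=1.0, s_end=2.0):
    """Hypothetical linear schedule for the image guidance scale s_I.

    `step` runs from 0 to num_steps - 1; the endpoints are illustrative only.
    """
    if num_steps <= 1:
        return s_end
    frac = step / (num_steps - 1)
    return s_start + frac * (s_end - s_start)


# Random arrays stand in for the three U-Net noise predictions on a 4x8x8 latent.
rng = np.random.default_rng(0)
e_u, e_i, e_f = (rng.standard_normal((4, 8, 8)) for _ in range(3))
for step in range(4):
    s_img = image_scale_schedule(step, 4)
    guided = dual_cfg(e_u, e_i, e_f, s_img=s_img, s_txt=7.5)
    print(step, round(s_img, 3), guided.shape)
```

Setting both scales to 1 reduces the combination to the fully conditioned prediction, which is a quick check that the signs are right.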

Figures (15)

Figure 1: Visualization of instruction-guided image editing for an unseen corrupted image at test time. Compared with the baseline IP2P method, our proposed Decorruptor-DPM ($20$-step) and Decorruptor-CM ($4$-step) models edit effectively while preserving the original semantics of the corrupted input image.
Figure 2: (a) Instance-wise connection map for corruption-like augmentations; corruption crafting results from (b) clean images to (c) corrupted images. In (a), we show how the various corruption-like augmentations are composed. Here, sensitivity refers to the granularity of the corruption, the crafting phase to how the corrupted images are created, and the learning phase to how editing is learned.
Figure 3: Schematic of the overall training pipeline for the two proposed model variants: (a) Decorruptor-DPM, (b) Decorruptor-CM.
Figure 4: Corruption editing results for various corruptions at severity $5$. These results show that our Decorruptor-DPM and Decorruptor-CM generally enable effective editing of test-time corruptions.
Figure 5: LPIPS scores with clean and corrupted images.
...and 10 more figures
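Figures 1 and 3 refer to Decorruptor-CM, the distilled variant that edits in 4 steps. As a rough illustration of how a consistency model spends those 4 network function evaluations, the sketch below follows the generic multistep consistency sampling procedure of Song et al. (2023) in its variance-exploding form. The consistency function `f`, the noise levels, and `t_min` are placeholders; the actual Decorruptor-CM network, its image and instruction conditioning, and its noise schedule may differ and are not reproduced here.

```python
from typing import Callable, List, Optional, Tuple

import numpy as np

# Stand-in type for a consistency function f(x_t, t) -> estimate of x_0.
# In Decorruptor-CM this would be the distilled editor conditioned on the
# corrupted image and the instruction (not modeled here).
ConsistencyFn = Callable[[np.ndarray, float], np.ndarray]


def multistep_consistency_sample(
    f: ConsistencyFn,
    shape: Tuple[int, ...],
    timesteps: List[float],
    t_min: float = 0.002,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Multistep consistency sampling in variance-exploding form.

    `timesteps` are decreasing noise levels, the first being the maximum
    level T. Each entry costs one evaluation of `f`, so len(timesteps)
    equals the number of NFEs.
    """
    rng = np.random.default_rng() if rng is None else rng
    t_max = timesteps[0]
    # NFE 1: map pure noise x_T ~ N(0, T^2 I) directly to a clean estimate.
    x = f(rng.normal(0.0, t_max, size=shape), t_max)
    for t in timesteps[1:]:
        z = rng.standard_normal(shape)
        # Re-noise the current estimate to level t, then denoise in one step.
        x_t = x + np.sqrt(t**2 - t_min**2) * z
        x = f(x_t, t)
    return x


if __name__ == "__main__":
    calls = []

    def toy_f(x_t, t):
        # Hypothetical toy consistency function: records its noise level
        # and shrinks the input toward zero.
        calls.append(t)
        return x_t / (1.0 + t)

    sample = multistep_consistency_sample(
        toy_f, (2, 3), [80.0, 24.0, 5.0, 0.5], rng=np.random.default_rng(0)
    )
    print(len(calls), sample.shape)  # → 4 (2, 3)
```

Each loop iteration re-noises the current clean estimate to a lower noise level and maps it back with one call to `f`, so a list of four noise levels yields exactly four NFEs.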
