Continual Test-Time Adaptation for Single Image Defocus Deblurring via Causal Siamese Networks

Shuang Cui, Yi Li, Jiangmeng Li, Xiongxin Tang, Bing Su, Fanjiang Xu, Hui Xiong

TL;DR

This work tackles the performance degradation of single image defocus deblurring (SIDD) models on out-of-distribution inputs, which it attributes to the heterogeneity of lens-specific point spread functions (PSFs). It introduces CauSiam, a continual test-time adaptation framework built on Siamese networks. CauSiam couples SiamCTTA with a causality-informed semantic prior integration (VSPI) that leverages large-scale vision-language models (e.g., CLIP) to achieve causal identifiability between blurry inputs and restored images. The approach reports substantial generalization gains across five SIDD datasets, improves robustness to long-term domain shifts, and stays efficient by updating only a cross-attention semantic module and employing EMA distillation. Combining SCM-based analysis with VLM-derived priors offers a principled way to reduce semantic artifacts while preserving fine-grained details, which supports deployment across diverse imaging devices.

Abstract

Single image defocus deblurring (SIDD) aims to restore an all-in-focus image from a defocused one. Distribution shifts in defocused images generally degrade the performance of existing methods during out-of-distribution inference. In this work, we investigate the root cause of this degradation and identify it as the heterogeneity of lens-specific point spread functions. Empirical evidence supports this finding, motivating us to employ a continual test-time adaptation (CTTA) paradigm for SIDD. However, traditional CTTA methods, which primarily rely on entropy minimization, cannot sufficiently exploit task-dependent information for pixel-level regression tasks like SIDD. To address this issue, we propose a novel Siamese network-based continual test-time adaptation framework, which adapts source models to continuously changing target domains in an online manner and requires only unlabeled target data. To further mitigate semantically erroneous textures introduced by source SIDD models under severe degradation, we revisit the learning paradigm through a structural causal model and propose Causal Siamese networks (CauSiam). Our method leverages large-scale pre-trained vision-language models to derive discriminative universal semantic priors and integrates these priors into Siamese networks, ensuring causal identifiability between blurry inputs and restored images. Extensive experiments demonstrate that CauSiam effectively improves the generalization performance of existing SIDD methods in continuously changing domains.

Paper Structure (26 sections, 1 theorem, 19 equations, 9 figures, 10 tables, 2 algorithms)

Key Result

Theorem 1 (Rules of $do$-calculus)

Let $G$ be a directed acyclic graph (DAG) associated with a causal model, and let $P(\cdot)$ stand for the probability distribution induced by that model. For any disjoint subsets of variables $X, Y, Z$, and $W$, we have the following rules.
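
The summary stops before listing the rules. For reference, the standard statement from Pearl (1995) is sketched below; the paper's own notation may differ slightly. Here $G_{\overline{X}}$ denotes $G$ with all edges pointing into nodes of $X$ removed, $G_{\underline{Z}}$ denotes $G$ with all edges leaving nodes of $Z$ removed, and $Z(W)$ is the set of $Z$-nodes that are not ancestors of any $W$-node in $G_{\overline{X}}$.

```latex
\begin{align*}
&\textbf{Rule 1 (insertion/deletion of observations):} \\
&\quad P(y \mid do(x), z, w) = P(y \mid do(x), w)
  \quad \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}} \\[4pt]
&\textbf{Rule 2 (action/observation exchange):} \\
&\quad P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w)
  \quad \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\underline{Z}} \\[4pt]
&\textbf{Rule 3 (insertion/deletion of actions):} \\
&\quad P(y \mid do(x), do(z), w) = P(y \mid do(x), w)
  \quad \text{if } (Y \perp Z \mid X, W) \text{ in } G_{\overline{X}\,\overline{Z(W)}}
\end{align*}
```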

Figures (9)

  • Figure 1: Motivation experiments. (a) Visualization of lens-specific and lens-agnostic PSF heterogeneity for two devices: Canon EOS 5D and Lytro Illum [ruan2022learning]. (b) The NRKNet model, trained on the DPDD training set, successfully restores DPDD test images captured with the same device as the training set (lens-agnostic), but fails on LFDOF images captured with a different device (lens-specific), introducing false white artifacts. (c) Limited performance of existing CTTA algorithms (e.g., "TENT", "CoTTA", and "SAR") during online continual adaptation over time. "Source" represents the DPDNet-S [abuolaim2020defocus] model trained on the DPDD dataset without adaptation. PSNR (dB) is used as the evaluation metric. (d) In cases of severe degradation, deblurring results without semantic priors exhibit semantically erroneous textures.
  • Figure 2: Framework of the proposed CauSiam. (a) The online model processes the original blurry image, while the offline model handles geometric augmentations (i.e., rotate and flip) to generate pseudo labels. We use a consistency loss (Equation \ref{equ:loss_total}) as the optimization objective to update CauSiam. (b) The VLMs-guided semantic encoder (VGSE) module extracts universal semantic prior embeddings for each blurry test image. (c) The cross attention (CA) module integrates these embeddings into the source SIDD model.
  • Figure 3: The proposed SCMs for SIDD. (a) The SCM of SIDD with SiamCTTA only. (b) The SCM of SIDD with the proposed CauSiam.
  • Figure 4: Qualitative comparison of source SIDD models with and without our CauSiam on the DPDD, RealDoF, LFDOF, RTF, and CUHK test sets during continual test-time adaptation. The odd-numbered rows (1st, 3rd, 5th, 7th, and 9th) show results of different source SIDD models trained on the DPDD training set without adaptation. The even-numbered rows (2nd, 4th, 6th, 8th, and 10th) show results after integrating CauSiam into the source models.
  • Figure 5: Qualitative comparison of different CTTA methods on the DPDD, RealDoF, LFDOF, RTF, and CUHK test sets during continual test-time adaptation. "Source Only" denotes the DPDNet-S model trained on the DPDD training set without adaptation.
  • ...and 4 more figures
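
The online/offline scheme in the Figure 2 caption can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the authors' implementation: the function names are hypothetical, the pseudo label is taken as the mean over the eight rotate/flip views, the consistency loss is a plain L1 distance (the paper's actual loss is its Equation `equ:loss_total`, which this summary does not show), and the offline model is assumed to track the online model through an exponential moving average (EMA).

```python
import numpy as np

def augment(img, k, flip):
    """Rotate an HxWxC image by k*90 degrees, then optionally flip it horizontally."""
    out = np.rot90(img, k=k, axes=(0, 1))
    return out[:, ::-1] if flip else out

def invert(img, k, flip):
    """Undo `augment`: remove the flip first, then rotate back."""
    out = img[:, ::-1] if flip else img
    return np.rot90(out, k=-k, axes=(0, 1))

def pseudo_label(offline_model, blurry):
    """Average offline predictions over 8 rotate/flip views (assumed aggregation),
    each mapped back to the original image frame."""
    preds = [invert(offline_model(augment(blurry, k, f)), k, f)
             for k in range(4) for f in (False, True)]
    return np.mean(preds, axis=0)

def consistency_loss(online_pred, label):
    """L1 distance between the online prediction and the pseudo label (assumed form)."""
    return float(np.mean(np.abs(online_pred - label)))

def ema_update(offline_params, online_params, momentum=0.999):
    """Move offline parameters toward the online ones (assumed EMA rule and momentum)."""
    return {name: momentum * offline_params[name] + (1.0 - momentum) * online_params[name]
            for name in offline_params}
```

In an actual adaptation step, the loss would be backpropagated only through the trainable part of the online model (per the TL;DR, the cross-attention semantic module), after which `ema_update` would refresh the offline copy.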

Theorems & Definitions (4)

  • Definition
  • Definition
  • Definition
  • Theorem 1: Rules of $do$-calculus [pearl1995causal]