Continual Test-Time Adaptation for Single Image Defocus Deblurring via Causal Siamese Networks
Shuang Cui, Yi Li, Jiangmeng Li, Xiongxin Tang, Bing Su, Fanjiang Xu, Hui Xiong
TL;DR
This work tackles the degradation of single image defocus deblurring (SIDD) models on out-of-distribution inputs, which it attributes to heterogeneity in lens-specific point spread functions (PSFs). It introduces CauSiam, a continual test-time adaptation (CTTA) framework built on Siamese networks. CauSiam couples SiamCTTA, its Siamese CTTA backbone, with a causality-informed semantic priors integration module (VSPI) that leverages large-scale vision-language models (VLMs, e.g., CLIP) to achieve causal identifiability between blurry inputs and restored images. The approach delivers substantial generalization gains across five SIDD datasets, improves robustness to long-term domain shifts, and stays efficient by updating only a cross-attention semantic module and using exponential moving average (EMA) distillation. Together, the structural causal model (SCM) analysis and VLM-derived priors offer a principled way to reduce semantic artifacts while preserving fine-grained detail in the restored images, supporting practical deployment across diverse imaging devices.
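The TL;DR states that adaptation updates only a cross-attention semantic module and relies on EMA distillation. Below is a minimal, framework-free sketch of that parameter bookkeeping, assuming a mean-teacher style setup in which the student's adaptable parameters take a gradient step and the teacher tracks them by EMA. The parameter-name prefix `semantic_xattn.`, the helper names, the learning rate, and the momentum values are hypothetical; the gradients are assumed to come from whatever autodiff framework computes the Siamese consistency loss, which this sketch does not model.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> scalar value.
Params = Dict[str, float]


def trainable_names(params: Params, prefix: str = "semantic_xattn.") -> List[str]:
    """Names of the parameters adapted at test time (hypothetical prefix)."""
    return [name for name in params if name.startswith(prefix)]


def sgd_step(student: Params, grads: Params, names: List[str], lr: float) -> Params:
    """Plain gradient step on the adaptable parameters; all others stay frozen."""
    updated = dict(student)
    for name in names:
        updated[name] = student[name] - lr * grads[name]
    return updated


def ema_update(teacher: Params, student: Params, names: List[str],
               momentum: float = 0.999) -> Params:
    """Teacher <- momentum * teacher + (1 - momentum) * student, adapted params only."""
    updated = dict(teacher)
    for name in names:
        updated[name] = momentum * teacher[name] + (1.0 - momentum) * student[name]
    return updated


if __name__ == "__main__":
    student = {"backbone.conv.w": 1.0, "semantic_xattn.q": 0.5}
    teacher = dict(student)
    grads = {"backbone.conv.w": 0.2, "semantic_xattn.q": 0.1}  # assumed from autodiff
    names = trainable_names(student)
    student = sgd_step(student, grads, names, lr=0.1)
    teacher = ema_update(teacher, student, names, momentum=0.9)
    print(student, teacher)
```

Because the frozen backbone entries are never touched, the gradient for `backbone.conv.w` is ignored, and the teacher moves only a small fraction of the way toward the student at each step, which is what makes EMA teachers comparatively stable targets during long online adaptation.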
Abstract
Single image defocus deblurring (SIDD) aims to restore an all-in-focus image from a defocused one. Distribution shifts in defocused images generally degrade the performance of existing methods during out-of-distribution inference. In this work, we investigate the intrinsic cause of this degradation and identify it as the heterogeneity of lens-specific point spread functions. Empirical evidence supports this finding, motivating us to adopt a continual test-time adaptation (CTTA) paradigm for SIDD. However, traditional CTTA methods, which rely primarily on entropy minimization, cannot sufficiently exploit task-dependent information for pixel-level regression tasks such as SIDD. To address this issue, we propose a novel Siamese networks-based continual test-time adaptation framework, which adapts source models online to continuously changing target domains using only unlabeled target data. To further mitigate the semantically erroneous textures that source SIDD models introduce under severe degradation, we revisit the learning paradigm through a structural causal model and propose Causal Siamese networks (CauSiam). Our method leverages large-scale pre-trained vision-language models to derive discriminative, universal semantic priors and integrates these priors into Siamese networks, ensuring causal identifiability between blurry inputs and restored images. Extensive experiments demonstrate that CauSiam effectively improves the generalization performance of existing SIDD methods in continuously changing domains.
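To make the PSF-heterogeneity argument concrete, here is a minimal 1D sketch, assuming the common spatially invariant model in which a defocused observation is the sharp signal convolved with a lens PSF. The two kernels (a box, the 1D analogue of a pillbox or disk PSF, and a truncated Gaussian) are illustrative stand-ins for two different lenses, not PSFs taken from the paper, and the function names are hypothetical. The point is only that the same sharp edge yields different blurry inputs, i.e., a distribution shift for a model trained on one of them.

```python
import math
from typing import List


def box_psf(radius: int) -> List[float]:
    """1D analogue of a uniform disk (pillbox) defocus PSF, normalized to sum to 1."""
    size = 2 * radius + 1
    return [1.0 / size] * size


def gaussian_psf(radius: int, sigma: float) -> List[float]:
    """Gaussian PSF truncated to the same support, normalized to sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal: List[float], psf: List[float]) -> List[float]:
    """Apply a symmetric PSF with replicate (edge) padding; output keeps input length.

    For symmetric kernels, this correlation-style indexing equals convolution.
    """
    r = len(psf) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(psf):
            k = min(max(i + j - r, 0), n - 1)  # clamp index = edge padding
            acc += w * signal[k]
        out.append(acc)
    return out


if __name__ == "__main__":
    sharp = [0.0] * 8 + [1.0] * 8  # a single step edge
    lens_a = blur(sharp, box_psf(3))
    lens_b = blur(sharp, gaussian_psf(3, 1.0))
    print([round(v, 3) for v in lens_a])
    print([round(v, 3) for v in lens_b])
```

Both outputs come from the same sharp edge with kernels of identical support, yet their edge profiles differ, so a deblurring model fitted to one lens sees out-of-distribution inputs from the other.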
