Revisiting the Generalization Problem of Low-level Vision Models Through the Lens of Image Deraining

Jinfan Hu, Zhiyuan You, Jinjin Gu, Kaiwen Zhu, Tianfan Xue, Chao Dong

TL;DR

The paper tackles generalization failures in low-level vision, arguing that the gap on unseen degradations stems from training strategies that lead networks to overfit specific degradation patterns, not from limited model capacity. Using image deraining as a testbed, it introduces a decoupled analysis of rain removal and background reconstruction and shows that training objectives and data complexity push networks toward a shortcut: learning the degradation patterns instead of the image content. Key strategies to improve generalization include constraining background complexity, balancing it against the diversity of rain in the training data, and leveraging content priors from pre-trained generative models (e.g., a fine-tuned VQGAN) to bias models toward image content. The authors validate these insights on deraining and, in an analogous toy task, on function denoising, demonstrating improved generalization and offering practical guidance for training low-level vision models in real-world settings. Overall, the work provides a data-centric framework and concrete recommendations for making low-level vision systems robust beyond synthetic benchmarks.

Abstract

Generalization remains a significant challenge for low-level vision models, which often struggle with unseen degradations in real-world scenarios despite their success in controlled benchmarks. In this paper, we revisit the generalization problem in low-level vision models. Image deraining is selected as a case study due to its well-defined and easily decoupled structure, allowing for more effective observation and analysis. Through comprehensive experiments, we reveal that the generalization issue is not primarily due to limited network capacity but rather the failure of existing training strategies, which leads networks to overfit specific degradation patterns. Our findings show that guiding networks to focus on learning the underlying image content, rather than the degradation patterns, is key to improving generalization. We demonstrate that balancing the complexity of background images and degradations in the training data helps networks better fit the image distribution. Furthermore, incorporating content priors from pre-trained generative models significantly enhances generalization. Experiments on both image deraining and image denoising validate the proposed strategies. We believe the insights and solutions will inspire further research and improve the generalization of low-level vision models.

Paper Structure

This paper contains 25 sections, 1 equation, 21 figures, 3 tables.

Figures (21)

  • Figure 1: Existing deraining models suffer from severe generalization problems. After training on synthetic rainy images, a model fed (a) an image with different rain streaks produces (b) an output with only a limited deraining effect. Two intuitive ways to improve generalization performance, (c) adding background images and (d) adding rain patterns, cannot effectively relieve the generalization issue. In this paper, we provide a new counter-intuitive insight: (e) we improve the generalization ability of deraining networks by training on far fewer background images.
  • Figure 2: (Left) The illustration of the rainy image synthesis. (Right) Our fine-grained analysis of the deraining results.
  • Figure 3: (a) Background images from different image datasets. It can be seen that the structure of the face image (CelebA) is relatively complex. Natural image patches (DIV2K) contain natural textures and patterns. The patterns in Manga109 and Urban100 are artificially created -- Manga images have sharp edges, while Urban images contain a lot of repeating patterns and self-similarities. (b) Rain streaks used in our experiments.
  • Figure 4: The relationship between the number of training patches and rain removal performance. The $x$-axis represents the number of patches, and the $y$-axis represents the rain removal effect $E_R$. Higher values on the $y$-axis mean better rain removal. The test rain patterns are not in the training set, so the rain removal effect measured here reflects generalization performance. The qualitative results are obtained using ResNet.
  • Figure 5: Examples from DIV2K classified as low, medium, and high sharpness.
  • ...and 16 more figures