Revisiting the Generalization Problem of Low-level Vision Models Through the Lens of Image Deraining
Jinfan Hu, Zhiyuan You, Jinjin Gu, Kaiwen Zhu, Tianfan Xue, Chao Dong
TL;DR
The paper examines why low-level vision models fail to generalize to unseen degradations, arguing that the gap stems not from limited network capacity but from training strategies that let networks overfit specific degradation patterns. Using image deraining as a testbed, it decouples rain removal from background reconstruction and shows that the training objective and the relative complexity of the training data lead networks to take a shortcut: learning the degradation rather than the image content. Strategies that improve generalization include constraining background complexity, adjusting the range of rain patterns, and using content priors from pre-trained generative models (e.g., a fine-tuned VQGAN) to bias models toward image content. The authors validate these strategies on image deraining and image denoising, and offer practical guidance for training low-level vision models that remain robust beyond synthetic benchmarks.
Abstract
Generalization remains a significant challenge for low-level vision models, which often struggle with unseen degradations in real-world scenarios despite their success in controlled benchmarks. In this paper, we revisit the generalization problem in low-level vision models. Image deraining is selected as a case study due to its well-defined and easily decoupled structure, allowing for more effective observation and analysis. Through comprehensive experiments, we reveal that the generalization issue is not primarily due to limited network capacity but rather the failure of existing training strategies, which leads networks to overfit specific degradation patterns. Our findings show that guiding networks to focus on learning the underlying image content, rather than the degradation patterns, is key to improving generalization. We demonstrate that balancing the complexity of background images and degradations in the training data helps networks better fit the image distribution. Furthermore, incorporating content priors from pre-trained generative models significantly enhances generalization. Experiments on both image deraining and image denoising validate the proposed strategies. We believe the insights and solutions will inspire further research and improve the generalization of low-level vision models.
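To make the decoupled view of rain removal and background reconstruction concrete, the following is a minimal sketch and not the authors' evaluation code. It assumes that a rain mask can be obtained by thresholding the difference between the rainy input and the clean ground truth; the function name `decoupled_errors`, the `threshold` parameter, and the choice of mean squared error are illustrative assumptions.

```python
import numpy as np

def decoupled_errors(output, clean, rainy, threshold=1e-3):
    """Return (rain_region_mse, background_mse) for one image.

    Hypothetical helper: the rain mask is assumed to be the set of pixels
    where the rainy input differs from the clean image by more than
    `threshold`. All arrays must share the same shape and value range.
    """
    output = np.asarray(output, dtype=np.float64)
    clean = np.asarray(clean, dtype=np.float64)
    rainy = np.asarray(rainy, dtype=np.float64)

    # Pixels touched by rain versus pixels that are pure background.
    rain_mask = np.abs(rainy - clean) > threshold
    sq_err = (output - clean) ** 2

    # Guard against empty regions so the mean is always defined.
    rain_mse = float(sq_err[rain_mask].mean()) if rain_mask.any() else 0.0
    bg_mse = float(sq_err[~rain_mask].mean()) if (~rain_mask).any() else 0.0
    return rain_mse, bg_mse
```

Under these assumptions, an output that simply copies the rainy input has zero background error but a large rain-region error, so the two scores separate failure modes that a single full-image metric would blend together.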
