Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era
Xiaowei Hu, Zhenghao Xing, Tianyu Wang, Chi-Wing Fu, Pheng-Ann Heng
TL;DR
Shadows strongly affect scene understanding and visual realism. This work surveys deep-learning approaches to shadow detection, removal, and generation in images and videos. It provides a standardized benchmark framework, a cross-dataset generalization study, and an analysis of model size and speed versus performance. It also discusses the growing role of large vision models and the implications of AI-generated content (AIGC). Key contributions include a taxonomic survey, fair experimental comparisons, and publicly available resources to accelerate future research. The findings show strong progress but also reveal gaps in generalization, consistency across multiple objects and video frames, and dataset diversity. These results inform practical applications in image and video editing and synthesis.
Abstract
Shadows form when an object blocks light, producing regions of reduced illumination. In computer vision, detecting, removing, and generating shadows are critical tasks for improving scene understanding, enhancing image quality, ensuring visual consistency in video editing, and optimizing virtual environments. This paper offers a comprehensive survey and evaluation benchmark on shadow detection, removal, and generation in both images and videos, focusing on deep learning approaches from the past decade. It covers tasks, deep models, datasets, evaluation metrics, and comparative results under consistent experimental settings. Our main contributions include a thorough survey of shadow analysis, the standardization of experimental comparisons, an exploration of the relationships among model size, speed, and performance, a cross-dataset generalization study, the identification of open challenges and future research directions, and the provision of publicly available resources to support further research in this field.
