SLCFormer: Spectral-Local Context Transformer with Physics-Grounded Flare Synthesis for Nighttime Flare Removal
Xiyu Zhu, Wei Wang, Xin Yuan, Xiao Wang
TL;DR
This work tackles nighttime lens flare, whose artifacts are nonuniform and spatially varying, with a physics-grounded data synthesis approach and a transformer-based removal framework. It introduces SLCFormer, a spectral-local context transformer that combines the Frequency Fourier and Excitation Module (FFEM) for global frequency-domain context with the Directionally-Enhanced Spatial Module (DESM) for local directional features, removing flare while preserving scene content. A ZernikeVAE-based scatter flare generation pipeline creates physically plausible, spatially varying point spread functions (PSFs) to enrich the training data and improve generalization to real scenes. Experiments on Flare7K++ show state-of-the-art performance in both quantitative metrics and perceptual quality and indicate that the method transfers to real-world nighttime flare scenarios, albeit with added computational overhead.
Abstract
Lens flare is a common nighttime artifact caused by strong light sources scattering within camera lenses, producing hazy streaks, halos, and glare that degrade visual quality. Existing methods usually fail to handle nonuniform scattered flares, which limits their applicability to complex real-world scenes with diverse lighting conditions. To address this issue, we propose SLCFormer, a novel spectral-local context transformer framework for nighttime lens flare removal. SLCFormer integrates two key modules: the Frequency Fourier and Excitation Module (FFEM), which efficiently captures global contextual representations in the frequency domain to model flare characteristics, and the Directionally-Enhanced Spatial Module (DESM), which enhances local structure and extracts directional features in the spatial domain for precise flare removal. Furthermore, we introduce a ZernikeVAE-based scatter flare generation pipeline that synthesizes physically realistic scatter flares with spatially varying point spread functions (PSFs), bridging optical physics and data-driven training. Extensive experiments on the Flare7K++ dataset demonstrate that our method achieves state-of-the-art performance, outperforming existing approaches in both quantitative metrics and perceptual visual quality, and generalizing robustly to real nighttime scenes with complex flare artifacts.
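To illustrate how Zernike coefficients can give rise to a PSF, the following is a minimal sketch of the standard Fourier-optics relation, in which the PSF is the squared magnitude of the Fourier transform of the aberrated pupil function. It is not the paper's ZernikeVAE pipeline: the VAE is not modeled, only two Zernike terms (defocus and horizontal coma) are included, and the function names, the coefficient values, and the field-dependent coma rule in `coeffs_at` are hypothetical choices made for illustration.

```python
import numpy as np


def zernike_defocus(rho):
    """Normalized Zernike defocus term: sqrt(3) * (2*rho^2 - 1)."""
    return np.sqrt(3.0) * (2.0 * rho**2 - 1.0)


def zernike_coma_x(rho, theta):
    """Normalized Zernike horizontal coma: sqrt(8) * (3*rho^3 - 2*rho) * cos(theta)."""
    return np.sqrt(8.0) * (3.0 * rho**3 - 2.0 * rho) * np.cos(theta)


def psf_from_zernike(defocus=0.0, coma_x=0.0, size=64, pad=2):
    """Return a unit-sum PSF for wavefront coefficients given in waves.

    Hypothetical helper: only two Zernike terms are supported here.
    """
    # Pupil-plane coordinates covering the unit disk.
    y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    aperture = (rho <= 1.0).astype(float)

    # Wavefront error (in waves) as a weighted sum of Zernike terms.
    wavefront = defocus * zernike_defocus(rho) + coma_x * zernike_coma_x(rho, theta)
    pupil = aperture * np.exp(2j * np.pi * wavefront)

    # Zero-padded FFT of the pupil; the PSF is the squared magnitude.
    n = size * pad
    field = np.fft.fftshift(np.fft.fft2(pupil, s=(n, n)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()


def coeffs_at(field_radius):
    """Hypothetical spatial variation: coma grows linearly off-axis."""
    return {"defocus": 0.1, "coma_x": 0.5 * field_radius}


# One PSF at the image center and one near the image edge.
psf_center = psf_from_zernike(**coeffs_at(0.0))
psf_edge = psf_from_zernike(**coeffs_at(1.0))
```

Evaluating `psf_from_zernike` at different field positions yields a different kernel per image region, which is the sense in which a PSF is spatially varying. In a learned pipeline, the hand-picked rule in `coeffs_at` would presumably be replaced by coefficients sampled from a generative model.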
