Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong
TL;DR
This paper tackles RAW-domain demosaicing for event cameras, where the sensor design leaves some pixel values missing and thereby hinders conventional RAW processing. It introduces a Swin-Transformer backbone with space-to-depth preprocessing and a U-Net-style encoder-decoder. Training follows a two-stage strategy: the network is first trained with Charbonnier loss and then fine-tuned with the Pixel-focus Loss to emphasize edge regions. The Pixel-focus Loss comes in two forms, $\mathcal{L}_{pf}^p$ and $\mathcal{L}_{pf}^e$, which address the long-tailed distribution of training errors and improve convergence. Evaluations on the MIPI Demosaic Challenge dataset show improved reconstruction quality over baselines such as RSTCANet, and the authors release code and trained models to facilitate adoption and further RAW-domain research.
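The two-stage loss schedule can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions. `charbonnier_loss` uses the standard Charbonnier form sqrt((p - t)^2 + eps^2). `focus_weighted_loss`, however, is a hypothetical stand-in that up-weights high-error pixels by their normalized error. It is not the paper's $\mathcal{L}_{pf}^p$ or $\mathcal{L}_{pf}^e$, whose definitions are not given in this summary. The function names, `eps`, and `gamma` are illustrative choices.

```python
import math
from typing import List, Sequence


def _charbonnier_errors(pred: Sequence[float], target: Sequence[float], eps: float) -> List[float]:
    """Per-pixel Charbonnier errors sqrt((p - t)^2 + eps^2)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return [math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target)]


def charbonnier_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-3) -> float:
    """Mean Charbonnier loss (a smooth, differentiable variant of L1)."""
    errors = _charbonnier_errors(pred, target, eps)
    return sum(errors) / len(errors)


def focus_weighted_loss(pred: Sequence[float], target: Sequence[float],
                        eps: float = 1e-3, gamma: float = 1.0) -> float:
    """HYPOTHETICAL pixel-focus-style loss, NOT the paper's L_pf^p or L_pf^e.

    Each per-pixel Charbonnier error is weighted by (error / max_error) ** gamma,
    so pixels in the long tail of the error distribution (e.g. edges) dominate
    the loss while well-reconstructed pixels contribute little.
    """
    errors = _charbonnier_errors(pred, target, eps)
    max_err = max(errors)
    if max_err == 0.0:  # only possible when eps == 0 and pred == target
        return 0.0
    weights = [(e / max_err) ** gamma for e in errors]
    return sum(w * e for w, e in zip(weights, errors)) / len(errors)


def training_loss(pred: Sequence[float], target: Sequence[float], stage: int) -> float:
    """Two-stage schedule: stage 1 trains with Charbonnier loss,
    stage 2 fine-tunes with the (hypothetical) focus-weighted loss."""
    if stage == 1:
        return charbonnier_loss(pred, target)
    if stage == 2:
        return focus_weighted_loss(pred, target)
    raise ValueError("stage must be 1 or 2")
```

Consider two pixel errors of 0.5 and 1.0 with eps = 0. Charbonnier loss weights them equally, giving 0.75, and the larger residual supplies 1.0 / 1.5, about 67% of the total. The focus-weighted sketch gives 0.625, and the larger residual supplies 1.0 / 1.25 = 80% of the weighted sum. This shift of emphasis toward hard pixels is the kind of behavior the fine-tuning stage aims for.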
Abstract
Recent research has highlighted improvements in high-quality imaging guided by event cameras, but most of these efforts concentrate on the RGB domain. These advances often overlook the unique challenges that inherent flaws in the sensor design of event cameras introduce in the RAW domain. Specifically, this sensor design causes some pixel values to be missing, which poses new challenges for RAW-domain processes such as demosaicing. The challenge is compounded because most RAW-domain research assumes that every pixel contains a value, so these methods cannot be straightforwardly adapted to event camera demosaicing. To this end, we present a Swin-Transformer-based backbone and a pixel-focus loss function for demosaicing with missing pixel values in RAW-domain processing. Our core motivation is to refine a general, widely applicable foundation model from the RGB domain for RAW-domain processing, thereby broadening the model's applicability across the entire imaging pipeline. Our method uses multi-scale processing and space-to-depth techniques to improve efficiency and reduce computational complexity. We also propose the Pixel-focus Loss function for network fine-tuning; it is motivated by our observation of a long-tailed distribution in the training loss and improves network convergence. We validated our method on the MIPI Demosaic Challenge dataset, and subsequent analytical experiments confirm its efficacy. All code and trained models are released here: https://github.com/yunfanLu/ev-demosaic
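A space-to-depth step can be sketched as below. Each 2x2 patch of the RAW mosaic is folded into four channels, so the network sees a quarter as many spatial positions, which lowers the cost of the window attention that follows. This abstract does not describe how the paper encodes missing pixels. Here we assume, hypothetically, that missing sites arrive as `None`, are filled with 0, and are tracked with a separate validity mask that is packed the same way. `fill_missing` and `space_to_depth` are illustrative names, not functions from the released code.

```python
from typing import List, Optional, Tuple

RawGrid = List[List[Optional[float]]]  # None marks a missing pixel site (assumption)
Grid = List[List[float]]


def fill_missing(raw: RawGrid, fill: float = 0.0) -> Tuple[Grid, Grid]:
    """Hypothetical preprocessing: replace missing sites with `fill` and
    return a validity mask (1.0 = measured value, 0.0 = missing)."""
    values = [[fill if v is None else v for v in row] for row in raw]
    mask = [[0.0 if v is None else 1.0 for v in row] for row in raw]
    return values, mask


def space_to_depth(image: Grid, block: int = 2) -> List[List[List[float]]]:
    """Fold each block x block patch into the channel dimension.

    An H x W grid becomes (H / block) x (W / block) positions with
    block * block channels each, ordered row-major within the patch.
    """
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    return [
        [
            [image[i + di][j + dj] for di in range(block) for dj in range(block)]
            for j in range(0, w, block)
        ]
        for i in range(0, h, block)
    ]


if __name__ == "__main__":
    raw = [
        [0.1, 0.2, 0.3, 0.4],
        [0.5, None, 0.7, 0.8],  # one missing site (hypothetical)
        [0.9, 1.0, 1.1, 1.2],
        [1.3, 1.4, 1.5, 1.6],
    ]
    values, mask = fill_missing(raw)
    packed, packed_mask = space_to_depth(values), space_to_depth(mask)
    print(packed[0][0])       # [0.1, 0.2, 0.5, 0.0]
    print(packed_mask[0][0])  # [1.0, 1.0, 1.0, 0.0]
```

Packing the mask alongside the values is one way to let a network distinguish a genuine zero reading from a missing sample. In a PyTorch model, the same rearrangement of values is typically done with `torch.nn.PixelUnshuffle`.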
