Pix2HDR -- A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos
Caixin Wang, Jie Zhang, Matthew A. Wilson, Ralph Etienne-Cummings
TL;DR
Pix2HDR tackles the trade-off between dynamic range and temporal resolution in HDR video by coupling pixel-wise MPVE sampling on a PE-CMOS sensor with a two-stage deep learning synthesis (LDR-net for spatiotemporal upsampling and HDR-net for fusion). The MPVE pattern uses multi-phase exposures across 2×2 pixel patches to boost temporal bandwidth and dynamic range while mitigating aliasing. The LDR-HDR networks are trained end-to-end on camera-simulated measurements derived from public HDR videos, achieving real-time HDR video synthesis at up to 400 Hz and 2 ms temporal resolution for HDR frames, with 12–24 dB DR improvement depending on configuration. The results show substantial improvements over frame-based and interleaved exposures in PSNR/SSIM and HDR-VDP metrics, while maintaining high spatial resolution and robustness to low-light and bright-background conditions. This approach enables high-speed HDR video for dynamic scenes in robotics, autonomous systems, and computational imaging.
Abstract
Accurately capturing dynamic scenes with wide-ranging motion and light intensity is crucial for many vision applications. However, acquiring high-speed high dynamic range (HDR) video is challenging because the camera's frame rate restricts its dynamic range. Existing methods sacrifice speed to acquire multi-exposure frames, yet misaligned motion across these frames can still complicate HDR fusion and produce artifacts. Instead of frame-based exposures, we sample the video using individual pixels at varying exposures and phase offsets. Implemented on a monochrome pixel-wise programmable image sensor, our sampling pattern simultaneously captures fast motion at a high dynamic range. We then transform the pixel-wise outputs into an HDR video using end-to-end learned weights from deep neural networks, achieving high spatiotemporal resolution with minimal motion blur. We demonstrate aliasing-free HDR video acquisition at 1000 FPS, resolving fast motion under low-light conditions and against bright backgrounds, both of which are challenging for conventional cameras. By combining the versatility of pixel-wise sampling patterns with the strength of deep neural networks at decoding complex scenes, our method greatly enhances a vision system's adaptability and performance in dynamic conditions.
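To make the idea of pixel-wise sampling at varying exposures and phase offsets concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a 2×2 tile in which each pixel has its own exposure length and phase offset, measured in fine time steps. The specific values in `TILE`, the function name `pixelwise_sample`, and the `full_well` saturation level are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical 2x2 tile. Each entry is (exposure length, phase offset),
# both in units of fine time steps. These values are illustrative only
# and are not taken from the paper.
TILE = np.array([
    [[1, 0], [2, 1]],
    [[4, 2], [8, 3]],
])

def pixelwise_sample(video, tile=TILE, full_well=1.0):
    """Simulate per-pixel exposures on a (T, H, W) radiance video.

    Each pixel integrates light over back-to-back windows of its own
    length, starting at its own phase offset, and reads out at the last
    time step of every window. Readouts saturate at `full_well`.
    Time steps without a readout are returned as NaN.
    """
    T, H, W = video.shape
    out = np.full((T, H, W), np.nan)
    for dy in range(2):
        for dx in range(2):
            length, phase = tile[dy, dx]
            sub = video[:, dy::2, dx::2]
            # Prefix sums let us integrate any window with one subtraction.
            zeros = np.zeros((1,) + sub.shape[1:])
            csum = np.concatenate([zeros, np.cumsum(sub, axis=0)])
            starts = np.arange(phase, T - length + 1, length)
            ends = starts + length
            integrated = csum[ends] - csum[starts]
            # Store each readout at the final time step of its window.
            out[ends - 1, dy::2, dx::2] = np.minimum(integrated, full_well)
    return out

if __name__ == "__main__":
    dim_scene = np.full((16, 4, 4), 0.1)
    samples = pixelwise_sample(dim_scene)
    # Fraction of (time, pixel) slots that carry a readout.
    print(np.mean(~np.isnan(samples)))
```

In this sketch, short-exposure pixels read out often and resist saturation, while long-exposure pixels collect more light but read out rarely. The resulting measurements are sparse and staggered in time, which illustrates why a learned reconstruction step is needed to recover a dense HDR video from them.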
