BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

Ruirui Lin, Nantheera Anantrasirichai, Guoxi Huang, Joanne Lin, Qi Sun, Alexandra Malyugina, David R Bull

TL;DR

This work addresses data scarcity and temporal inconsistency in low-light video enhancement (LLVE) by introducing BVI-RLV, a fully registered low-light video dataset with ground-truth normal-light frames obtained using a programmable motorized dolly and histogram-based registration. It provides 40 dynamic and static scenes with over 31,800 registered frame pairs across two low-light levels and a normal-light reference, plus four benchmark models (PCDUNet, STA-SUNet, BVI-CDM, BVI-Mamba) designed to run on a single GPU. Experiments show that models trained on fully registered pairs outperform those trained on existing datasets, with BVI-Mamba delivering the strongest cross-dataset gains. The dataset and baselines are publicly available, supporting supervised LLVE research and practical evaluation of temporal coherence.

Abstract

Low-light videos often exhibit spatiotemporally incoherent noise, compromising visibility and performance in computer vision applications. One significant challenge in enhancing such content using deep learning is the scarcity of training data. This paper introduces a novel low-light video dataset consisting of 40 scenes with various motion scenarios under two distinct low-light conditions, incorporating genuine noise and temporal artifacts. We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels. We provide benchmarks based on four different technologies: convolutional neural networks, transformers, diffusion models, and state space models (Mamba). Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE), and the comprehensive evaluation shows that models trained with our dataset outperform those trained with existing datasets. Our dataset and links to benchmarks are publicly available at https://doi.org/10.21227/mzny-8c77.

Paper Structure (8 sections, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Scene examples with varying light levels and different motion profiles, as shown in the x-t plane. The length of the red lines across the normal-light frames represents the height of the x-t planes.
  • Figure 3: (Left) Scene setting showing the camera in the 'angle' position, mounted on the CineDrive system. (Right) Moving bunny scene with static background, showing the pixel value difference between the normal-light frame and the adjusted low-light frame before and after alignment (gray = zero error).
  • Figure 4: Cropped images (Lego and Kitchen scenes at 350$\times$350 and 140$\times$140 pixels, respectively) with histogram matching to the reference (normal light) to visualize noise at different levels of light.
  • Figure 5: Main architectural components of the four different benchmarking methods, i.e., PCDUNet, STA-SUNet, BVI-Mamba and BVI-CDM.
  • Figure 6: Subjective results of the BVI-CDM model trained on different datasets and tested on different datasets. The test results for the DRV, SDSD, and DID datasets are displayed from top to bottom. The two bottom rows show the test results on our datasets at light levels of 10% and 20%.
  • ...and 1 more figure
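
The captions for Figures 3 and 4 mention two visualization steps: adjusting a low-light frame to the normal-light reference by histogram matching, and displaying a signed pixel difference in which gray means zero error. The following is a minimal sketch of both ideas, not the authors' registration pipeline. It assumes 8-bit grayscale frames stored as flat lists of integers, and the function names `cdf`, `match_histogram`, and `signed_difference` are hypothetical, introduced only for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def cdf(pixels, levels=256):
    """Return the normalised cumulative histogram of a flat list of integer pixels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    return [c / total for c in accumulate(hist)]


def match_histogram(source, reference, levels=256):
    """Remap `source` so its intensity distribution approximates that of `reference`."""
    src_cdf = cdf(source, levels)
    ref_cdf = cdf(reference, levels)
    # For each source level, pick the lowest reference level whose CDF reaches
    # the source CDF value (classic CDF-based histogram matching).
    lut = [min(bisect_left(ref_cdf, q), levels - 1) for q in src_cdf]
    return [lut[p] for p in source]


def signed_difference(a, b):
    """Map the signed difference a - b into 0..255 so that 128 (mid-gray) means zero error."""
    return [max(0, min(255, 128 + (x - y) // 2)) for x, y in zip(a, b)]


# Toy example: a dark frame and a brighter reference of the same scene.
low_light = [0, 1, 2, 3]
normal_light = [10, 20, 30, 40]
adjusted = match_histogram(low_light, normal_light)
error_map = signed_difference(normal_light, adjusted)
```

In this toy case the adjusted frame equals the reference, so every entry of `error_map` is 128. On real footage, residual noise and misalignment would appear as deviations from mid-gray.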