LIME-Eval: Rethinking Low-light Image Enhancement Evaluation via Object Detection

Mingjia Li, Hao Zhao, Xiaojie Guo

TL;DR

This work critiques detector retraining as an evaluation proxy for low-light image enhancement, citing overfitting and domain shift. It introduces LIME-Bench, a public human-preference benchmark, and LIME-Eval, a label-free energy-based evaluator that uses detectors pre-trained on normal-light data to assess enhanced images without ground truth or labels. The energy criterion $E(x_{cls},x_{bg})$ combines logits and objectness to produce a dataset-level score, which correlates strongly with detection performance ($r$ up to $0.881$, $\rho$ up to $0.847$) and moderately with human preferences ($r\approx 0.593$). Additional experiments show that including the energy term as a training loss can improve both perceptual quality (LPIPS) and downstream detection metrics, and ablations confirm robustness to hyperparameters. The work provides a scalable, fairer framework for evaluating low-light enhancers and points to broader uses of energy-based evaluation in image processing.
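The correlation figures quoted above can be reproduced in spirit with a few lines of code. The sketch below assumes that $r$ denotes the Pearson coefficient and $\rho$ the Spearman rank coefficient, which is the usual convention but is not stated in this summary. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical per-enhancer values, made up for illustration only.
    evaluator_scores = [0.42, 0.55, 0.61, 0.47, 0.70]
    detection_map = [0.31, 0.36, 0.40, 0.35, 0.44]
    print("Pearson r:", pearson(evaluator_scores, detection_map))
    print("Spearman rho:", spearman(evaluator_scores, detection_map))
```

Pearson measures linear agreement between the evaluator score and the reference quantity, while Spearman only checks whether the two rank the enhancers in the same order, so reporting both guards against a metric that orders methods correctly but on a distorted scale.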

Abstract

Because enhancement typically lacks paired ground-truth information, high-level vision tasks have recently been employed to evaluate low-light image enhancement. A widely used approach is to measure how accurately an object detector, trained on low-light images enhanced by each candidate method, performs with respect to annotated semantic labels. In this paper, we first demonstrate that this approach is generally prone to overfitting, which diminishes its measurement reliability. In search of a proper evaluation metric, we propose LIME-Bench, the first online benchmark platform designed to collect human preferences for low-light enhancement, providing a valuable dataset for validating the correlation between human perception and automated evaluation metrics. We then develop LIME-Eval, a novel evaluation framework that uses detectors pre-trained on standard-lighting datasets, without object annotations, to judge the quality of enhanced images. By adopting an energy-based strategy to assess the accuracy of output confidence maps, LIME-Eval simultaneously bypasses the biases associated with retraining detectors and removes the reliance on annotations for dim images. Comprehensive experiments demonstrate the effectiveness of LIME-Eval. Our benchmark platform (https://huggingface.co/spaces/lime-j/eval) and code (https://github.com/lime-j/lime-eval) are available online.
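To make the idea of an energy-based confidence score concrete, here is a minimal sketch of the generic free-energy score used with energy-based models, $-T \log \sum_k \exp(z_k / T)$ over class logits $z_k$. This is an assumption for illustration, not the paper's criterion $E(x_{cls},x_{bg})$, which also involves objectness and whose exact form is not given in this summary. The function names, the temperature parameter and the sample logits are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty sequence."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def free_energy(class_logits, temperature=1.0):
    """Generic free-energy score: -T * logsumexp(logits / T).

    Lower energy means the logits are larger and more peaked, i.e. the
    detector is more confident. This is NOT the paper's exact formula.
    """
    scaled = [z / temperature for z in class_logits]
    return -temperature * logsumexp(scaled)


def dataset_energy(logits_per_candidate, temperature=1.0):
    """Average the per-candidate energies into one dataset-level number."""
    energies = [free_energy(z, temperature) for z in logits_per_candidate]
    return sum(energies) / len(energies)


if __name__ == "__main__":
    # Hypothetical logits from a pre-trained detector on two enhanced outputs.
    confident = [[10.0, 0.0, 0.0], [8.0, 1.0, 0.0]]
    uncertain = [[1.0, 0.0, 0.0], [0.5, 0.4, 0.3]]
    print("confident:", dataset_energy(confident))
    print("uncertain:", dataset_energy(uncertain))
```

Under this generic formulation no labels are needed: the score depends only on the detector's own outputs, which is the property the abstract relies on to avoid annotations for dim images.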

Paper Structure

This paper contains 15 sections, 6 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Qualitative comparisons on samples from the ExDark dataset. Please zoom in for more details. Due to the page limit, more cases can be found in the appendix.
  • Figure 2: User preference study. (a) plots the rank of the overall user preference (Elo Rating) against detection performance (mAP). (b) depicts the respective Elo Ratings for noise/artifact reduction, illumination enhancement, color restoration, and boundary sharpness.
  • Figure 3: Correlations between user preference and popular IQA approaches.
  • Figure 4: The pipeline of our proposed LIME-Eval.
  • Figure 5: A visualization of the synthetic setting. More details can be found in the appendix.
  • ...and 6 more figures