Rethinking the Evaluation of Visible and Infrared Image Fusion
Dayan Guan, Yixuan Wu, Tianzhu Liu, Alex C. Kot, Yanfeng Gu
TL;DR
This work tackles the evaluation bottleneck of Visible-Infrared Image Fusion (VIF) by introducing a Segmentation-oriented Evaluation Approach (SEA), which uses universal segmentation models to assess fused images without requiring ground-truth fused references. SEA takes the fused images produced by each VIF method, predicts semantic segmentation with models such as X-Decoder, SEEM, and G-SAM, and computes the mean IoU ($mIoU$) against the segmentation labels, enabling cross-dataset and cross-method comparisons. In extensive experiments on the FMB and MVSeg datasets, SEA shows that many recent VIF methods do not outperform simply using visible images, even though nearly half of the infrared images yield better segmentation than their visible counterparts. The study also identifies $Q_{\text{ABF}}$ and $Q_{\text{VIFF}}$ as the conventional metrics most correlated with SEA, offering practical proxies when labels are unavailable. The results call for a reorientation of VIF research toward semantically consistent fusion and provide a scalable, dataset-agnostic evaluation framework to guide future method development. The work contributes (1) a universally applicable SEA framework, (2) a comprehensive comparative study of 30 recent VIF methods across two datasets, and (3) a correlation analysis linking SEA to traditional metrics to inform proxy-based evaluation.
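To make the scoring step concrete, the following is a minimal sketch of a mean IoU computation between a predicted segmentation map and a label map. It is not the authors' implementation: the function name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes present in the prediction or the label are assumptions made for illustration. The summary above does not specify how scores are aggregated across images, so this sketch scores a single image pair.

```python
def mean_iou(pred, label, num_classes, ignore_index=255):
    """Mean IoU for one prediction/label pair given as 2D lists of class ids.

    Assumptions (not taken from the paper): labels equal to `ignore_index`
    are skipped, and classes absent from both maps are left out of the mean.
    """
    inter = [0] * num_classes  # pixels where prediction and label agree on class c
    union = [0] * num_classes  # pixels where prediction or label is class c
    for pred_row, label_row in zip(pred, label):
        for p, g in zip(pred_row, label_row):
            if g == ignore_index:
                continue
            if p == g:
                inter[g] += 1
                union[g] += 1
            else:
                union[g] += 1
                if 0 <= p < num_classes:
                    union[p] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 maps with three classes (illustrative values only).
pred = [[0, 0, 1],
        [1, 1, 2]]
label = [[0, 1, 1],
         [1, 1, 2]]
print(mean_iou(pred, label, num_classes=3))  # per-class IoUs 0.5, 0.75, 1.0 -> 0.75
```

In this framing, `pred` would be the output of a universal segmentation model applied to a fused image, and `label` the segmentation annotation shipped with the dataset.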
Abstract
Visible and Infrared Image Fusion (VIF) has garnered significant interest across a wide range of high-level vision tasks, such as object detection and semantic segmentation. However, the evaluation of VIF methods remains challenging due to the absence of ground truth. This paper proposes a Segmentation-oriented Evaluation Approach (SEA) to assess VIF methods by incorporating the semantic segmentation task and leveraging the segmentation labels available in the latest VIF datasets. Specifically, SEA utilizes universal segmentation models, capable of handling diverse images and classes, to predict segmentation outputs from fused images and compare these outputs with segmentation labels. Our evaluation of recent VIF methods using SEA reveals that their performance is comparable or even inferior to using visible images only, despite nearly half of the infrared images demonstrating better performance than visible images. Further analysis indicates that, among conventional VIF evaluation metrics, the two most correlated with SEA are the gradient-based fusion metric $Q_{\text{ABF}}$ and the visual information fidelity metric $Q_{\text{VIFF}}$, which can serve as proxies when segmentation labels are unavailable. We hope that our evaluation will guide the development of novel and practical VIF methods. The code has been released at \url{https://github.com/Yixuan-2002/SEA/}.
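The correlation analysis mentioned above can be illustrated with a small sketch that rank-correlates a conventional metric with SEA scores across fusion methods. The abstract does not state which correlation coefficient is used, so Spearman's rank correlation is an assumption here, as are the helper names `average_ranks`, `pearson`, and `spearman`. The scores in the usage lines are made-up toy values, not results from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-method scores (toy numbers for illustration only).
sea_miou = [0.41, 0.45, 0.39, 0.48]
metric_scores = [0.52, 0.60, 0.50, 0.58]
print(spearman(sea_miou, metric_scores))
```

A coefficient near 1 would indicate that the conventional metric ranks methods in nearly the same order as SEA, which is the sense in which such a metric could act as a proxy when segmentation labels are unavailable.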
