Spectral Image Data Fusion for Multisource Data Augmentation
Roberta Iuliana Luca, Alexandra Baicoianu, Ioana Cristina Plajer
TL;DR
The paper addresses the difficulty of training on spectral datasets with heterogeneous signatures and resolutions. It proposes a fusion pipeline that interpolates multisource spectra onto a common reference wavelength grid. It applies four interpolation methods (linear, quadratic, cubic, and PCHIP) and uses Pavia University as the reference to harmonize wavelengths. It evaluates fidelity with CMSE and NDVI, and it also measures downstream semantic segmentation with FCNN and UNet. The study shows that direct spectral alignment is feasible across six datasets and that fused data can support robust segmentation. The choice of interpolation method involves dataset-dependent trade-offs. Overall, the approach provides a practical preprocessing step for spectral data augmentation. It enables broader cross-source generalization, with potential benefits for real-world remote sensing and hyperspectral analysis.
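The fusion step described above amounts to resampling every pixel's spectrum from its source band centres onto the reference wavelength grid. The following is a minimal NumPy sketch of the linear variant only. It assumes that each dataset is an (H, W, B) array with known band-centre wavelengths in nanometres. The function name `resample_to_reference` and the toy data are hypothetical and are not the authors' code.

```python
import numpy as np


def resample_to_reference(cube, src_wl, ref_wl):
    """Linearly resample a (H, W, B_src) spectral cube onto ref_wl.

    src_wl: centre wavelength (nm) of each source band.
    ref_wl: centre wavelengths (nm) of the reference grid, e.g. the band
            centres of the reference dataset (hypothetical input here).
    Returns an array of shape (H, W, len(ref_wl)).
    """
    src_wl = np.asarray(src_wl, dtype=float)
    ref_wl = np.asarray(ref_wl, dtype=float)
    cube = np.asarray(cube, dtype=float)
    if cube.shape[-1] != src_wl.size:
        raise ValueError("last axis of cube must match len(src_wl)")

    # np.interp needs increasing sample points, so sort bands by wavelength.
    order = np.argsort(src_wl)
    src_wl, cube = src_wl[order], cube[..., order]

    # Interpolate each pixel's spectrum independently. Outside the source
    # range np.interp repeats the first/last value instead of extrapolating.
    return np.apply_along_axis(lambda s: np.interp(ref_wl, src_wl, s), -1, cube)


# Usage: a toy 2x3 image with 4 bands whose values are linear in wavelength,
# so linear interpolation reproduces them exactly.
src = np.array([400.0, 500.0, 600.0, 700.0])
toy = np.broadcast_to(src / 100.0, (2, 3, 4))
out = resample_to_reference(toy, src, [450.0, 550.0, 650.0])
print(out.shape)   # (2, 3, 3)
print(out[0, 0])   # [4.5 5.5 6.5]
```

This sketch covers only the linear case. For the quadratic, cubic, and PCHIP variants, `scipy.interpolate.interp1d(src_wl, cube, kind="quadratic")` (or `kind="cubic"`) and `scipy.interpolate.PchipInterpolator(src_wl, cube, axis=-1)` can be evaluated at `ref_wl` in place of `np.interp`. Unlike `np.interp`, these interpolators either raise an error or extrapolate outside the source range, depending on their options. For that reason, the reference grid should lie within each source range or be clipped to it first.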
Abstract
Multispectral and hyperspectral images are increasingly popular in research fields such as remote sensing, astronomical imaging, and precision agriculture. However, the amount of freely available data for machine learning tasks is relatively small. Moreover, artificial intelligence models developed for spectral imaging require input images with a fixed spectral signature, so the data must have the same number of spectral bands or the same spectral resolution. This requirement significantly reduces the number of data sources that can be used with a given model. The scope of this study is to introduce a methodology for spectral image data fusion that allows machine learning models to be trained and/or used on data from a larger number of sources, thus providing better generalization. For this purpose, we propose several interpolation techniques that make multisource spectral data compatible with each other. We evaluate the interpolation outcomes in several ways. The direct assessments use surface plots and metrics such as a Custom Mean Squared Error (CMSE) and the Normalized Difference Vegetation Index (NDVI). The indirect evaluation estimates the impact of the interpolation on machine learning model training, particularly for semantic segmentation.
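NDVI is one of the direct fidelity checks listed above. It is computed per pixel as (NIR - Red) / (NIR + Red). The sketch below picks the band nearest to an assumed red centre (670 nm) and an assumed NIR centre (800 nm). It then compares the NDVI maps of two co-registered cubes, for example an original cube and its interpolated version. Several details here are illustrative assumptions and not the paper's exact procedure: the two wavelengths, the helper names `nearest_band`, `ndvi`, and `ndvi_abs_diff`, and the use of a mean absolute difference as the comparison.

```python
import numpy as np


def nearest_band(wavelengths, target_nm):
    """Index of the band whose centre wavelength is closest to target_nm."""
    wl = np.asarray(wavelengths, dtype=float)
    return int(np.argmin(np.abs(wl - target_nm)))


def ndvi(cube, wavelengths, red_nm=670.0, nir_nm=800.0, eps=1e-12):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    red_nm and nir_nm are assumed band centres (hypothetical defaults);
    eps keeps the denominator non-zero for all-dark pixels.
    """
    cube = np.asarray(cube, dtype=float)
    red = cube[..., nearest_band(wavelengths, red_nm)]
    nir = cube[..., nearest_band(wavelengths, nir_nm)]
    return (nir - red) / (nir + red + eps)


def ndvi_abs_diff(cube_a, wl_a, cube_b, wl_b):
    """Mean absolute NDVI difference between two co-registered cubes
    (hypothetical helper; the cubes may use different band grids)."""
    return float(np.mean(np.abs(ndvi(cube_a, wl_a) - ndvi(cube_b, wl_b))))


# Usage: a 2x2 image with 3 bands; every pixel has the same spectrum.
wl = [550.0, 670.0, 800.0]
cube = np.tile([0.08, 0.10, 0.50], (2, 2, 1))
print(np.round(ndvi(cube, wl), 3))          # every pixel is about 0.667
print(ndvi_abs_diff(cube, wl, cube, wl))    # 0.0
```

Selecting the nearest band, instead of averaging over a fixed red or NIR window, keeps the sketch short. It also lets the same function run on the original and on the resampled band grids.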
