Kriging and Gaussian Process Interpolation for Georeferenced Data Augmentation

Frédérick Fabre Ferber, Dominique Gay, Jean-Christophe Soulié, Jean Diatta, Odalric-Ambrym Maillard

TL;DR

The paper tackles data augmentation for geo-referenced, data-scarce datasets by evaluating two families of interpolation methods, Gaussian processes with multiple kernels and kriging with several variograms, to augment the observations used to predict the cover of the weed Commelina benghalensis L. in sugarcane plots on Reunion Island. It systematically compares predictive performance across multiple regression algorithms, analyzes how performance changes as more interpolated points are added, and assesses the spatial consistency of the augmented data via density maps. The results show that multikernel GP augmentation (notably GP-COMB) generally delivers the strongest predictive gains and reaches them with fewer added points, while kriging provides more homogeneous spatial coverage. These findings support applying GP-based geo-referenced augmentation to similar spatially structured, limited-data problems and point to future work on multi-label extensions and broader geographic datasets.
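To make the kriging side concrete, here is a minimal NumPy sketch of ordinary kriging with exponential and spherical variograms, used to label randomly sampled locations and append them to the original observations. It is an illustration under stated assumptions, not the authors' implementation. The variogram parameters (sill, range, no nugget) are fixed by hand rather than fitted to an empirical variogram. Only the two spatial coordinates are used as inputs, and the data are synthetic. This is also single-variable ordinary kriging, not the co-kriging variants named in Figure 3. The function names are hypothetical.

```python
from functools import partial

import numpy as np


def exponential_variogram(h, sill=1.0, range_=1.0):
    """Exponential model (no nugget); `range_` is the practical range."""
    return sill * (1.0 - np.exp(-3.0 * h / range_))


def spherical_variogram(h, sill=1.0, range_=1.0):
    """Spherical model (no nugget): 1.5r - 0.5r^3 up to `range_`, flat at `sill` beyond."""
    r = np.minimum(h / range_, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)


def ordinary_kriging(coords, values, targets, variogram):
    """Ordinary-kriging predictions of `values` at each row of `targets`.

    For every target, solves [[Gamma, 1], [1^T, 0]] [w; mu] = [gamma_0; 1],
    so the kriging weights w sum to one (unbiasedness constraint).
    """
    n, m = len(coords), len(targets)
    # Semivariances between every pair of observed sites, bordered by the constraint row/column.
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d_obs)
    A[n, n] = 0.0
    # Semivariances between observed sites and targets: one right-hand side per target.
    d_tgt = np.linalg.norm(coords[:, None, :] - targets[None, :, :], axis=-1)
    B = np.vstack([variogram(d_tgt), np.ones((1, m))])
    W = np.linalg.solve(A, B)  # shape (n + 1, m); the last row holds the Lagrange multipliers
    return W[:n].T @ values


# Usage on synthetic data: label 100 random locations and append them to the 30 originals.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
values = np.sin(coords[:, 0] / 3.0) + 0.1 * coords[:, 1]
new_coords = rng.uniform(coords.min(axis=0), coords.max(axis=0), size=(100, 2))
vario = partial(spherical_variogram, sill=1.0, range_=5.0)
new_values = ordinary_kriging(coords, values, new_coords, vario)
aug_coords = np.vstack([coords, new_coords])
aug_values = np.concatenate([values, new_values])
```

With no nugget, ordinary kriging is an exact interpolator: predicting at an observed site returns its observed value. Swapping `spherical_variogram` for `exponential_variogram` is how one would compare variogram choices in this sketch.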

Abstract

Data augmentation is a crucial step in the development of robust supervised learning models, especially when dealing with limited datasets. This study explores interpolation techniques for the augmentation of geo-referenced data, with the aim of predicting the presence of Commelina benghalensis L. in sugarcane plots in La Réunion. Given the spatial nature of the data and the high cost of data collection, we evaluated two interpolation approaches: Gaussian processes (GPs) with different kernels and kriging with various variograms. The objectives of this work are threefold: (i) to identify which interpolation methods offer the best predictive performance for various regression algorithms, (ii) to analyze the evolution of performance as a function of the number of observations added, and (iii) to assess the spatial consistency of augmented datasets. The results show that GP-based methods, in particular with combined kernels (GP-COMB), significantly improve the performance of regression algorithms while requiring fewer additional observations. Although kriging shows slightly lower performance, it is distinguished by a more homogeneous spatial coverage, a potential advantage in certain contexts.
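The GP-based augmentation described in the abstract can be sketched as: fit a GP to the observed sites, sample new locations, and label them with the posterior mean. This is a minimal NumPy sketch under loudly stated assumptions. The "combined kernel" here is a hypothetical stand-in for GP-COMB, the sum of an RBF and a linear kernel. Figure 3 lists Comb, Linear and RBF, but the exact combination is not given in this summary. The hyperparameters are fixed by hand instead of being optimized by marginal likelihood. Coordinates are assumed rescaled to the unit square, new locations are drawn uniformly from the bounding box of the data, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel."""
    sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / length_scale**2)


def linear_kernel(X1, X2, variance=1.0):
    """Linear (dot-product) kernel."""
    return variance * (X1 @ X2.T)


def combined_kernel(X1, X2):
    """Hypothetical GP-COMB stand-in: sum of an RBF and a linear kernel, fixed hyperparameters."""
    return rbf_kernel(X1, X2, length_scale=0.3) + linear_kernel(X1, X2, variance=0.1)


def gp_posterior_mean(X, y, X_new, kernel, noise=1e-6):
    """Posterior mean of a zero-mean GP fitted to the centred targets."""
    y_mean = y.mean()
    K = kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))  # alpha = K^{-1} (y - mean)
    return kernel(X_new, X) @ alpha + y_mean


def augment(X, y, n_new, kernel, seed=0):
    """Sample n_new locations uniformly in the bounding box of X and label them with the GP mean."""
    rng = np.random.default_rng(seed)
    X_new = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_new, X.shape[1]))
    y_new = gp_posterior_mean(X, y, X_new, kernel)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Usage on synthetic data: 25 sites on a grid in the unit square, augmented with 50 GP-labelled points.
gx, gy = np.meshgrid(np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 5))
X = np.column_stack([gx.ravel(), gy.ravel()])
y = np.sin(3.0 * X[:, 0]) + X[:, 1]
X_aug, y_aug = augment(X, y, n_new=50, kernel=combined_kernel)
```

Labelling with the posterior mean yields smooth pseudo-observations. Sampling from the posterior instead would add noise that reflects GP uncertainty, an alternative this summary does not say the paper used.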

Paper Structure (14 sections, 9 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: MSE performance of the MLP algorithm as a function of the number of points added by different interpolation techniques. Zero added points corresponds to the original dataset.
  • Figure 2: Areas of Reunion Island important for the occurrence of the species Commelina benghalensis L.
  • Figure 3: Density map for the species Commelina benghalensis L. (COMBE) for the base dataset (Base) and the datasets augmented by 5 interpolation methods: Comb (Combination of kernels), Linear (Linear kernel), RBF (RBF kernel), Exponential Co-Kriging (Exponential) and Spherical Co-Kriging (Spherical).
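For the spatial-consistency comparison in Figure 3, one plausible way to build such density maps is a Gaussian kernel density estimate over observation locations, evaluated on a regular grid for the base and the augmented datasets. This summary does not describe how the paper constructs its maps. The Gaussian kernel, the bandwidth, the optional per-site weights (for example cover values) and the synthetic points below are all assumptions, and `density_map` is a hypothetical name.

```python
import numpy as np


def density_map(coords, grid_x, grid_y, bandwidth=1.0, weights=None):
    """Gaussian kernel density estimate of `coords` evaluated on a regular grid.

    Returns an array of shape (len(grid_y), len(grid_x)). With `weights`,
    each site contributes in proportion to its weight.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    sq = np.sum((grid[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    # Isotropic 2-D Gaussian kernel, normalized to integrate to one.
    kern = np.exp(-0.5 * sq / bandwidth**2) / (2.0 * np.pi * bandwidth**2)
    w = np.ones(len(coords)) if weights is None else np.asarray(weights, dtype=float)
    return (kern @ (w / w.sum())).reshape(gy.shape)


# Usage on synthetic points: compare a 20-site base set with an augmented 100-site set.
rng = np.random.default_rng(1)
base = rng.uniform(0.0, 1.0, size=(20, 2))
augmented = np.vstack([base, rng.uniform(0.0, 1.0, size=(80, 2))])
xs = ys = np.linspace(-0.5, 1.5, 101)
d_base = density_map(base, xs, ys, bandwidth=0.1)
d_aug = density_map(augmented, xs, ys, bandwidth=0.1)
```

Plotting `d_base` and `d_aug` side by side gives the kind of visual comparison Figure 3 makes. A flatter augmented map would indicate more homogeneous spatial coverage, the property the paper attributes to kriging.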