In-Field Mapping of Grape Yield and Quality with Illumination-Invariant Deep Learning

Ciem Cornelissen, Sander De Coninck, Axel Willekens, Sam Leroux, Pieter Simoens

TL;DR

This work tackles illumination-induced domain shift in in-field hyperspectral sensing for grape yield and quality assessment. It introduces LISA, a Light-Invariant Spectral Autoencoder based on domain-adversarial learning, which extracts illumination-robust features directly from uncalibrated HSI data and improves cross-domain generalization for Brix and Acidity prediction. The authors validate LISA on a multi-domain dataset (Lab, Field-AM, Field-PM) and integrate it with a YOLOv11-Large-based yield pipeline and georeferenced mapping to produce high-resolution yield and quality maps in the field. The end-to-end system demonstrates real-time, in-field operation with robust predictions, highlighting its potential to enable data-driven precision viticulture while reducing reliance on impractical field calibrations.

Abstract

This paper presents an end-to-end, IoT-enabled robotic system for the non-destructive, real-time, and spatially resolved mapping of grape yield and quality (Brix, Acidity) in vineyards. The system features an analytical pipeline that integrates two key modules: a high-performance model for grape bunch detection and weight estimation, and a novel deep learning framework for quality assessment from hyperspectral imaging (HSI) data. A critical barrier to in-field HSI is the "domain shift" caused by variable illumination. To overcome this, our quality assessment is powered by the Light-Invariant Spectral Autoencoder (LISA), a domain-adversarial framework that learns illumination-invariant features from uncalibrated data. We validated the system's robustness on a purpose-built HSI dataset spanning three distinct illumination domains: controlled artificial lighting (lab), and variable natural sunlight captured in the morning and in the afternoon. Results show the complete pipeline achieves a recall of 0.82 for bunch detection and an $R^2$ of 0.76 for weight prediction, while the LISA module improves quality prediction generalization by over 20% compared to the baselines. By combining these robust modules, the system generates high-resolution, georeferenced data of both grape yield and quality, providing actionable, data-driven insights for precision viticulture.
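
The domain-adversarial idea named above is commonly written as a saddle-point objective, as in standard domain-adversarial training. The following is a generic sketch and not necessarily LISA's exact loss: the parameter symbols $\theta_E$, $\theta_P$, $\theta_D$ (encoder, task predictor, domain discriminator) and the trade-off weight $\lambda$ are notation assumed here, and any further terms the full model uses are omitted.

$$
E(\theta_E, \theta_P, \theta_D) = L_{task}(\theta_E, \theta_P) - \lambda \, L_{domain}(\theta_E, \theta_D)
$$

$$
(\hat{\theta}_E, \hat{\theta}_P) = \arg\min_{\theta_E, \theta_P} E(\theta_E, \theta_P, \hat{\theta}_D), \qquad \hat{\theta}_D = \arg\max_{\theta_D} E(\hat{\theta}_E, \hat{\theta}_P, \theta_D)
$$

In this form the discriminator minimizes its classification loss $L_{domain}$ while the encoder is pushed to increase it, so latent features that reveal the illumination condition are penalized and features useful for the regression task are kept.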

Paper Structure

This paper contains 35 sections, 2 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: The integrated IoT-enabled robotic platform during in-field data acquisition. Key components shown are a) the mobile robotic base, b) a collaborative robotic arm positioning the c) Specim FX10 hyperspectral camera, and d) the GPS antenna for georeferencing.
  • Figure 2: Visualization of the pseudo-RGB hyperspectral images and the spectra of different data subsets.
  • Figure 3: Overview of the integrated data processing workflow. The robotic platform captures a continuous, georeferenced hyperspectral scan. The system processes this data stream using a sliding window approach. For each window, a pseudo-RGB image is synthesized and passed to a YOLO-based object detector. A bunch tracker identifies unique bunches across consecutive overlapping windows. Once a bunch is fully captured within the frame, its full-spectrum HSI data is extracted and analyzed by two parallel deep learning modules for yield (weight) and quality (Brix/Acidity) prediction. The quality pipeline includes a dedicated preprocessing step before the data is fed into our LISA framework. All per-bunch predictions are georeferenced and aggregated to create spatial data for precision viticulture.
  • Figure 4: Architecture of the proposed Light-Invariant Spectral Autoencoder (LISA). Data from the Lab ($D_{Lab}$) and Field ($D_{Field}$) domains are fed as hyperspectral patches into a central Encoder. The Encoder learns a compressed latent representation ($z$), which serves as a shared feature space for four distinct objectives, each optimized by a corresponding loss function: (1) A Decoder uses $z$ to reconstruct the original input, guided by the reconstruction loss ($L_{recon}$), ensuring features are representative. (2) A Task Predictor regresses grape quality parameters from $z$, guided by the task loss ($L_{task}$). (3) A Domain Discriminator attempts to classify the origin of the data (Lab vs. Field) from $z$. The Encoder is simultaneously trained to "fool" this discriminator, an adversarial process governed by the domain loss ($L_{domain}$). This forces $z$ to become domain-invariant. (4) A manifold regularization loss ($L_{manifold}$) is applied to $z$ to promote a semantically meaningful latent space where similar samples are grouped together.
  • Figure 5: t-SNE visualization demonstrating the effectiveness of LISA in learning domain-invariant and task-relevant features. (Top Row: Domain Invariance) Points are colored by their acquisition domain (Lab, Field-AM, Field-PM). While the raw data (a) shows more distinct clustering based on lighting conditions, the learned latent space (b) shows that the three domains are more intermixed, confirming the model has learned features that are invariant to the domain shift. (Bottom Row: Task-Relevant Structure) Points are colored by their ground truth Brix value. The model transforms the unstructured distribution in the raw data (c) into a more continuous manifold in the latent space (d), where samples with similar Brix values are located near each other. This structured representation is ideal for robust regression.
  • ...and 1 more figure
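
The sliding-window step described in the Figure 3 caption can be illustrated with a minimal Python sketch. It only shows how overlapping windows over a continuous scan might be generated and how one could check that a bunch lies entirely inside a window before its spectra are extracted. All names (`Detection`, `sliding_windows`, `fully_captured`, `first_full_window`), the window and stride values, and the use of scan-line indices are assumptions made for illustration; the YOLO detector and the bunch tracker are not modelled, and the authors' actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: a bunch's extent along the scan axis."""
    first_line: int  # first scan line covered by the bunch (inclusive)
    last_line: int   # last scan line covered by the bunch (inclusive)


def sliding_windows(n_lines: int, window: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) scan-line ranges, stop exclusive.

    A stride smaller than the window makes consecutive windows overlap.
    The last window is truncated at the end of the scan.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    start = 0
    while start < n_lines:
        stop = min(start + window, n_lines)
        yield start, stop
        if stop == n_lines:
            break
        start += stride


def fully_captured(det: Detection, start: int, stop: int) -> bool:
    """True if the bunch lies entirely inside the window [start, stop)."""
    return det.first_line >= start and det.last_line < stop


def first_full_window(det: Detection, n_lines: int, window: int,
                      stride: int) -> Optional[Tuple[int, int]]:
    """Return the first window that contains the whole bunch, or None."""
    for start, stop in sliding_windows(n_lines, window, stride):
        if fully_captured(det, start, stop):
            return start, stop
    return None


if __name__ == "__main__":
    bunch = Detection(first_line=3, last_line=5)
    # The bunch is cut off by window (0, 4) and fully visible in (2, 6).
    print(first_full_window(bunch, n_lines=10, window=4, stride=2))
```

In a sketch like this, the overlap (window minus stride) would need to be at least as large as the longest expected bunch so that every bunch is fully contained in at least one window; otherwise `first_full_window` returns `None` for bunches that straddle every window boundary.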