Enhancing Strawberry Yield Forecasting with Backcasted IoT Sensor Data and Machine Learning

Tewodros Alemu Ayall, Andy Li, Matthew Beddows, Milan Markovic, Georgios Leontidis

TL;DR

This work tackles data-scarce strawberry yield forecasting in polytunnel agriculture by deploying IoT sensors and using a backcasting framework to generate synthetic sensor data from nearby Met Office weather information. An ML pipeline with random forest (RF), gradient-boosted decision trees (GBDT), and XGBoost is trained on a mix of real and synthetic data, demonstrating improved forecast accuracy over models trained on real data alone. The key contributions include a real-world IoT deployment, a backcasting method to mitigate data gaps, and empirical evidence that synthetic data boosts predictive performance, especially when environmental features are included. Practically, the approach offers a data-efficient path for farms to deploy AI-driven yield forecasting with limited multi-season sensor deployments, with potential for transfer to other crops and locations.

Abstract

Due to rapid population growth globally, digitally-enabled agricultural sectors are crucial for sustainable food production and making informed decisions about resource management for farmers and various stakeholders. The deployment of Internet of Things (IoT) technologies that collect real-time observations of various environmental (e.g., temperature, humidity, etc.) and operational factors (e.g., irrigation) influencing production is often seen as a critical step to enable additional novel downstream tasks, such as AI-based yield forecasting. However, since AI models require large amounts of data, this creates practical challenges in a real-world dynamic farm setting where IoT observations would need to be collected over a number of seasons. In this study, we deployed IoT sensors in strawberry production polytunnels for two growing seasons to collect environmental data, including water usage, external and internal temperature, external and internal humidity, soil moisture, soil temperature, and photosynthetically active radiation. The sensor observations were combined with manually provided yield records spanning a period of four seasons. To bridge the gap of missing IoT observations for two additional seasons, we propose an AI-based backcasting approach to generate synthetic sensor observations using historical weather data from a nearby weather station and the existing polytunnel observations. We built an AI-based yield forecasting model to evaluate our approach using the combination of real and synthetic observations. Our results demonstrated that incorporating synthetic data improved yield forecasting accuracy, with models incorporating synthetic data outperforming those trained only on historical yield, weather records, and real sensor data.
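
The backcasting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a one-variable least-squares line stands in for the paper's models, and all names (`fit_line`, `backcast`) and all numbers are hypothetical. The sketch fits a mapping from weather-station temperature to in-tunnel temperature on the period where both were recorded, then applies that mapping to earlier station records to produce synthetic in-tunnel values.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def backcast(station_history, slope, intercept):
    """Apply the fitted mapping to earlier station records."""
    return [slope * t + intercept for t in station_history]


# Hypothetical overlap period: station and polytunnel sensor both available.
# Values are made up for illustration, not taken from the paper.
station_overlap = [10.0, 12.0, 15.0, 18.0, 20.0]
tunnel_overlap = [14.0, 16.4, 20.0, 23.6, 26.0]

# Hypothetical earlier seasons: station records only, no polytunnel sensor.
station_earlier = [8.0, 11.0, 16.0]

slope, intercept = fit_line(station_overlap, tunnel_overlap)
synthetic_tunnel = backcast(station_earlier, slope, intercept)
print(synthetic_tunnel)
```

In this toy setup the synthetic series would then be appended to the real sensor series before training a yield model. A real pipeline would likely use several weather inputs and a nonlinear regressor, and would validate the mapping on held-out overlap data before trusting the backcast.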

Paper Structure

This paper contains 21 sections, 4 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Example of our sensor deployment.
  • Figure 2: Strawberry yields for Multispan and Seaton polytunnels.
  • Figure 3: A schematic of our experimental setup and computational pipeline.
  • Figure 4: Pearson correlation analysis for Multispan and Seaton polytunnels.
  • Figure 5: Water usage synthetic data generation for Multispan and Seaton polytunnels.
  • ...and 6 more figures