A Comparative Study of Deep Reinforcement Learning for Crop Production Management

Joseph Balderas, Dong Chen, Yanbo Huang, Li Wang, Ren-Cang Li

TL;DR

A comparison of PPO and DQN against static baseline policies on the three RL tasks provided by the gym-DSSAT environment (fertilization, irrigation, and mixed management) indicates that PPO outperforms DQN on the fertilization and irrigation tasks, while DQN performs better on the mixed management task.

Abstract

Crop production management is essential for optimizing yield and minimizing the environmental impact of crop fields, yet it remains challenging due to the complex and stochastic processes involved. Recently, researchers have turned to machine learning to address these complexities. Specifically, reinforcement learning (RL), a cutting-edge approach designed to learn optimal decision-making strategies through trial and error in dynamic environments, has emerged as a promising tool for developing adaptive crop management policies. RL models aim to optimize long-term rewards by continuously interacting with the environment, making them well-suited for tackling the uncertainties and variability inherent in crop management. Studies have shown that RL can generate crop management policies that compete with, and even outperform, expert-designed policies within simulation-based crop models. In the gym-DSSAT crop model environment, one of the most widely used simulators for crop management, proximal policy optimization (PPO) and deep Q-networks (DQN) have shown promising results. However, these methods have not yet been systematically evaluated under identical conditions. In this study, we evaluated PPO and DQN against static baseline policies across the three RL tasks provided by the gym-DSSAT environment: fertilization, irrigation, and mixed management. To ensure a fair comparison, we used consistent default parameters, identical reward functions, and the same environment settings. Our results indicate that PPO outperforms DQN in the fertilization and irrigation tasks, while DQN excels in the mixed management task. This comparative analysis provides critical insights into the strengths and limitations of each approach, advancing the development of more effective RL-based crop management strategies.

Paper Structure

This paper contains 17 sections, 14 equations, 7 figures, 1 table, 2 algorithms.

Figures (7)

  • Figure 1: In the RL process, an agent takes an action in an environment, and the environment in turn produces a new state and a reward that informs the agent of its current performance. The goal of the agent is to use this feedback to maximize its cumulative reward. The loop repeats for a specified number of iterations or until a terminal state is reached [gautron2022reinforcement].
  • Figure 2: PPO and DQN training curves for the fertilization problem. The horizontal axis measures training iterations and the vertical axis measures cumulative rewards.
  • Figure 3: PPO and DQN training curves for the irrigation problem. The horizontal axis measures training iterations and the vertical axis measures cumulative rewards.
  • Figure 4: PPO and DQN training curves for the mixed problem. The horizontal axis measures training iterations and the vertical axis measures cumulative rewards.
  • Figure 5: Evaluation results shown as box plots. The vertical axis measures cumulative rewards for the 1000 test episodes.
  • ...and 2 more figures
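
The agent-environment loop described in the Figure 1 caption, and the cumulative reward plotted in the later figures, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyCropEnv`, its dynamics, its reward, and both policies are hypothetical stand-ins invented for this sketch, and none of it reflects the gym-DSSAT API or the paper's experimental setup.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyCropEnv:
    """Hypothetical toy environment (NOT gym-DSSAT): one fertilizer decision per day."""
    horizon: int = 10   # number of decision steps per episode (assumed)
    seed: int = 0       # seed for the toy stochastic weather factor

    def reset(self):
        # Re-seed on reset so every episode with the same seed is reproducible.
        self.rng = random.Random(self.seed)
        self.day = 0
        self.soil_n = 0.0
        return (self.day, self.soil_n)

    def step(self, action):
        # action: fertilizer units applied today (0 or 1 in this sketch).
        self.soil_n += action
        weather = self.rng.uniform(0.8, 1.0)        # stochastic growth factor
        uptake = min(self.soil_n, 0.5) * weather    # plant takes up a capped amount
        self.soil_n -= uptake
        reward = uptake - 0.2 * action              # growth minus fertilizer cost
        self.day += 1
        done = self.day >= self.horizon             # terminal state reached
        return (self.day, self.soil_n), reward, done


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return (cumulative reward, number of steps)."""
    state = env.reset()
    total, steps, done = 0.0, 0, False
    while not done and steps < max_steps:
        action = policy(state)                   # agent acts on the current state
        state, reward, done = env.step(action)   # environment returns feedback
        total += reward
        steps += 1
    return total, steps


def null_policy(state):
    # Static baseline: never fertilize.
    return 0


def fixed_schedule_policy(state):
    # Static baseline: fertilize every third day, ignoring the soil state.
    day, _soil_n = state
    return 1 if day % 3 == 0 else 0


if __name__ == "__main__":
    for name, policy in [("null", null_policy), ("fixed", fixed_schedule_policy)]:
        total, steps = run_episode(ToyCropEnv(), policy)
        print(f"{name}: cumulative reward {total:.3f} over {steps} steps")
```

A learned policy (for example one trained with PPO or DQN) would take the place of the static policies as the `policy` argument. Repeating `run_episode` over many test episodes yields the kind of cumulative-reward distribution that the evaluation box plots summarize.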