Optimizing Navigation And Chemical Application in Precision Agriculture With Deep Reinforcement Learning And Conditional Action Tree

Mahsa Khosravi, Zhanhong Jiang, Joshua R Waite, Sarah Jones, Hernan Torres, Arti Singh, Baskar Ganapathysubramanian, Asheesh Kumar Singh, Soumik Sarkar

TL;DR

This work addresses the challenge of jointly optimizing navigation and site-specific chemical spraying in precision agriculture under partial observability and energy constraints. It introduces HAM-PPO, a hierarchical action masking PPO framework that uses a Conditional Action Tree and Invalid Action Masking to efficiently decide when and where to scout or spray while respecting field and battery limits. The method relies on domain-specific rewards tied to yield recovery and chemical usage. Evaluated in AgGym with static infection maps, it achieves higher yield recovery and lower chemical costs than lawnmower-based baselines and random policies, and it remains robust to observation noise. The significance lies in delivering a scalable, resource-efficient planning approach for precision agriculture robotics that could inform real-world deployments and be extended to dynamic infection scenarios and multi-agent coordination.

Abstract

This paper presents a novel reinforcement learning (RL)-based planning scheme for optimized robotic management of biotic stresses in precision agriculture. The framework employs a hierarchical decision-making structure with conditional action masking, where high-level actions direct the robot's exploration, while low-level actions optimize its navigation and efficient chemical spraying in affected areas. The key objectives of optimization include improving the coverage of infected areas with limited battery power and reducing chemical usage, thus preventing unnecessary spraying of healthy areas of the field. Our numerical experimental results demonstrate that the proposed method, Hierarchical Action Masking Proximal Policy Optimization (HAM-PPO), significantly outperforms baseline practices, such as LawnMower navigation + indiscriminate spraying (Carpet Spray), in terms of yield recovery and resource efficiency. HAM-PPO consistently achieves higher yield recovery percentages and lower chemical costs across a range of infection scenarios. The framework also exhibits robustness to observation noise and generalizability under diverse environmental conditions, adapting to varying infection ranges and spatial distribution patterns.

Paper Structure

This paper contains 31 sections, 22 equations, 9 figures, 4 tables, 2 algorithms.

Figures (9)

  • Figure 1: Degeneration from POMDP to MDP through state estimation. For the purpose of illustration, we skip the reward from the environment to the agent.
  • Figure 2: The action tree in our work consists of six possible actions and two components, $B_0=2$ and $B_1=4$ or $2$. The action in $B_0$ is either scout or deep scout, and this choice determines which actions are available in $B_1$. Scout and deep scout cannot be performed at the same time.
  • Figure 3: A schematic diagram of CAT with the joint distribution of masks $h$ and components $b$. A component $b_0$ is sampled from the options allowed by the mask $h_0$. The mask $h_1$ then depends on $b_0$ and determines the next possible component $b_1$, given the state $s$. Once all components have been sampled through the CAT in this way, the action $a$ is determined.
  • Figure 4: Schematic diagram of the proposed RL framework for precision pest management. The framework follows an iterative RL loop, where the agent observes the field state and selects a high-level decision and the corresponding low-level decision. Observations, including agent position, visitation history, spraying history, and plot health status (collected via drones), are updated after each action using ground robot measurements. Rewards guide the agent’s behavior, enabling it to optimize pest management strategies effectively, even in the presence of noisy or incomplete data.
  • Figure 5: Each subplot evaluates a distinct performance metric: (a) Yield Recovery (%) measures the percentage of yield preserved from potential losses, (b) Yield Price (\$) per Bushel per Acre reflects the economic value of recovered yield per unit area, and (c) Pesticide Cost (\$) quantifies the expenses associated with pesticide application.
  • ...and 4 more figures
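
The captions of Figures 2 and 3 describe sampling a high-level component $b_0$ under a mask $h_0$, then deriving a mask $h_1$ from $b_0$ before sampling the low-level component $b_1$. The following is a minimal sketch of that two-step masked sampling, not the authors' implementation. The low-level action names, the assignment of four options to scout and two to deep scout, and the helper names `masked_softmax` and `sample_action` are all assumptions made for illustration; the source only states $B_0=2$ and $B_1=4$ or $2$.

```python
import math
import random

# Hypothetical labels: the source gives only the component sizes.
HIGH_LEVEL = ["scout", "deep_scout"]
LOW_LEVEL = ["up", "down", "left", "right", "spray", "no_spray"]

# Assumed mapping from the sampled b_0 to the mask h_1 over low-level options.
LOW_LEVEL_MASK = {
    "scout": [True, True, True, True, False, False],
    "deep_scout": [False, False, False, False, True, True],
}


def masked_softmax(logits, mask):
    """Softmax over valid entries only; masked entries get probability 0."""
    if not any(mask):
        raise ValueError("mask must allow at least one option")
    peak = max(l for l, m in zip(logits, mask) if m)  # for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(high_logits, low_logits, high_mask, rng):
    """Sample (b_0, b_1) through the tree and return the joint log-probability."""
    # Step 1: sample b_0 from the options allowed by h_0.
    p_high = masked_softmax(high_logits, high_mask)
    b0 = rng.choices(HIGH_LEVEL, weights=p_high, k=1)[0]
    # Step 2: h_1 depends on b_0 and restricts the choice of b_1.
    h1 = LOW_LEVEL_MASK[b0]
    p_low = masked_softmax(low_logits, h1)
    b1 = rng.choices(LOW_LEVEL, weights=p_low, k=1)[0]
    # The joint log-probability is the sum over the sampled components.
    log_prob = math.log(p_high[HIGH_LEVEL.index(b0)]) + math.log(
        p_low[LOW_LEVEL.index(b1)]
    )
    return (b0, b1), log_prob


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder logits; in practice a policy network would produce them.
    action, log_prob = sample_action([0.2, -0.1], [0.0] * 6, [True, True], rng)
    print(action, log_prob)
```

In a policy-gradient method such as PPO, the summed log-probability would be the quantity fed into the surrogate objective. Because masked options receive probability exactly zero, they are never sampled.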