Optimizing Navigation And Chemical Application in Precision Agriculture With Deep Reinforcement Learning And Conditional Action Tree
Mahsa Khosravi, Zhanhong Jiang, Joshua R Waite, Sarah Jones, Hernan Torres, Arti Singh, Baskar Ganapathysubramanian, Asheesh Kumar Singh, Soumik Sarkar
TL;DR
This work addresses the challenge of jointly optimizing navigation and site-specific chemical spraying in precision agriculture under partial observability and energy constraints. It introduces HAM-PPO, a hierarchical action masking PPO framework that uses a Conditional Action Tree and Invalid Action Masking to efficiently decide when and where to scout or spray while respecting field and battery limits. The method relies on domain-specific rewards tied to yield recovery and chemical usage and is evaluated in AgGym with static infection maps, where it shows higher yield recovery and lower chemical costs than lawnmower-based baselines and random policies, along with robustness to observation noise. The approach offers a scalable, resource-efficient planner for precision agriculture robotics that could inform real-world deployments and be extended to dynamic infection scenarios and multi-agent coordination.
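Invalid Action Masking can be illustrated with a minimal sketch. The one below is not the paper's implementation: the flat five-action set, the grid coordinates, and the `move_cost` and `spray_cost` battery values are all hypothetical, chosen only to show how actions that would leave the field or exceed the remaining battery receive zero probability before sampling.

```python
import numpy as np

# Hypothetical flat action set for a grid-field robot (an assumption,
# not the action space used in the paper).
ACTIONS = ["up", "down", "left", "right", "spray"]


def valid_action_mask(row, col, n_rows, n_cols, battery,
                      move_cost=1, spray_cost=2):
    """Return a boolean mask that is True where an action is allowed."""
    can_move = battery >= move_cost
    return np.array([
        can_move and row > 0,            # up: must not leave the top edge
        can_move and row < n_rows - 1,   # down: must not leave the bottom edge
        can_move and col > 0,            # left
        can_move and col < n_cols - 1,   # right
        battery >= spray_cost,           # spray: needs enough battery
    ], dtype=bool)


def masked_softmax(logits, mask):
    """Softmax over logits with invalid actions forced to probability zero."""
    if not mask.any():
        raise ValueError("at least one action must be valid")
    masked = np.where(mask, logits, -np.inf)
    masked = masked - masked[mask].max()  # subtract the max for numerical stability
    exp = np.exp(masked)                  # exp(-inf) evaluates to 0.0
    return exp / exp.sum()


# Robot in the top-left corner of a 3x3 field with one unit of battery left.
mask = valid_action_mask(row=0, col=0, n_rows=3, n_cols=3, battery=1)
probs = masked_softmax(np.zeros(len(ACTIONS)), mask)
```

With uniform logits, only "down" and "right" remain valid in this state, so each receives probability 0.5. In a policy-gradient setting the same mask would be applied to the policy network's logits before the action is sampled.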
Abstract
This paper presents a novel reinforcement learning (RL)-based planning scheme for optimized robotic management of biotic stresses in precision agriculture. The framework employs a hierarchical decision-making structure with conditional action masking, where high-level actions direct the robot's exploration, while low-level actions optimize its navigation and efficient chemical spraying in affected areas. The key objectives of optimization include improving the coverage of infected areas with limited battery power and reducing chemical usage, thus preventing unnecessary spraying of healthy areas of the field. Our numerical experimental results demonstrate that the proposed method, Hierarchical Action Masking Proximal Policy Optimization (HAM-PPO), significantly outperforms baseline practices, such as LawnMower navigation + indiscriminate spraying (Carpet Spray), in terms of yield recovery and resource efficiency. HAM-PPO consistently achieves higher yield recovery percentages and lower chemical costs across a range of infection scenarios. The framework also exhibits robustness to observation noise and generalizability under diverse environmental conditions, adapting to varying infection ranges and spatial distribution patterns.
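For intuition about the baseline named above, the following is a rough sketch of LawnMower navigation combined with Carpet Spray on a toy grid. It is an illustration under stated assumptions rather than the AgGym implementation: the battery is modeled as a hypothetical budget of visited cells (`battery_steps`), and the infection map is a made-up boolean array.

```python
import numpy as np


def lawnmower_path(n_rows, n_cols):
    """Boustrophedon sweep: left-to-right on even rows, right-to-left on odd rows."""
    path = []
    for r in range(n_rows):
        cols = range(n_cols) if r % 2 == 0 else range(n_cols - 1, -1, -1)
        path.extend((r, c) for c in cols)
    return path


def carpet_spray(infected, battery_steps):
    """Follow the sweep and spray every visited cell until the budget runs out.

    Returns (treated, wasted): sprayed infected cells and sprayed healthy cells.
    battery_steps is a hypothetical budget counted in visited cells.
    """
    path = lawnmower_path(*infected.shape)[:battery_steps]
    treated = sum(bool(infected[r, c]) for r, c in path)
    wasted = len(path) - treated
    return treated, wasted


# Toy 4x4 field with the infection confined to two cells in the last row.
infected = np.zeros((4, 4), dtype=bool)
infected[3, 0:2] = True

short_budget = carpet_spray(infected, battery_steps=12)  # sweep stops after row 2
full_budget = carpet_spray(infected, battery_steps=16)   # sweep covers the whole field
```

In this toy case a 12-cell budget is exhausted before the sweep reaches the infected row, so every spray lands on a healthy cell, and even a full sweep sprays 14 healthy cells to treat 2 infected ones. This is the kind of inefficiency that a learned, infection-aware policy aims to avoid.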
