JPDS-NN: Reinforcement Learning-Based Dynamic Task Allocation for Agricultural Vehicle Routing Optimization

Yixuan Fan, Haotian Xu, Mengqiao Liu, Qing Zhuo, Tao Zhang

TL;DR

This work tackles the Entrance Dependent Vehicle Routing Problem (EDVRP) in agriculture, where field geometry and entrance locations critically affect routing. It introduces JPDS-NN, an encoder–decoder network with graph transformers and attention that models routing as a Markov Decision Process and is trained with Proximal Policy Optimization to optimize distance, time, and fuel. Key contributions include a graph-transformer–based input encoder with a pre-training task, a GRU encoder for the generated sequence, a joint node-entrance action sampler, and comprehensive ablations plus dynamic rearrangement experiments. Empirical results show that JPDS-NN reduces travel distance by 48.4–65.4% and fuel usage by 14.0–17.6%, computes two orders of magnitude faster than the baselines, and performs 15–25% better in dynamic scenarios, indicating strong practical value for scalable, intelligent agricultural routing. The ablations also show the importance of cross-attention and pre-training for robust performance in complex, dynamic field environments.

Abstract

The Entrance Dependent Vehicle Routing Problem (EDVRP) is a variant of the Vehicle Routing Problem (VRP) where the scale of cities influences routing outcomes, necessitating consideration of their entrances. This paper addresses EDVRP in agriculture, focusing on multi-parameter vehicle planning for irregularly shaped fields. To address the limitations of traditional methods, such as heuristic approaches, which often overlook field geometry and entrance constraints, we propose a Joint Probability Distribution Sampling Neural Network (JPDS-NN) to effectively solve the EDVRP. The network uses an encoder-decoder architecture with graph transformers and attention mechanisms to model routing as a Markov Decision Process, and is trained via reinforcement learning for efficient and rapid end-to-end planning. Experimental results indicate that JPDS-NN reduces travel distances by 48.4-65.4%, lowers fuel consumption by 14.0-17.6%, and computes two orders of magnitude faster than baseline methods, while demonstrating 15-25% superior performance in dynamic arrangement scenarios. Ablation studies validate the necessity of cross-attention and pre-training. The framework enables scalable, intelligent routing for large-scale farming under dynamic constraints.
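
The idea of sampling a node and its entrance from one joint distribution can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_node_entrance`, the score matrix `logits` (one row per node, one column per entrance), and the `visited` mask are all hypothetical, and a plain softmax stands in for whatever the actor network actually computes.

```python
import math
import random


def sample_node_entrance(logits, visited, rng):
    """Draw one (node, entrance) action from a single joint distribution.

    logits:  list of rows, logits[node][entrance] is a hypothetical actor score
    visited: list of booleans, True for nodes that were already served
    rng:     a random.Random instance, passed in for reproducibility
    """
    # Keep only (node, entrance) pairs whose node is still unvisited.
    candidates = [
        (node, entrance, score)
        for node, row in enumerate(logits)
        if not visited[node]
        for entrance, score in enumerate(row)
    ]
    if not candidates:
        raise ValueError("every node has already been visited")

    # Softmax over the flattened joint action space (max-shifted for stability).
    top = max(score for _, _, score in candidates)
    weights = [math.exp(score - top) for _, _, score in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]

    # A single draw selects the node and its entrance together.
    node, entrance, _ = rng.choices(candidates, weights=probs, k=1)[0]
    return node, entrance, probs


# Example: three nodes with two entrances each; node 0 is already served.
rng = random.Random(0)
logits = [[0.2, 1.1], [0.5, -0.3], [1.4, 0.0]]
visited = [True, False, False]
node, entrance, probs = sample_node_entrance(logits, visited, rng)
```

Sampling the pair jointly, rather than choosing a node first and an entrance second, lets the probability of an entrance depend on which node it belongs to. That matches the premise of EDVRP, where the entrance choice affects the routing outcome.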

Paper Structure

This paper contains 18 sections, 8 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: The EDVRP scenario in farms. Top left: Real-world field with multiple plots. Bottom: Agricultural vehicles operate along working lines. Top right: Dashed lines represent working lines; white/black points represent entrances; solid lines represent the roads. Vehicles must start from the designated starting point, complete tasks without any redundancy, and then return to the ending point. The starting and ending points can either be the same location (such as a depot) or different locations (within working lines or roads).
  • Figure 2: The input encoder handles the task graph and vehicle features. The decoder, comprising a sequence encoder, actor network, and critic network, produces a sequence. In the MDP, the environment includes inputs, input encoder, and sequence encoder, with the actor network as the agent. The input encoder extracts high-dimensional input features, and the sequence encoder processes action features. At each step, the actor network chooses an action $a = P_{A,t}$ based on the state, determining the next node and its entrance.
  • Figure 3: Detailed structure of our networks. In the figure, MLP refers to Multi-Layer Perceptron; Attn refers to Attention; MHA refers to Multi-Head Attention; LN refers to Layer Normalization; Concat refers to vector concatenation; GRU refers to Gated Recurrent Unit; and Read Out concatenates the bitwise maximum and average values of a sequence, using MLP to produce a global feature.
  • Figure 4: Training curves of JPDS-NNs under four random seeds. In the early stages, the optimization of distance, time, and fuel consumption aligns, but as training progresses, these objectives may diverge or conflict. In the later stages, as the algorithm provides better allocation, the optimization directions converge again.
  • Figure 5: The network without pre-training converges as quickly or even faster in the early training stages. However, as training progresses, the pre-trained network shows fewer fluctuations and better performance on the validation set.
  • ...and 2 more figures