Lifelong Learning with Behavior Consolidation for Vehicle Routing

Jiyuan Pei, Yi Mei, Jialin Liu, Mengjie Zhang, Xin Yao

TL;DR

This work tackles lifelong learning for neural VRP solvers across sequential tasks with changing distributions and scales, addressing catastrophic forgetting. It introduces Lifelong Learning Router with Behavior Consolidation (LLR-BC), which buffers past experiences and uses Confidence-aware Experience Weighting (CaEW) and Decision-seeking Behavior Consolidation (DsBC) to align the solver's behavior on new tasks with its buffered past behavior via a consolidation objective. The approach is model-agnostic and is validated on CVRP and TSP across multiple task orders, showing reduced forgetting, preserved plasticity, and improved zero-shot generalization compared with baselines. The results highlight the practical viability of lifelong learning for neural VRP solvers and point to future extensions to broader routing variants and continuously evolving task streams.
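The TL;DR says LLR-BC buffers past experiences but not how the buffer is maintained. Below is a minimal sketch of a bounded experience buffer, assuming each experience stores a state and the solver's action probabilities at that state (matching the $s$ and $\mathcal{P}$ notation in Figure 2) and assuming reservoir sampling as the replacement policy. The class and field names are hypothetical, not taken from the paper.

```python
import random
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Experience:
    """One buffered decision: the state seen and the solver's action probabilities there."""
    task_id: int
    state: Any                  # hypothetical: whatever encodes a partial solution
    action_probs: List[float]   # solver's distribution over actions at this state


@dataclass
class ExperienceBuffer:
    """Fixed-capacity buffer maintained by reservoir sampling (an assumed policy)."""
    capacity: int
    seed: int = 0
    items: List[Experience] = field(default_factory=list)
    seen: int = 0

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def add(self, exp: Experience) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(exp)
        else:
            # Each of the `seen` experiences so far is kept with probability capacity / seen.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = exp

    def sample(self, k: int) -> List[Experience]:
        """Draw up to k distinct buffered experiences uniformly at random."""
        return self._rng.sample(self.items, min(k, len(self.items)))
```

Reservoir sampling keeps every experience seen so far with equal probability, so earlier tasks stay represented as new tasks arrive without the buffer growing; the paper may well use a different selection rule.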

Abstract

Recent neural solvers have demonstrated promising performance in learning to solve routing problems. However, existing studies are primarily based on one-off training on one or a set of predefined problem distributions and scales, i.e., tasks. When a new task arises, they typically rely either on zero-shot generalization, which may be poor due to discrepancies between the new task and the training task(s), or on fine-tuning the pretrained solver on the new task, which may lead to catastrophic forgetting of knowledge acquired from previous tasks. This paper explores a novel lifelong learning paradigm for neural VRP solvers, in which multiple tasks with diverse distributions and scales arise sequentially over time. Solvers are required to learn new tasks effectively and efficiently while maintaining their performance on previously learned tasks. To this end, a novel framework called Lifelong Learning Router with Behavior Consolidation (LLR-BC) is proposed. LLR-BC consolidates prior knowledge by aligning the behaviors of the solver trained on a new task with the buffered ones in a decision-seeking way. To focus on crucial experiences, LLR-BC assigns greater consolidation weights to decisions with lower confidence. Extensive experiments on capacitated vehicle routing problems and traveling salesman problems demonstrate LLR-BC's effectiveness in training high-performance neural solvers in a lifelong learning setting: it addresses catastrophic forgetting, maintains plasticity, and improves zero-shot generalization.
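The abstract describes two ingredients: weighting buffered decisions by (low) confidence and aligning new behavior with buffered behavior "in a decision-seeking way". A minimal sketch of a confidence-weighted consolidation term follows, under loudly stated assumptions: confidence is taken as the variance of the buffered action distribution scaled to [0, 1] (Figure 2 lists "var: variance"), weights come from a softmax over negative confidence with a hypothetical temperature `tau`, and "decision-seeking" is read as mode-seeking, implemented here with reverse KL divergence. None of these formulas are confirmed by the summary, and the sketch computes loss values only; actual training would need an autodiff framework.

```python
import numpy as np


def confidence_weights(old_probs: np.ndarray, tau: float = 0.5) -> np.ndarray:
    """Hypothetical CaEW-style weights for a batch of buffered decisions.

    old_probs has shape (batch, num_actions) with num_actions >= 2; each row is the
    buffered solver's action distribution. Confidence is the row variance divided by
    the variance of a one-hot row, so it lies in [0, 1]. A softmax over
    -confidence / tau gives flatter (less confident) decisions larger weights.
    The returned weights sum to 1.
    """
    n = old_probs.shape[1]
    max_var = (n - 1) / n**2                 # variance of a one-hot row
    confidence = old_probs.var(axis=1) / max_var
    z = -confidence / tau
    z = z - z.max()                          # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def reverse_kl(new_probs: np.ndarray, old_probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Row-wise KL(new || old). Reverse KL is mode-seeking; using it for
    'decision-seeking' alignment is an assumption of this sketch."""
    new = np.clip(new_probs, eps, 1.0)
    old = np.clip(old_probs, eps, 1.0)
    return (new * (np.log(new) - np.log(old))).sum(axis=1)


def consolidation_loss(new_probs: np.ndarray, old_probs: np.ndarray, tau: float = 0.5) -> float:
    """Confidence-weighted behavior-consolidation term over sampled experiences."""
    w = confidence_weights(old_probs, tau)
    return float((w * reverse_kl(new_probs, old_probs)).sum())
```

In a full training step this term would presumably be added to the new-task loss with some trade-off coefficient; that coefficient and the exact combination are not given in the summary.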

Paper Structure

This paper contains 51 sections, 4 equations, 15 figures, 10 tables, 2 algorithms.

Figures (15)

  • Figure 1: Conceptual demonstration of catastrophic forgetting while fine-tuning on sequential, new tasks with different distributions and scales.
  • Figure 2: LLR-BC in the lifelong learning scenario where new tasks with different distributions and scales sequentially arise over time. $T_t$: the task at time $t$. $\pi_\theta$: solver with model parameters $\theta$. $\mathcal{B}$: experience buffer. $\mathcal{J}$: generated problem instances. $\{\tau\}$: problem solving trajectories. $\mathcal{E}$: experiences sampled from the buffer. $w_\mathcal{E}$: weights of experiences. $a$: action from action space $\mathcal{A}$, i.e., node to visit. $s$: state. var: variance. $\mathcal{P}$: probability distribution over actions.
  • Figure 3: Forgetting curve for task order 1, measured by average solution distance (vertical axis). Epochs 0–200 (the first task) are omitted because no forgetting can occur yet. Some methods produce solution distances that exceed the plotted vertical range.
  • Figure 4: Test performance on the current task during lifelong learning on task order 1.
  • Figure 5: Example of decision drift at a node with low confidence. The number on each node indicates its position in the solution. Left: the solution generated by a given solver on an instance of task U. Right: the solver's top three action probabilities corresponding to the generated solution. Upper: a solver trained on task U. Lower: the same solver after 10 epochs of fine-tuning on task E, following training on task U.
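Figure 5 illustrates decision drift by comparing a solver's top three action probabilities at the same step before and after fine-tuning on a new task. The sketch below shows one way such a comparison could be computed for a single state. The drift measures used (whether the greedy action flips, and the total variation distance between the two distributions) and all names are illustrative choices, not metrics the summary attributes to the paper.

```python
from typing import List, NamedTuple, Tuple

import numpy as np


class Drift(NamedTuple):
    top_before: List[Tuple[int, float]]  # (action, probability), highest first
    top_after: List[Tuple[int, float]]
    confidence_before: float             # max probability before fine-tuning
    decision_changed: bool               # did the greedy action flip?
    total_variation: float               # 0.5 * L1 distance between the distributions


def top_k(probs: np.ndarray, k: int = 3) -> List[Tuple[int, float]]:
    """Return the k most probable actions with their probabilities."""
    idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in idx]


def decision_drift(before, after, k: int = 3) -> Drift:
    """Compare a solver's action distribution at one state before and after fine-tuning."""
    p = np.asarray(before, dtype=float)
    q = np.asarray(after, dtype=float)
    return Drift(
        top_before=top_k(p, k),
        top_after=top_k(q, k),
        confidence_before=float(p.max()),
        decision_changed=int(p.argmax()) != int(q.argmax()),
        total_variation=0.5 * float(np.abs(p - q).sum()),
    )
```

A low `confidence_before` combined with `decision_changed` is the situation Figure 5 highlights: when the top actions are close in probability, a small shift from fine-tuning is enough to change the chosen node.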