A Reinforcement-Learning-Based Multiple-Column Selection Strategy for Column Generation
Haofeng Yuan, Lichang Fang, Shiji Song
TL;DR
This work tackles slow convergence in column generation (CG) for large-scale linear programs by introducing a reinforcement-learning-based strategy that selects multiple columns per CG iteration. It formulates the CG process as a Markov decision process (MDP) and trains an actor-critic network with proximal policy optimization (PPO). The network pairs a GIN-based encoder with a complete-graph enhanced actor, so the learned multi-column selections account for relations among candidate columns. Empirical results on cutting stock and graph coloring problems show that the RL approach reduces the number of iterations and the runtime compared to both single-column and existing multi-column baselines, and that it generalizes to larger instances. The work improves practical CG performance and offers a framework that can be extended to incorporate additional acceleration techniques.
Abstract
Column generation (CG) is one of the most successful approaches for solving large-scale linear programming (LP) problems. Given an LP with a prohibitively large number of variables (i.e., columns), the idea of CG is to explicitly consider only a subset of columns and iteratively add promising columns that can improve the objective value. While adding the column with the most negative reduced cost guarantees the convergence of CG, it has been shown that adding multiple columns per iteration rather than a single column can lead to faster convergence. However, designing a multiple-column selection strategy that picks the most promising columns from a large pool of candidates remains a challenge. In this paper, we propose a novel multiple-column selection strategy based on reinforcement learning (RL). To the best of our knowledge, it is the first RL-based multiple-column selection strategy for CG. The effectiveness of our approach is evaluated on two sets of problems: the cutting stock problem and the graph coloring problem. Compared to several widely used single-column and multiple-column selection strategies, our RL-based strategy converges faster and reduces both the number of CG iterations and the runtime.
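To make the column selection step concrete, the following is a minimal sketch of a greedy multiple-column rule: compute the reduced cost c_j - pi^T a_j of each candidate column from the current dual values, then keep up to k columns with the most negative reduced cost. This is an illustrative baseline only, not the RL strategy proposed in the paper. The function names, the tolerance, and the cutting-stock-style data are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# A candidate is (objective cost c_j, constraint column a_j). Hypothetical layout.
Candidate = Tuple[float, Sequence[float]]


def reduced_cost(cost: float, column: Sequence[float], duals: Sequence[float]) -> float:
    """Reduced cost c_j - pi^T a_j of one candidate column (minimization LP)."""
    return cost - sum(p * a for p, a in zip(duals, column))


def select_columns(candidates: Sequence[Candidate], duals: Sequence[float],
                   k: int, tol: float = 1e-9) -> List[int]:
    """Return indices of up to k candidates with the most negative reduced cost.

    Only columns with reduced cost below -tol can improve the restricted
    master problem; an empty result means no improving column was found
    among the candidates.
    """
    scored = [(reduced_cost(c, col, duals), idx)
              for idx, (c, col) in enumerate(candidates)]
    improving = sorted((rc, idx) for rc, idx in scored if rc < -tol)
    return [idx for _, idx in improving[:k]]


# Assumed toy data: 3 item types, every cutting pattern costs one roll.
duals = [0.5, 0.25, 0.4]
candidates = [
    (1.0, [2, 0, 1]),  # pi^T a = 1.40, improving
    (1.0, [1, 1, 0]),  # pi^T a = 0.75, not improving
    (1.0, [0, 2, 2]),  # pi^T a = 1.30, improving
    (1.0, [0, 1, 1]),  # pi^T a = 0.65, not improving
]

selected = select_columns(candidates, duals, k=2)
print(selected)
```

With k = 1 this reduces to the classical single-column rule that adds only the most negative reduced-cost column. A learned strategy would replace the sort-and-truncate step with a policy that scores candidate columns jointly.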
