On the optimal pivot path of simplex method for linear programming based on reinforcement learning

Anqi Li; Tiande Guo; Congying Han; Bonan Li; Haoran Li

TL;DR

This paper proposes an optimal pivot rule, based on Monte Carlo tree search, that finds all the shortest pivot paths of the simplex method for linear programming problems. It also proves that when the number of vertices in the feasible region is $C_n^m$, the method can generate all the shortest pivot paths.
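The vertex-count condition uses $C_n^m$, the binomial coefficient. Assume the standard form $Ax = b,\ x \ge 0$ with $A \in \mathbb{R}^{m \times n}$ of full row rank. This notation is an assumption, because the summary does not define $m$ and $n$. Under it, $C_n^m$ counts the ways to choose $m$ basic columns out of $n$. It is therefore an upper bound on the number of bases, and hence on the number of vertices of the feasible region:

```latex
C_n^m = \binom{n}{m} = \frac{n!}{m!\,(n-m)!}, \qquad \text{e.g. } C_4^2 = \frac{4!}{2!\,2!} = 6 .
```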

Abstract

With the existing pivot rules, the simplex method for linear programming is not polynomial in the worst case, so the choice of pivot is crucial. This study proposes an optimal rule, based on Monte Carlo tree search (MCTS), that finds all shortest pivot paths of the simplex method for linear programming problems. Specifically, we first propose the SimplexPseudoTree, which casts the simplex method as a tree search while avoiding repeated basis variables. Second, we propose four reinforcement learning (RL) models, built from two action sets and two reward functions, that make MCTS suitable for the simplex method. Third, we set a new action selection criterion to ameliorate inaccurate evaluations during the initial exploration. We prove that when the number of vertices in the feasible region is $C_n^m$, our method generates all the shortest pivot paths, and the number of pivot iterations is polynomial in the number of variables. In addition, we validate experimentally that the proposed scheme avoids unnecessary search and provides the optimal pivot path. Furthermore, the method can provide the best pivot labels for any supervised learning method that solves linear programming problems.
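The abstract introduces a new action selection criterion but does not state it. A common starting point for MCTS action selection is the UCT score, which adds an exploration bonus to a child's mean reward. The sketch below implements only this textbook UCT rule and does not reproduce the paper's criterion. The names `Node`, `uct_score` and `select_action`, the constant `c = sqrt(2)`, and the idea of labelling actions by entering variables such as `"x1"` are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A search-tree node holding the usual MCTS statistics (hypothetical structure)."""
    visits: int = 0
    total_reward: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node


def uct_score(parent_visits: int, child: Node, c: float = math.sqrt(2)) -> float:
    """Standard UCT score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited actions are tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_action(node: Node, c: float = math.sqrt(2)):
    """Return the action whose child has the highest UCT score."""
    return max(node.children, key=lambda a: uct_score(node.visits, node.children[a], c))


# Two candidate entering variables with equal total reward but different visit counts.
root = Node(visits=10, children={
    "x1": Node(visits=6, total_reward=3.0),
    "x2": Node(visits=4, total_reward=3.0),
})
print(select_action(root))  # prints "x2"
```

"x2" wins because it has both a higher mean reward (0.75 vs 0.5) and a larger exploration bonus (fewer visits). Since the abstract singles out inaccurate evaluations during initial exploration, the bonus term is a natural place for a modified criterion to act. That link is an inference, not a statement from the paper.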

Paper Structure (36 sections, 1 theorem, 46 equations, 8 figures, 4 tables, 1 algorithm)

Introduction
Background and Related Work
LP Problem
Simplex Method
Classical Pivot Rules
Pivot Rules Based on Machine Learning
Combinatorial Optimization Methods Based on MCTS
Constructed SimplexPseudoTree Model
Proposed RL Algorithm
RL Models of MCTS Rule
State
Two Action Sets
Two Reward Functions
MCTS Rule
Construction Stage
...and 21 more sections

Key Result

Corollary 5.1

When the number of vertices in the feasible region is $C_n^m$, the MCTS rule ensures that the number of pivot iterations is polynomial in the number of variables.
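The corollary's hypothesis is that the feasible region attains $C_n^m$ vertices. The brute-force sketch below makes this concrete under an assumed standard form $Ax = b,\ x \ge 0$, with $A$ an $m \times n$ matrix of full row rank. It tries every choice of $m$ basic columns and keeps the choices whose basic solution is feasible. For a nondegenerate LP, each feasible basis is a distinct vertex. The helpers `solve` and `feasible_bases` are hypothetical and are not from the paper. The enumeration is exponential, so it is only meant for tiny instances.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def solve(B, b):
    """Solve the square system B x = b exactly by Gauss-Jordan elimination.

    Returns the solution as a list of Fractions, or None if B is singular.
    """
    k = len(B)
    # Augmented matrix [B | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(B, b)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no nonzero pivot in this column: B is singular
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[k] for row in M]


def feasible_bases(A, b):
    """Return every m-subset of column indices of A that is a feasible basis
    of {x : A x = b, x >= 0}, i.e. B is nonsingular and B^{-1} b >= 0."""
    m, n = len(A), len(A[0])
    result = []
    for basis in combinations(range(n), m):
        B = [[A[i][j] for j in basis] for i in range(m)]
        x_B = solve(B, b)
        if x_B is not None and all(v >= 0 for v in x_B):
            result.append(basis)
    return result


# Simplex {x >= 0 : x0 + x1 + x2 = 1}: every basis is feasible.
print(len(feasible_bases([[1, 1, 1]], [1])), comb(3, 1))  # prints "3 3"
# Quadrilateral x0 + x1 + x2 = 2, x1 + x3 = 1: only 4 of C(4, 2) = 6 column pairs.
print(len(feasible_bases([[1, 1, 1, 0], [0, 1, 0, 1]], [2, 1])), comb(4, 2))  # prints "4 6"
```

In the second instance, one column pair is singular and another gives a negative basic variable. The feasible region therefore falls short of the $C_n^m$ bound that the corollary assumes.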

Figures (8)

Figure 1: Overview of the methodological framework of this paper. First, we create the SimplexPseudoTree in Section 3 to make the simplex method amenable to reinforcement learning methods. Next, four RL models based on the SimplexPseudoTree are proposed in Section 4.1. Then we propose the MCTS rule, which calculates all the shortest pivot paths, in Sections 4.2 and 4.3. Finally, we give a thorough theoretical analysis of the MCTS rule in Section 5.
Figure 2: Proposed SimplexPseudoTree for the simplex method. The left subgraph shows the process of finding the optimal solution with the simplex method. The middle subgraph is the SimplexPseudoTree corresponding to this instance. The right subgraph is the optimal pivot path found with the SimplexPseudoTree.
Figure 3: Algorithm flow diagram of the MCTS rule.
Figure 4: Model comparison on five representative instances: rand 50 $\times$ 50, rand 232 $\times$ 504, rand 332 $\times$ 187 and SC50A. The X-axis represents the number of explorations, given as multiples of the number of columns of the constraint matrix $A$. The Y-axis of the left panel is the average number of pivot iterations, and the Y-axis of the right panel is the average solution time.
Figure 5: Number of distinct pivot paths found as a function of the number of algorithm executions. The X-axis represents the number of algorithm executions, and the Y-axis represents the number of different pivot paths found so far.
...and 3 more figures
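The construction of the SimplexPseudoTree is not given on this page. To illustrate what "all the shortest pivot paths" means (Figures 2 and 5), the sketch below substitutes exhaustive breadth-first search over feasible bases for MCTS. It is not the paper's method. It assumes a nondegenerate minimization LP, and it treats a pivot as swapping one basic index for one that strictly lowers the objective. Breadth-first search never revisits a basis, which is in the spirit of avoiding repeated bases. The functions `adjacent` and `all_shortest_pivot_paths` are hypothetical, and the tiny LP in the usage lines is made up for illustration.

```python
from collections import deque


def adjacent(b1, b2):
    """Two bases are one pivot apart if they differ in exactly one index."""
    return len(set(b1) ^ set(b2)) == 2


def all_shortest_pivot_paths(bases, cost, start):
    """Exhaustive BFS over feasible bases (minimization).

    An edge u -> v exists when u and v are adjacent and cost[v] < cost[u].
    Returns every shortest path from `start` to a basis of minimum cost
    among those reachable from `start`.
    """
    depth = {start: 0}
    parents = {start: []}  # predecessors of each basis on shortest paths
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in bases:
            if adjacent(u, v) and cost[v] < cost[u]:
                if v not in depth:  # first time reached: record its BFS depth
                    depth[v] = depth[u] + 1
                    parents[v] = [u]
                    queue.append(v)
                elif depth[v] == depth[u] + 1:  # another shortest way in
                    parents[v].append(u)

    best = min(cost[v] for v in depth)
    targets = [v for v in depth if cost[v] == best]
    d = min(depth[t] for t in targets)

    def unwind(v):
        # Expand predecessor lists back to `start` into explicit paths.
        if v == start:
            return [[start]]
        return [path + [v] for u in parents[v] for path in unwind(u)]

    return [p for t in targets if depth[t] == d for p in unwind(t)]


# Feasible bases (0-based column indices) of x0 + x1 + x2 = 2, x1 + x3 = 1, x >= 0,
# with the objective min -x0 - 2*x1 evaluated at each basic solution.
bases = [(0, 1), (0, 3), (1, 2), (2, 3)]
cost = {(0, 1): -3, (0, 3): -2, (1, 2): -2, (2, 3): 0}
for path in all_shortest_pivot_paths(bases, cost, start=(2, 3)):
    print(path)
# prints [(2, 3), (0, 3), (0, 1)] then [(2, 3), (1, 2), (0, 1)]
```

Starting from the origin basis, two different two-pivot paths reach the optimum. This is the kind of path multiplicity that Figure 5 counts. Exhaustive search scales with the number of feasible bases, which is why a guided search such as MCTS is needed beyond toy sizes.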

Theorems & Definitions (6)

Definition 5.1: Rank Function
Definition 5.2: Significance Operator
Proof
Proof
Proof
Corollary 5.1
