A Benchmark for Maximum Cut: Towards Standardization of the Evaluation of Learned Heuristics for Combinatorial Optimization

Ankur Nath; Alan Kuhnle

TL;DR

This paper introduces MaxCut-Bench, an open-source benchmark for evaluating learned heuristics on the Maximum Cut problem, using a diverse mix of real-world and synthetic instances. It systematically compares traditional heuristics, quantum annealing approaches, and GNN-based methods, finding that simple baselines like Tabu Search often outperform learned methods across many distributions, while certain learned approaches (notably ANYCSP) can rival state-of-the-art baselines but at higher computational cost. The study emphasizes baseline and instance standardization, reveals limits in generalization for learned heuristics, and provides a critical assessment of where deep learning adds value. Overall, MaxCut-Bench offers a rigorous, extensible platform to guide fair comparisons and future research in ML for combinatorial optimization.

Abstract

Recently, there has been much work on the design of general heuristics for graph-based combinatorial optimization problems via the incorporation of Graph Neural Networks (GNNs) to learn distribution-specific solution structures. However, there is a lack of consistency in the evaluation of these heuristics, in terms of the baselines and instances chosen, which makes it difficult to assess the relative performance of the algorithms. In this paper, we propose an open-source benchmark suite, MaxCut-Bench, dedicated to the NP-hard Maximum Cut problem in both its weighted and unweighted variants, based on a careful selection of instances curated from diverse graph datasets. The suite offers a unified interface to various heuristics, both traditional and machine learning-based. Next, we use the benchmark in an attempt to systematically corroborate or reproduce the results of several popular learning-based approaches, including S2V-DQN [31] and ECO-DQN [4], among others, along three dimensions: objective value, generalization, and scalability. Our empirical results show that several of the learned heuristics fail to outperform a naive greedy algorithm, and that only one of them consistently outperforms Tabu Search, a simple, general heuristic based upon local search. Furthermore, we find that the performance of ECO-DQN remains the same or is improved if the GNN is replaced by a simple linear regression on a subset of the features that are related to Tabu Search. Code, data, and pretrained models are available at: https://github.com/ankurnath/MaxCut-Bench.
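To make the objective and the kind of naive greedy baseline mentioned above concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation shipped in MaxCut-Bench: the names `cut_value` and `greedy_maxcut` are hypothetical, the graph is assumed to be given as a list of `(u, v, weight)` edges over vertices `0..num_nodes-1`, and the greedy rule shown (repeatedly flip the vertex with the largest positive gain) is only one common variant, which may differ from the one evaluated in the paper.

```python
from typing import Dict, List, Tuple

# Assumed input format for this sketch: (u, v, weight) with 0-based vertex ids.
Edge = Tuple[int, int, float]


def cut_value(edges: List[Edge], side: Dict[int, int]) -> float:
    """Total weight of the edges whose endpoints lie on opposite sides."""
    return sum(w for u, v, w in edges if side[u] != side[v])


def greedy_maxcut(num_nodes: int, edges: List[Edge]) -> Dict[int, int]:
    """One simple greedy variant: start with every vertex on side 0 and
    keep flipping the vertex with the largest positive gain."""
    if num_nodes == 0:
        return {}

    adj: Dict[int, List[Tuple[int, float]]] = {v: [] for v in range(num_nodes)}
    for u, v, w in edges:
        if u == v:
            continue  # a self-loop can never be cut, so it is ignored
        adj[u].append((v, w))
        adj[v].append((u, w))

    side = {v: 0 for v in range(num_nodes)}

    def gain(v: int) -> float:
        # Flipping v cuts its same-side edges and uncuts its crossing edges.
        return sum(w if side[u] == side[v] else -w for u, w in adj[v])

    while True:
        best = max(range(num_nodes), key=gain)
        if gain(best) <= 0:
            return side  # no single flip improves the cut: a local optimum
        side[best] = 1 - side[best]
```

Each flip strictly increases the cut value, so the loop terminates at a partition that no single vertex move can improve. On the unweighted 4-cycle `[(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]` this sketch reaches the optimal cut of value 4; in general it only guarantees a local optimum, which is the situation that local-search heuristics such as Tabu Search are designed to escape.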

Paper Structure (28 sections, 7 figures, 8 tables)

This paper contains 28 sections, 7 figures, 8 tables.

Introduction
Related Work
The MaxCut-Bench Benchmark
Benchmark Datasets
Benchmark Algorithms
Evaluation
Does deep learning really improve the performance of a traditional heuristic?
Have deep learning heuristics obtained any absolute improvement over the best traditional heuristic?
Generalization: Do learned heuristics generalize well on unseen distributions?
Efficiency and scalability analysis
Conclusion and Future Directions
Appendix
Detailed Description of Datasets
Detailed Description of Benchmark Algorithms
Baseline and Instance Bias
...and 13 more sections

Figures (7)

Figure 1: Violin plots of objective values of the learned heuristics and their classical counterparts on a selection of weighted instances.
Figure 2: Violin plots of objective values of the learned and classical heuristics on a selection of unweighted instances.
Figure 3: Generalization of agents to unseen graph sizes and structures.
Figure 4: Comparison of the wall-clock time and average GPU and CPU memory utilization among heuristics.
Figure 5: Researchers often select arbitrary baselines, which, when combined with instance bias, can lead to confusion in empirical evaluations.
...and 2 more figures
