A Comprehensive Evaluation of Contemporary ML-Based Solvers for Combinatorial Optimization
Shengyu Feng, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang
TL;DR
FrontierCO is a benchmark for evaluating contemporary ML-based solvers on eight combinatorial optimization (CO) problem types, using large-scale, industry-inspired instances and standardized training data. The study compares 16 ML solvers (neural and LLM-based) against state-of-the-art human-designed solvers under a fixed time budget and finds a persistent performance gap, especially on hard instances: neural solvers show scalability limits, and LLM agents display high variability. Ablations and analysis show that neural modules help when added to weaker baselines, while LLMs tend to rediscover known metaheuristics rather than invent new strategies, which suggests hybrid neural-symbolic approaches as a promising direction. By providing standardized best-known solutions (BKS), training data, and a unified evaluation framework, FrontierCO offers a reproducible platform for advancing ML for combinatorial optimization and for informing practical deployment in real-world settings.
Abstract
Machine learning (ML) has demonstrated considerable potential in supporting model design and optimization for combinatorial optimization (CO) problems. However, much of the progress to date has been evaluated on small-scale, synthetic datasets, raising concerns about the practical effectiveness of ML-based solvers in real-world, large-scale CO scenarios. Additionally, many existing CO benchmarks lack sufficient training data, limiting their utility for evaluating data-driven approaches. To address these limitations, we introduce FrontierCO, a comprehensive benchmark that covers eight canonical CO problem types and evaluates 16 representative ML-based solvers, including graph neural networks and large language model (LLM) agents. FrontierCO features challenging instances drawn from industrial applications and frontier CO research, offering both realistic problem difficulty and abundant training data. Our empirical results provide critical insights into the strengths and limitations of current ML methods, helping to guide more robust and practically relevant advances at the intersection of machine learning and combinatorial optimization. Our data is available at https://huggingface.co/datasets/CO-Bench/FrontierCO.
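The summary above describes comparing solvers against best-known solutions under a fixed time budget. As a rough illustration of how such a comparison is often scored, the sketch below computes a primal gap, a widely used normalized distance between a solver's objective value and a reference value. This is an assumption for illustration only: the text here does not state which metric FrontierCO uses, and the function names `primal_gap` and `mean_primal_gap`, as well as the convention of scoring a missing solution as the worst gap, are hypothetical.

```python
from typing import Optional, Sequence


def primal_gap(objective: Optional[float], bks: float) -> float:
    """Normalized gap in [0, 1] between a solver's objective and a reference value.

    Assumption (not from the source): a run that returns no feasible
    solution within the time budget (objective is None) scores 1.0.
    """
    if objective is None:
        return 1.0  # no solution found in time: worst possible gap
    if objective == bks:
        return 0.0  # matches the reference exactly (also covers 0 == 0)
    if objective * bks < 0:
        return 1.0  # opposite signs: conventionally the maximum gap
    return abs(objective - bks) / max(abs(objective), abs(bks))


def mean_primal_gap(objectives: Sequence[Optional[float]],
                    references: Sequence[float]) -> float:
    """Average primal gap over a set of instances."""
    if len(objectives) != len(references):
        raise ValueError("objectives and references must have equal length")
    if not objectives:
        raise ValueError("at least one instance is required")
    gaps = [primal_gap(o, r) for o, r in zip(objectives, references)]
    return sum(gaps) / len(gaps)
```

Because the gap is symmetric and bounded, the same function applies to minimization and maximization problems, and averaging across instances with very different objective scales remains meaningful.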
