Evaluating LLMs for Combinatorial Optimization: One-Phase and Two-Phase Heuristics for 2D Bin-Packing
Syed Mahbubul Huq, Daniel Brito, Daniel Sikar, Chris Child, Tillman Weyde, Rajesh Mojumder
TL;DR
This work tackles the NP-hard 2D bin-packing problem (2D-BPP) by introducing an evaluation framework in which an LLM (GPT-4o) generates and iteratively refines packing heuristics within an evolutionary prompting loop. It benchmarks the LLM-driven approach against two traditional baselines, Finite First-Fit (FFF) and Hybrid First-Fit (HFF), on fixed-size bins ($200\times100$) with 50 items, demonstrating that LLMs can reduce the average number of bins from 16 to 15 and increase space utilization to 0.83, with rapid convergence within two iterations. The study provides a rigorous, multi-metric methodology for assessing LLM capabilities in specialized optimization tasks and establishes benchmarks for future AI-assisted combinatorial optimization work. The results suggest that prompt-driven learning, constraint feedback, and diversity-preserving island strategies enable LLMs to discover and refine effective heuristics with competitive runtimes, signaling practical impact for industrial packing and related domains.
Abstract
This paper presents an evaluation framework for assessing the capabilities of Large Language Models (LLMs) in combinatorial optimization, specifically on the 2D bin-packing problem. We introduce a systematic methodology that combines LLMs with evolutionary algorithms to generate and refine heuristic solutions iteratively. Through comprehensive experiments comparing LLM-generated heuristics against traditional approaches (Finite First-Fit and Hybrid First-Fit), we demonstrate that LLMs can produce more efficient solutions while requiring fewer computational resources. Our evaluation reveals that GPT-4o achieves optimal solutions within two iterations, reducing average bin usage from 16 to 15 bins while improving space utilization from 0.76–0.78 to 0.83. This work contributes to understanding LLM evaluation in specialized domains and establishes benchmarks for assessing LLM performance in combinatorial optimization tasks.
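To make the two reported metrics concrete, the following is a minimal Python sketch of a shelf-based first-fit heuristic in the spirit of the Finite First-Fit baseline, together with one plausible definition of space utilization (total item area divided by total area of the bins used). It is an illustrative assumption, not the implementation evaluated in the paper: the names `Shelf`, `Bin`, `first_fit_shelf`, and `space_utilization` are hypothetical, item rotation is not considered, and the exact utilization formula used by the authors may differ.

```python
from dataclasses import dataclass, field

BIN_W, BIN_H = 200, 100  # bin size reported in the text


@dataclass
class Shelf:
    y: int           # bottom edge of the shelf inside its bin
    height: int      # height of the first (tallest) item on the shelf
    used_w: int = 0  # horizontal space already occupied


@dataclass
class Bin:
    shelves: list = field(default_factory=list)

    def used_height(self) -> int:
        return sum(s.height for s in self.shelves)


def first_fit_shelf(items, bin_w=BIN_W, bin_h=BIN_H):
    """Pack (width, height) items onto shelves; returns the list of bins used.

    Simplified sketch: items are sorted by non-increasing height, then each
    item goes to the first shelf with room, else to a new shelf in the first
    bin with enough vertical space, else to a new bin.
    """
    bins = []
    for w, h in sorted(items, key=lambda it: it[1], reverse=True):
        if w > bin_w or h > bin_h:
            raise ValueError(f"item {(w, h)} does not fit in an empty bin")

        # 1) First existing shelf (bins scanned in order) that can hold the item.
        target = next(
            (s for b in bins for s in b.shelves
             if h <= s.height and s.used_w + w <= bin_w),
            None,
        )
        if target is not None:
            target.used_w += w
            continue

        # 2) First bin with enough vertical room for a new shelf.
        host = next((b for b in bins if b.used_height() + h <= bin_h), None)
        if host is None:
            # 3) Otherwise open a new bin.
            host = Bin()
            bins.append(host)
        host.shelves.append(Shelf(y=host.used_height(), height=h, used_w=w))
    return bins


def space_utilization(items, n_bins, bin_w=BIN_W, bin_h=BIN_H):
    """Assumed metric: total item area / total area of the bins used."""
    return sum(w * h for w, h in items) / (n_bins * bin_w * bin_h)


if __name__ == "__main__":
    demo_items = [(100, 50)] * 8  # hypothetical instance, not the paper's data
    packed = first_fit_shelf(demo_items)
    print(len(packed), space_utilization(demo_items, len(packed)))
```

Under this assumed definition, packing the same item set into 15 rather than 16 bins raises utilization by a factor of 16/15, which is in line with the reported change from roughly 0.78 to 0.83.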
