Complexity Scaling Laws for Neural Models using Combinatorial Optimization

Lowell Weissman, Michael Krumdick, A. Lynn Abbott

TL;DR

This work introduces complexity-based neural scaling laws by decoupling problem difficulty into solution-space size and representation-space size, using the Traveling Salesman Problem as a testbed. It demonstrates that suboptimality scales smoothly with model size and compute for both RL and SFT, and provides interpretability through connections to local-search dynamics. The study presents two infinite-compute scaling laws: one for node-count-driven solution-space growth showing superlinear suboptimality, and one for dimension-driven representation-space growth showing convergence toward an asymptote, with distinct implications for embedding bottlenecks. Collectively, the results offer a framework to forecast performance under resource constraints, motivate comparisons across algorithms (RL vs SFT), and point toward extending complexity-scaling concepts to broader combinatorial and real-world domains.

Abstract

Recent work on neural scaling laws demonstrates that model performance scales predictably with compute budget, model size, and dataset size. In this work, we develop scaling laws based on problem complexity. We analyze two fundamental complexity measures: solution space size and representation space size. Using the Traveling Salesman Problem (TSP) as a case study, we show that combinatorial optimization promotes smooth cost trends, and therefore meaningful scaling laws can be obtained even in the absence of an interpretable loss. We then show that suboptimality grows predictably for fixed-size models when scaling the number of TSP nodes or spatial dimensions, independent of whether the model was trained with reinforcement learning or supervised fine-tuning on a static dataset. We conclude with an analogy to problem complexity scaling in local search, showing that a much simpler gradient descent of the cost landscape produces similar trends.

Paper Structure

This paper contains 53 sections, 6 theorems, 57 equations, 20 figures, 9 tables.

Key Result

Lemma 1

Expected random tour length grows linearly w.r.t. $n$.
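Lemma 1 can be checked numerically. This sketch assumes nodes are drawn i.i.d. uniformly from the unit hypercube $[0,1]^d$; this summary does not state the sampling scheme, so treat that as an assumption.

Under that assumption, each of the $n$ edges of a uniformly random tour (for $n \ge 3$) joins two independent uniform points. The expectation is therefore exactly $\mathbb{E}[L_{\text{rand}}] = n \cdot \mathbb{E}\lVert X - Y \rVert$, which is linear in $n$.

In 2D the mean pairwise distance is about 0.5214. Because $\mathbb{E}\lVert X - Y \rVert^2 = d/6$, the per-edge mean approaches $\sqrt{d/6}$ as $d$ grows. This is consistent with the square-root growth w.r.t. dimensions described in the Figure 4 caption. The function names are hypothetical.

```python
import numpy as np


def random_tour_length(points: np.ndarray, rng: np.random.Generator) -> float:
    """Length of a uniformly random closed tour through `points`."""
    route = points[rng.permutation(len(points))]
    # Edge from each node to the next, with the last node wrapping back to the first.
    return float(np.linalg.norm(np.roll(route, -1, axis=0) - route, axis=1).sum())


def mean_random_tour_length(n: int, d: int = 2, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[random tour length] for n uniform nodes in [0, 1]^d."""
    rng = np.random.default_rng(seed)
    return float(np.mean([random_tour_length(rng.random((n, d)), rng) for _ in range(trials)]))


# Node scaling: E[L]/n should stay near 0.5214 (mean distance in the unit square).
for n in (10, 20, 40, 80):
    est = mean_random_tour_length(n)
    print(f"n={n:3d}  E[L]={est:6.2f}  E[L]/n={est / n:.3f}")

# Dimension scaling: the per-edge mean approaches sqrt(d / 6) as d grows.
for d in (2, 10, 100):
    per_edge = mean_random_tour_length(10, d=d) / 10
    print(f"d={d:3d}  per-edge={per_edge:.3f}  sqrt(d/6)={np.sqrt(d / 6):.3f}")
```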

Figures (20)

  • Figure 1: Suboptimality gap, defined as the difference between mean model performance and mean optimal performance, decays smoothly as a power law with respect to model size and compute for both reinforcement learning (RL) and supervised fine-tuning (SFT) in TSP. Fits suggest that SFT is more compute-efficient than RL, and possibly more parameter-efficient as we scale to larger models, with faster decay toward optimal performance (larger $\alpha$). Top left: Suboptimality w.r.t. parameters ($N$) for models evaluated near convergence. Right: Suboptimality w.r.t. compute ($C$) evaluated throughout training, where the compute-efficient frontier follows power decay. Note that the compute axis for SFT has been stretched for easier viewing. Bottom left: Optimal model size follows power growth w.r.t. compute budget. This relationship is strikingly consistent across domains (hoffmann2022training; henighan2020scaling; hilton2023scaling).
  • Figure 2: TSP has two convenient ways to adjust problem complexity: node count and spatial dimensionality. Left: 2D TSP instance with 5 nodes and only 12 distinct solutions. The solution tour sampled from a trained RL model is the optimal, minimum-length tour. Center: 2D TSP instance with 40 nodes and roughly $10^{46}$ solutions. The RL model tour is slightly suboptimal, 0.08 units longer than the optimal tour. Right: 3D TSP instance with 10 nodes, where brightness illustrates increased depth. Adding spatial dimensions does not change the number of solutions but makes the problem representation more complex. This RL model solution is 0.05 units suboptimal.
  • Figure 3: Suboptimality over problem scaling for models near convergence with a fixed number of non-embedding parameters. Left: Suboptimality follows superlinear power growth w.r.t. nodes, though we expect this trend to break down eventually, before it intersects the near-linear random performance ceiling (Figure 4). Right: Suboptimality increases smoothly w.r.t. spatial dimensions, closely following negative exponential decay. Power growth (dashed), power decay (dash-dot), and exponential decay (solid) all predict the 10-node RL experiment (we show the latter). However, power growth does not produce a convincing fit for its 20-node counterpart. Even random tour suboptimality is bounded as $d \to \infty$ (constant-span theorem), so any better-than-random monotonic trend must converge. But the power decay asymptote obtained for 10 nodes is larger than that for 20 nodes, which is implausible given that suboptimality grows with node count. Exponential decay is the most predictive while maintaining a sound $\beta_{\psi}$ asymptote ordering, as shown.
  • Figure 4: Node and spatial dimension scaling have distinct effects on the achievable performance span, i.e., the suboptimality gap of random performance. Left: Mean optimal tour length closely follows sublinear power growth w.r.t. either problem scale. Mean random tour length grows linearly w.r.t. nodes, and sublinearly w.r.t. dimensions at a rate similar to optimal tour length growth. Each sublinear trend approaches square-root growth in the limit ($\alpha = 0.5$; proof in the appendix). Right: Suboptimality of random performance w.r.t. nodes is polynomial but approximately linear, being dominated by random tour length growth. Suboptimality of random performance w.r.t. dimensions shows a small, transient increase followed by a decrease, but is provably constant in the limit (constant-span theorem).
  • Figure 5: 2-opt local search suboptimality over problem complexity scaling. A simpler gradient descent of the cost landscape can produce trends similar to those of parameter-constrained deep models. Top left: 2-opt suboptimality w.r.t. spatial dimensions closely aligns with RL trends. Pure exponential fits decay slightly too fast, but we obtain close fits with the subexponential generalization shown, where $\phi \in (0,1]$. Top center: 2-opt suboptimality w.r.t. number of nodes. With unconstrained search depth, 2-opt produces an unclear trend with an inflection point. Top right: Constraining search depth ($M$) produces smooth superlinear growth. Bottom: Power growth emerges after saturating at 100% early stopping, matching the scaling form of parameter-constrained deep models (though the trends are not quantitatively comparable, since their proportionality constants differ considerably).
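The quantities in the Figure 1 and Figure 2 captions can be made concrete with a small sketch. It rests on several assumptions:

- Nodes are sampled uniformly in the unit square.
- The optimum is found by brute-force enumeration, which is only feasible for a handful of nodes; the paper's own optimal solver is not described here.
- A uniformly random tour stands in for a trained model.

All helper names are hypothetical.

```python
import itertools
import math

import numpy as np


def num_distinct_tours(n: int) -> int:
    """Distinct undirected closed tours on n labelled nodes: (n - 1)! / 2."""
    return math.factorial(n - 1) // 2


def tour_length(points: np.ndarray, order) -> float:
    """Euclidean length of the closed tour visiting points in the given order."""
    route = points[list(order)]
    return float(np.linalg.norm(np.roll(route, -1, axis=0) - route, axis=1).sum())


def optimal_tour_length(points: np.ndarray) -> float:
    """Brute-force optimum: fix node 0 as the start and try every ordering of the rest."""
    rest = range(1, len(points))
    return min(tour_length(points, (0,) + perm) for perm in itertools.permutations(rest))


def suboptimality_gap(model_lengths, optimal_lengths) -> float:
    """Mean model tour length minus mean optimal tour length (Figure 1's definition)."""
    return float(np.mean(model_lengths) - np.mean(optimal_lengths))


rng = np.random.default_rng(0)
instances = [rng.random((7, 2)) for _ in range(50)]
optimal = [optimal_tour_length(p) for p in instances]
# Stand-in "model": one uniformly random tour per instance (not a trained policy).
model = [tour_length(p, rng.permutation(len(p))) for p in instances]

print(num_distinct_tours(5), f"{num_distinct_tours(40):.2e}")  # 12 1.02e+46
print(f"gap = {suboptimality_gap(model, optimal):.3f}")
```

The $(n-1)!/2$ count shows why exhaustive search stops being an option almost immediately. There are 12 tours at 5 nodes and about $10^{46}$ at 40, matching the Figure 2 caption.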
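Figure 1 and the left panel of Figure 3 describe power laws. The gap decays with parameters $N$ or compute $C$, and grows superlinearly with node count. A common way to estimate such exponents is ordinary least squares in log-log space. This is sketched here as an assumption, not as the paper's fitting procedure; the data are synthetic and `fit_power_law` is a hypothetical helper.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**k by least squares on (log x, log y); returns (a, k).

    k < 0 corresponds to power decay (gap vs. parameters N or compute C);
    k > 1 corresponds to superlinear power growth (gap vs. node count).
    """
    k, log_a = np.polyfit(np.log(x), np.log(y), 1)  # slope first, then intercept
    return float(np.exp(log_a)), float(k)


# Synthetic data only, not measurements from the paper.
rng = np.random.default_rng(0)
N = np.logspace(5, 8, 12)  # hypothetical non-embedding parameter counts
gap = 50.0 * N ** -0.3 * np.exp(rng.normal(0.0, 0.02, N.size))
a, k = fit_power_law(N, gap)
print(f"decay: a={a:.1f}, alpha={-k:.3f}")  # alpha close to the true 0.3

nodes = np.array([10, 20, 30, 40, 50])
growth = 1e-3 * nodes ** 1.4  # hypothetical superlinear trend
print("growth exponent:", round(fit_power_law(nodes, growth)[1], 3))  # 1.4
```

Fitting in log space weights relative rather than absolute errors. This is usually what is wanted when the gap spans orders of magnitude.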
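The dimension trend in Figure 3 (right) follows exponential decay toward an asymptote $\beta_{\psi}$. One plausible parameterization is $\psi(d) = \beta_{\psi} - A e^{-\lambda d}$; this summary does not give the paper's exact form, so treat it as an assumption.

Once $\lambda$ is fixed, the model is linear in $(\beta_{\psi}, A)$. A grid search over $\lambda$ combined with linear least squares therefore gives a simple, NumPy-only fit. The data are synthetic and `fit_saturating_exp` is a hypothetical helper.

```python
import numpy as np


def fit_saturating_exp(d, y, rates=np.linspace(0.01, 2.0, 200)):
    """Fit y ~ beta - A * exp(-lam * d), which rises toward the asymptote beta.

    For each candidate rate lam the model is linear in (beta, A), so those two are
    solved by least squares; the rate with the smallest squared error is kept.
    """
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for lam in rates:
        X = np.column_stack([np.ones_like(d), -np.exp(-lam * d)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((X @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(coef[0]), float(coef[1]), float(lam))
    _, beta, A, lam = best
    return beta, A, lam


# Synthetic suboptimality-vs-dimension curve (not the paper's data).
rng = np.random.default_rng(0)
dims = np.arange(2, 21)
psi = 0.6 - 0.5 * np.exp(-0.3 * dims) + rng.normal(0.0, 0.002, dims.size)
beta, A, lam = fit_saturating_exp(dims, psi)
print(f"beta_psi={beta:.3f}, A={A:.3f}, lambda={lam:.2f}")
```

You can mirror the sanity check described in the caption. Fit the 10-node and 20-node curves separately, then check that the fitted $\beta_{\psi}$ increases with node count.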
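Figure 5 compares deep models with 2-opt local search under a limited search depth $M$. The summary does not define $M$ precisely. This sketch assumes $M$ caps the number of accepted improving moves, and it reports whether that cap forced an early stop. That is one plausible reading of the "early stopping" rate in the bottom panel. `two_opt` and its `max_moves` parameter are hypothetical names.

```python
import numpy as np


def tour_length(dist: np.ndarray, order) -> float:
    """Closed-tour length from a precomputed distance matrix."""
    return float(sum(dist[order[k], order[(k + 1) % len(order)]] for k in range(len(order))))


def two_opt(points: np.ndarray, order, max_moves=None):
    """First-improvement 2-opt on a Euclidean TSP instance.

    Stops at a 2-opt local optimum, or as soon as an improving move is found after
    `max_moves` moves have already been accepted (an assumed notion of search depth M).
    Returns (order, moves_accepted, stopped_early).
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    order = list(order)
    n = len(order)
    moves = 0
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # edges (o[0], o[1]) and (o[n-1], o[0]) share a node
                a, b = order[i], order[i + 1]
                c, e = order[j], order[(j + 1) % n]
                # Replacing edges (a, b), (c, e) with (a, c), (b, e) changes length by delta.
                delta = dist[a, c] + dist[b, e] - dist[a, b] - dist[c, e]
                if delta < -1e-12:
                    if max_moves is not None and moves >= max_moves:
                        return order, moves, True
                    order[i + 1 : j + 1] = order[i + 1 : j + 1][::-1]
                    moves += 1
                    improved = True
    return order, moves, False


rng = np.random.default_rng(0)
pts = rng.random((30, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
start = [int(k) for k in rng.permutation(30)]
for M in (0, 5, 20, None):
    tour, moves, early = two_opt(pts, start, max_moves=M)
    print(f"M={M}: moves={moves}, early_stop={early}, length={tour_length(dist, tour):.3f}")
```

The search is deterministic from a given start. A run with a larger budget therefore repeats the moves of a smaller-budget run before continuing, so tour length never increases as $M$ grows.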

Theorems & Definitions (12)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 4
  • proof
  • Theorem 5
  • proof