Probing Neural TSP Representations for Prescriptive Decision Support

Reuben Narad, Léonard Boussioux, Michael Wagner

TL;DR

This work is the first to study neural TSP solvers as transferable encoders for prescriptive what-if decision-support objectives beyond tour construction, and shows that transfer accuracy increases with solver quality across training and model scale.

Abstract

The field of neural combinatorial optimization (NCO) trains neural policies to solve NP-hard problems such as the traveling salesperson problem (TSP). We ask whether, beyond producing good tours, a trained TSP solver learns internal representations that transfer to other optimization-relevant objectives, in the spirit of transfer learning from other domains. We train several attention-based TSP policies, collect their internal activations, and train probes on node/edge embeddings for two NP-hard prescriptive downstream tasks inspired by real-world logistics scenarios: node-removal sensitivity (identifying the most impactful node to remove) and edge-forbid sensitivity (identifying the most critical edge to retain). On a Euclidean TSP100-trained model, probes for both tasks are competitive with existing baselines. Ensembling probe signals with geometric features outperforms the strongest baselines: 65% top-1 accuracy (vs. 58% baseline) for the best-node-removal task, and 73% top-1 accuracy (vs. 67% baseline) for the worst-edge identification task. To our knowledge, we are the first to study neural TSP solvers as transferable encoders for prescriptive what-if decision-support objectives beyond tour construction. Finally, we show that transfer accuracy increases with solver quality across training and model scale, suggesting that training stronger NCO solvers also yields more useful encoders for downstream objectives. Our code is available at: github.com/ReubenNarad/tsp_prescriptive_probe
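To make the node-removal sensitivity task concrete, the following is a minimal sketch of how such labels could be computed on a tiny instance. It is not the authors' pipeline: the function names `optimal_tour_length` and `node_removal_sensitivity` are hypothetical, brute-force enumeration stands in for an exact solver (so it is only feasible for a handful of nodes, not TSP100), and it assumes the "best" node to remove is the one whose removal shortens the optimal tour the most.

```python
from itertools import permutations
from math import dist, inf


def optimal_tour_length(points):
    """Exact tour length by brute force (hypothetical helper; tiny instances only)."""
    n = len(points)
    if n < 2:
        return 0.0
    first, rest = points[0], points[1:]
    best = inf
    # Fix the first city so rotations of the same tour are not re-enumerated.
    for order in permutations(rest):
        tour = (first, *order)
        length = sum(dist(tour[i], tour[(i + 1) % n]) for i in range(n))
        best = min(best, length)
    return best


def node_removal_sensitivity(points):
    """Percent change in optimal tour length after removing each node in turn."""
    base = optimal_tour_length(points)
    changes = []
    for i in range(len(points)):
        reduced = points[:i] + points[i + 1:]
        changes.append(100.0 * (optimal_tour_length(reduced) - base) / base)
    return changes


# Toy instance (assumed for illustration): a unit square plus one distant outlier.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (3.0, 0.5)]
changes = node_removal_sensitivity(points)

# Assumption: the label is the node whose removal yields the largest reduction.
best_node = min(range(len(points)), key=changes.__getitem__)
print(best_node, [round(c, 2) for c in changes])
```

In this toy instance the outlier is the labeled node, since removing it collapses the tour to the square's perimeter. Each label requires one exact re-solve per candidate node, which is what makes a single-forward-pass probe attractive as a cheaper predictor.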

Paper Structure

This paper contains 51 sections, 7 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Overview of the probing pipeline. We train a neural TSP solver, extract frozen encoder node embeddings from a single forward pass, and generate sensitivity labels by repeatedly re-solving to optimality under small interventions (node-removal or tour-edge-forbiddance). We then train probes on cached representations to predict which node to remove or which tour edge is most critical to retain.
  • Figure 2: Sensitivity task examples on TSP100. Top: Base optimal tour with nodes colored by removal impact (left), and resulting tour after removing the best node (right, circled). Bottom: Base tour with edges colored by forbid impact (left), and resulting tour after forbidding the most critical edge (right, circled). Colors encode percent change in optimal tour length. Sensitivity depends on global structure, not local geometry.
  • Figure 3: Training dynamics across model sizes. Top: TSP policy validation % suboptimality (greedy decoding) on a fixed TSP100 validation set. Bottom: probe top-1 accuracy across training checkpoints for node-removal and edge-forbid tasks, shown for linear and transformer probes trained on frozen encoder representations extracted at each checkpoint.
  • Figure 4: AttentionModel architecture (encoder/decoder). We cache the final encoder output (encoder_output) as frozen node representations for probing; the decoder is not used for representation extraction.
  • Figure 5: Layer-wise probeability across training for a linear probe (top-1 accuracy). Curves correspond to encoder block outputs (encoder_layer_0 through encoder_layer_4) and the final encoder output (encoder_output).