TAXI: Traveling Salesman Problem Accelerator with X-bar-based Ising Macros Powered by SOT-MRAMs and Hierarchical Clustering

Sangmin Yoo, Amod Holla, Sourav Sanyal, Dong Eun Kim, Francesca Iacopi, Dwaipayan Biswas, James Myers, Kaushik Roy

TL;DR

TAXI tackles the scalability gap in Ising-based TSP solvers by combining in-memory crossbar (Xbar) Ising macros, which use SOT-MRAM devices as random number generators (RNGs), with a hierarchical clustering strategy that decomposes large TSPs into subproblems solved in parallel. Each macro minimizes the Ising Hamiltonian through multiply-accumulate (MAC) operations in the crossbar and makes stochastic decisions with the SOT devices, enabling fast, energy-efficient annealing directly in memory. Key contributions include a novel W_D distance mapping; a dedicated Ising macro that combines superposition, distance calculation, stochastic vectors, and an annealing schedule; and a hierarchical clustering framework with fixed inter-cluster routes and aggressive parallelism mapped onto Xbar hardware. Across 20 TSPLib benchmarks (up to 85,900 cities), TAXI runs on average 8× faster than a state-of-the-art clustering-based Ising solver, and its tours are only 22% and 20% longer than Concorde's on the 33,810- and 85,900-city instances, respectively. These results demonstrate the practicality of hardware-algorithm co-design for large-scale combinatorial optimization.
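For orientation, one standard way to write a TSP as an Ising (QUBO) energy uses binary spins $x_{v,t}$, equal to 1 when city $v$ is visited at step $t$ of an $N$-city tour. The summary does not reproduce TAXI's own equations, so whether its Hamiltonian takes exactly this form is an assumption. The two $A$ terms penalize a city visited more or fewer than once and a step holding more or fewer than one city, while the $B$ term adds the distance $d_{uv}$ of every consecutive pair of cities:

$$
H = A\sum_{v=1}^{N}\Big(1-\sum_{t=1}^{N} x_{v,t}\Big)^{2} + A\sum_{t=1}^{N}\Big(1-\sum_{v=1}^{N} x_{v,t}\Big)^{2} + B\sum_{u\neq v} d_{uv}\sum_{t=1}^{N} x_{u,t}\,x_{v,t+1}
$$

Here $x_{v,N+1} \equiv x_{v,1}$ closes the tour, and $A > B \max_{u,v} d_{uv}$ is the usual condition for making constraint violations energetically unfavorable. The distance term is the part that a distance matrix programmed into a crossbar, as in Figure 3, can evaluate with multiply-accumulate operations.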

Abstract

Ising solvers with hierarchical clustering have shown promise for large-scale Traveling Salesman Problems (TSPs) in terms of latency and energy. However, most of these methods still suffer unacceptable quality degradation once the problem size grows beyond a certain point. Additionally, their hardware-agnostic designs limit their ability to fully exploit the available hardware resources. In this work, we introduce TAXI, an in-memory computing-based TSP accelerator with crossbar (Xbar)-based Ising macros. Each macro independently solves a TSP sub-problem obtained by hierarchical clustering, without any off-macro data movement, which enables massive parallelism. Within each macro, Spin-Orbit-Torque (SOT) devices serve as compact, energy-efficient random number generators that enable rapid "natural annealing". By leveraging hardware-algorithm co-design, TAXI improves solution quality, speed, and energy efficiency on TSPs of up to 85,900 cities (the largest TSPLib instance). TAXI produces solutions that are only 22% and 20% longer than those of the exact Concorde solver on the 33,810- and 85,900-city TSPs, respectively. TAXI also outperforms a current state-of-the-art clustering-based Ising solver, running 8× faster on average across 20 benchmark problems from TSPLib.
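The decomposition described above can be illustrated with a small Python sketch. It is not TAXI's clustering algorithm. As assumptions, it splits cities by recursive median bisection along the wider axis until each cluster holds at most `max_size` cities (12, the cluster size quoted for Figure 5), fixes the inter-cluster route with a nearest-neighbor pass over cluster centroids, and uses a nearest-neighbor heuristic as a stand-in for the Ising macro that would solve each cluster. All function names are hypothetical.

```python
import math

Point = tuple[float, float]


def split_clusters(cities: list[int], coords: list[Point], max_size: int = 12) -> list[list[int]]:
    """Recursively bisect cities at the median of the wider axis until every
    leaf cluster holds at most max_size cities (hypothetical clustering)."""
    if len(cities) <= max_size:
        return [cities]
    xs = [coords[c][0] for c in cities]
    ys = [coords[c][1] for c in cities]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(cities, key=lambda c: coords[c][axis])
    mid = len(ordered) // 2
    return (split_clusters(ordered[:mid], coords, max_size)
            + split_clusters(ordered[mid:], coords, max_size))


def nearest_neighbor_order(points: list[Point]) -> list[int]:
    """Greedy open path over points, starting from index 0."""
    order, unvisited = [0], set(range(1, len(points)))
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def solve_cluster(cluster: list[int], coords: list[Point]) -> list[int]:
    """Stand-in for one Ising macro: orders the cities of a single cluster."""
    return [cluster[i] for i in nearest_neighbor_order([coords[c] for c in cluster])]


def hierarchical_tour(coords: list[Point], max_size: int = 12) -> list[int]:
    clusters = split_clusters(list(range(len(coords))), coords, max_size)
    # Fixed inter-cluster route: visit cluster centroids in nearest-neighbor order.
    centroids = [(sum(coords[c][0] for c in cl) / len(cl),
                  sum(coords[c][1] for c in cl) / len(cl)) for cl in clusters]
    route = nearest_neighbor_order(centroids)
    # Intra-cluster routes depend only on their own cluster, so each could run on its own macro.
    sub_tours = [solve_cluster(clusters[k], coords) for k in route]
    # Merge: concatenate the intra-cluster routes in inter-cluster order.
    return [city for sub in sub_tours for city in sub]
```

Because each `solve_cluster` call reads only its own cluster's coordinates, the calls are independent and could run concurrently. This mirrors the property TAXI exploits by giving each macro its own sub-problem.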

Paper Structure

This paper contains 25 sections, 5 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Hierarchical clustering and Ising macros for large-scale traveling salesman problems. The shortest route within each cluster (red, left) is optimized by an Ising macro (each grey box, right). The final solution is obtained by merging the inter-cluster (blue, left) and intra-cluster routes.
  • Figure 2: Non-monotonic energy search space. Energy minimization (yellow) and stochastic update (green) jointly find the global minimum: the former descends the energy landscape, while the latter escapes from local minima.
  • Figure 3: (a) TSP with 4 cities. $u$ through $x$ denote distances, and circled numbers denote visiting orders. (b) Distance matrix mapped onto a crossbar. Distances reformulated as $W_{D}$ are programmed into the crosspoints as resistances.
  • Figure 4: Floor plan of an Ising macro. Arrows represent data flow across the macro. Insets (a-d) illustrate the macro's components. (a) Superposition of vectors for optimization. Red and blue arrows represent up- and down-spins stored in the spin storage, respectively. (b) Optimization by the crossbar array, which holds the distance matrix, and its peripheral circuits. (c) SOT-MRAMs in the stochastic circuit pass or block current from the crossbar according to their stochastic switching. The inset shows the probability of switching from $R_{AP}$ to $R_{P}$ for the chosen SOT device IMEC-SOT-MRAM2022. $I_{\text{stoch}}$ and $I_{\text{det}}$ denote the current ranges for stochastic and deterministic operation, respectively. (d) The ArgMax circuit picks the next city to visit by selecting the largest current.
  • Figure 5: (a) Optimal ratio as a function of the maximum cluster size at 4-bit precision. (b) Solution quality degradation when the bit precision is reduced from 4 bits to lower precisions; positive values indicate degradation. (c) Comparison of solution optimality. TAXI data use 4-bit precision and a maximum cluster size of 12; data for the other Ising solvers are adapted from neuro-isingTSP-shimengTSP-shimeng2.
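The per-city decision chain in Figure 4 (crossbar MAC, stochastic SOT gating, ArgMax), together with the 4-bit weights of Figure 5, can be mimicked in software. This Python sketch rests on several assumptions not stated in the summary. $W_D$ is taken to be each distance inverted against the largest distance (so nearer cities draw more current, consistent with ArgMax choosing the largest current) and quantized to $2^{bits}-1$ levels. The spin vector is a one-hot encoding of the current city. Each SOT device passes a column current with a sigmoid probability whose slope grows over iterations as a stand-in annealing schedule, and the best tour across iterations is kept. All names (`wd_matrix`, `crossbar_mac`, `sot_pass`, `macro_solve`) are hypothetical.

```python
import math
import random


def wd_matrix(coords, bits=4):
    """Hypothetical W_D: invert distances so nearer cities get larger values,
    then quantize them to 2**bits - 1 levels (4-bit by default)."""
    n = len(coords)
    d = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in d) or 1.0  # guard against a single city
    levels = 2 ** bits - 1
    return [[round((d_max - d[i][j]) / d_max * levels) for j in range(n)]
            for i in range(n)]


def crossbar_mac(spins, w):
    """Column currents of the crossbar: I_j = sum_i spins[i] * w[i][j]."""
    return [sum(s * w[i][j] for i, s in enumerate(spins)) for j in range(len(w[0]))]


def sot_pass(current, threshold, slope, rng):
    """Assumed sigmoid switching model: a larger current is more likely to pass."""
    z = max(-60.0, min(60.0, slope * (current - threshold)))  # clamp to avoid exp overflow
    return rng.random() < 1.0 / (1.0 + math.exp(-z))


def tour_length(tour, coords):
    """Length of the closed tour, including the edge back to the start."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(tour, tour[1:] + tour[:1]))


def macro_solve(coords, bits=4, iters=20, slope_start=0.2, slope_end=5.0, seed=0):
    """Build a tour city by city (MAC, stochastic gate, ArgMax) in each iteration;
    the gate slope rises linearly across iterations and the shortest tour is kept."""
    rng = random.Random(seed)
    w = wd_matrix(coords, bits)
    n = len(coords)
    threshold = (2 ** bits - 1) / 2
    best = None
    for it in range(iters):
        slope = slope_start + (slope_end - slope_start) * it / max(iters - 1, 1)
        tour, visited = [0], {0}
        while len(tour) < n:
            spins = [1 if i == tour[-1] else 0 for i in range(n)]  # one-hot spin vector
            currents = crossbar_mac(spins, w)
            candidates = [j for j in range(n) if j not in visited]
            passed = [j for j in candidates if sot_pass(currents[j], threshold, slope, rng)]
            pool = passed or candidates  # if every device blocks, fall back to all candidates
            nxt = max(pool, key=lambda j: currents[j])  # ArgMax: largest surviving current
            tour.append(nxt)
            visited.add(nxt)
        if best is None or tour_length(tour, coords) < tour_length(best, coords):
            best = tour
    return best
```

With a steep slope (for example `slope_start=slope_end=50.0`) every device behaves deterministically and the routine reduces to a greedy nearest-neighbor tour, up to quantization ties. Shallow slopes occasionally let a farther city win, which plays the role Figure 2 assigns to the stochastic update: escaping local minima.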