Tensorized Ant Colony Optimization for GPU Acceleration

Luming Yang, Tao Jiang, Ran Cheng

TL;DR

This work addresses the limited scalability of CPU-based Ant Colony Optimization (ACO) on large-scale TSP by introducing TensorACO, a GPU-accelerated framework that tensorizes both the ant system and the ant path. Its key components are a preprocessing step that computes a probability transition matrix $\mathbf{M}_p$, tensorized representations of ant movement and path updates with index-based aggregation, and AdaIR for parallel, adaptive city selection. The authors report speedups of up to $1921\times$ over CPU ACO, and AdaIR further improves convergence speed by up to $80\%$ and solution quality by up to $2\%$, validated on TSPLIB instances within a unified EvoX environment. These results highlight TensorACO's potential to enable scalable, high-performance ACO for very large TSP instances and motivate applying tensorized GPU approaches to other combinatorial optimization problems.
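The preprocessing idea can be illustrated with a minimal NumPy sketch. It assumes the standard Ant System weighting $\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}$ with $\eta_{ij} = 1/d_{ij}$; the exponents `alpha` and `beta`, the function names, and the array layout are illustrative assumptions and are not taken from the paper. The matrix is computed once per iteration, after which every ant's selection probabilities follow from a row lookup and a mask, with no per-ant loop.

```python
import numpy as np

def transition_matrix(tau, dist, alpha=1.0, beta=2.0):
    """Precompute M_p[i, j] = tau[i, j]**alpha * eta[i, j]**beta once per iteration.

    alpha and beta are the usual ACO exponents (assumed values, not from the paper).
    """
    eta = np.zeros_like(dist, dtype=float)
    off_diag = ~np.eye(dist.shape[0], dtype=bool)
    eta[off_diag] = 1.0 / dist[off_diag]  # heuristic desirability; diagonal stays 0
    return tau**alpha * eta**beta

def step_probabilities(M_p, current, visited):
    """Selection probabilities for all ants at once.

    current: (m,) int array, current city of each ant.
    visited: (m, n) bool array, True where an ant has already been.
    """
    weights = M_p[current] * ~visited  # row lookup per ant, visited cities zeroed
    return weights / weights.sum(axis=1, keepdims=True)

# Three cities on a line, two ants starting at cities 0 and 1.
dist = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
M_p = transition_matrix(np.ones((3, 3)), dist)
visited = np.array([[True, False, False], [False, True, False]])
probs = step_probabilities(M_p, np.array([0, 1]), visited)
```

Because `M_p` does not depend on any individual ant, its cost is paid once per iteration rather than once per ant per step, which is presumably the overhead the preprocessing step is meant to remove.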

Abstract

Ant Colony Optimization (ACO) is renowned for its effectiveness in solving Traveling Salesman Problems, yet it faces computational challenges in CPU-based environments, particularly with large-scale instances. In response, we introduce Tensorized Ant Colony Optimization (TensorACO) to exploit GPU acceleration. At its core, TensorACO fully transforms the ant system and the ant path into tensor forms, a process we refer to as tensorization. For the tensorization of the ant system, we propose a preprocessing method that reduces computational overhead by calculating the probability transition matrix. For the tensorization of the ant path, we propose an index mapping method that accelerates the update of the pheromone matrix by replacing sequential path updates with parallel matrix operations. Additionally, we introduce an Adaptive Independent Roulette (AdaIR) method to overcome the challenges of parallelizing ACO's selection mechanism on GPUs. Comprehensive experiments demonstrate the superior performance of TensorACO, which achieves up to 1921$\times$ speedup over standard ACO. Moreover, the AdaIR method further improves TensorACO's convergence speed by 80% and solution quality by 2%. Source code is available at https://github.com/EMI-Group/tensoraco.
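To make the replacement of sequential path updates with parallel operations concrete, the sketch below deposits pheromone for all ants with a single scatter-add over edge indices. It assumes the classic Ant System rule $\tau \leftarrow (1-\rho)\,\tau + \sum_k Q/L_k$ on a symmetric TSP; the parameters `rho` and `Q`, the function name, and the use of `np.add.at` as a stand-in for the paper's index mapping are assumptions made for illustration only.

```python
import numpy as np

def update_pheromone(tau, paths, lengths, rho=0.1, Q=1.0):
    """Evaporate and deposit pheromone for all ants without looping over tours.

    tau:     (n, n) float pheromone matrix.
    paths:   (m, n) int array, each row one ant's closed tour as a city order.
    lengths: (m,) float array of tour lengths.
    rho, Q:  evaporation rate and deposit constant (assumed values).
    """
    src = paths                       # start city of every edge
    dst = np.roll(paths, -1, axis=1)  # end city of every edge, wrapping to the start
    deposit = np.broadcast_to((Q / lengths)[:, None], paths.shape)
    delta = np.zeros_like(tau)
    # Unbuffered scatter-add: edges shared by several ants accumulate correctly.
    np.add.at(delta, (src.ravel(), dst.ravel()), deposit.ravel())
    delta = delta + delta.T           # symmetric TSP: edge (i, j) is also (j, i)
    return (1.0 - rho) * tau + delta

# Two ants on three cities, both with tour length 4.
paths = np.array([[0, 1, 2], [0, 2, 1]])
new_tau = update_pheromone(np.ones((3, 3)), paths, np.array([4.0, 4.0]), rho=0.5)
```

The edge list is built from the path tensor by pairing it with a shifted copy of itself, so the whole colony's contribution reduces to one indexed accumulation that maps naturally onto a GPU scatter operation.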

Paper Structure (14 sections, 5 equations, 6 figures)

Figures (6)

  • Figure 1: Schematic overview of TensorACO. The workflow comprises two main components: ant system tensorization and ant path tensorization. The matrix of squares represents the pheromone matrix or the probability transition matrix, and the color intensity represents the value. The key represents $\kappa$. Multiple arrows represent the function mapping, and the overlaid sets the arrows point to can be processed in parallel on the device.
  • Figure 2: Runtime over scaling population size and scaling city size. Note: No data is available at the city scale of 2392 as the runtime of CPU-ACO exceeded the tolerance range.
  • Figure 3: Solution error for CPU-TensorACO and GPU-TensorACO with varying population sizes. (a) and (b): Convergence curves over runtime; (c): Final solution quality obtained by the two models after running for 70 s. On GPUs, a larger population size helps to achieve better results.
  • Figure 4: Relationship between the roulette wheel (RW) probability $p_\text{max}$ and the shifted probability $\hat{p}_{\text{max}}'$ using AdaIR. The gradient color represents the variation across iterations. $\hat{p}_{\text{max}}$ and $\hat{p}_{\text{max}}'$ tend to align within the first few iterations. (c): As $\gamma$ varies, the convex hull alters, guiding the directional shift in ants' selection tendencies.
  • Figure 5: Runtime for RW, IR, and AdaIR over varying city scales. AdaIR achieves speedup ranging from 2.12$\times$ to 4.62$\times$ against RW, with only a slight time difference from IR.
  • ...and 1 more figure
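The contrast between RW and IR selection in the captions above can be sketched as follows. This is an illustration under stated assumptions: RW is taken to be classic roulette wheel selection via a cumulative sum, and IR is taken to be the independent roulette scheme in which every candidate city draws its own random number and the largest product wins. The function names are hypothetical, and the adaptive correction that distinguishes AdaIR (controlled by $\gamma$) is not reproduced, because this excerpt does not define it.

```python
import numpy as np

def roulette_wheel(probs, rng):
    """RW: one random number per ant, then a prefix-sum search along the cities."""
    cdf = np.cumsum(probs, axis=1)
    r = rng.random((probs.shape[0], 1)) * cdf[:, -1:]
    return (cdf > r).argmax(axis=1)  # first city whose cumulative weight exceeds r

def independent_roulette(probs, rng):
    """IR: one random number per (ant, city) pair; the largest product is selected."""
    r = rng.random(probs.shape)
    return (probs * r).argmax(axis=1)

rng = np.random.default_rng(0)
probs = np.tile(np.array([0.0, 0.8, 0.2]), (1000, 1))  # city 0 already visited
rw_choice = roulette_wheel(probs, rng)
ir_choice = independent_roulette(probs, rng)
```

RW needs a cumulative sum followed by a search, whereas IR is an elementwise product followed by a reduction, which parallelizes more directly. IR does not in general reproduce the RW distribution exactly, which is consistent with the shifted probability that the Figure 4 caption refers to.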