torchsom: The Reference PyTorch Library for Self-Organizing Maps

Louis Berthier, Ahmed Shokry, Maxime Moreaud, Guillaume Ramelet, Eric Moulines

TL;DR

torchsom delivers a GPU-accelerated, PyTorch-based SOM with a scikit-learn–style API and a rich visualization toolkit, addressing gaps in existing Python implementations. The paper details a modular architecture (core SOM engine, utilities, and visualization) and a comprehensive suite of learning, distance, neighborhood, and scheduling options, including several decay schedules intended to support convergence. Empirical results show that torchsom maintains fidelity comparable to MiniSom while achieving large speedups (77%–99% faster) and improved topological preservation, with GPU execution providing the greatest gains. The work also integrates clustering (K-means, GMM, HDBSCAN) and extensive visualization (U-matrix, component planes, reliability scores, clustering diagnostics) to support interpretable, production-ready unsupervised analysis. Overall, torchsom enables scalable, reproducible SOM workflows within PyTorch environments, bridging research and industrial use cases.

Abstract

This paper introduces torchsom, an open-source Python library that provides a reference implementation of the Self-Organizing Map (SOM) in PyTorch. The package offers three main features: (i) dimensionality reduction, (ii) clustering, and (iii) user-friendly data visualization. It relies on a PyTorch backend, enabling (i) fast and efficient training of SOMs through GPU acceleration, and (ii) easy and scalable integration with the PyTorch ecosystem. Moreover, torchsom follows the scikit-learn API for ease of use and extensibility. The library is released under the Apache 2.0 license with 90% test coverage, and its source code and documentation are available at https://github.com/michelin/TorchSOM.
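
To make the ingredients named in the TL;DR concrete (a distance function, a neighborhood function, and decaying learning-rate and neighborhood-width schedules), here is a minimal sketch of online SOM training on a rectangular grid. It is written in plain Python for illustration and is not torchsom's API. The function names (`find_bmu`, `update`, `quantization_error`, `train`), the Euclidean distance, the Gaussian neighborhood, and the exponential decay are all assumptions chosen for the example. A tensor-based library would presumably batch these loops.

```python
import math
import random


def find_bmu(weights, x):
    """Return the (row, col) of the neuron whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            dist = math.dist(w, x)  # Euclidean distance (assumed metric)
            if dist < best_dist:
                best, best_dist = (i, j), dist
    return best


def update(weights, x, lr, sigma):
    """Pull every neuron toward x, weighted by a Gaussian of its grid distance to the BMU."""
    bi, bj = find_bmu(weights, x)
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            grid_d2 = (i - bi) ** 2 + (j - bj) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # neighborhood weight, 1.0 at the BMU
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])


def quantization_error(weights, data):
    """Mean distance between each sample and its best-matching unit."""
    total = 0.0
    for x in data:
        i, j = find_bmu(weights, x)
        total += math.dist(weights[i][j], x)
    return total / len(data)


def train(weights, data, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Online training with exponentially decaying learning rate and neighborhood width."""
    rng = random.Random(seed)
    samples = list(data)
    for epoch in range(epochs):
        decay = math.exp(-epoch / epochs)  # one possible schedule (assumed)
        rng.shuffle(samples)  # avoid a fixed presentation order
        for x in samples:
            update(weights, x, lr0 * decay, sigma0 * decay)
    return weights


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: two loose groups in the unit square.
    data = [[rng.uniform(0.1, 0.3), rng.uniform(0.1, 0.3)] for _ in range(20)]
    data += [[rng.uniform(0.7, 0.9), rng.uniform(0.7, 0.9)] for _ in range(20)]
    # 4 x 4 grid of randomly initialised 2-D weight vectors.
    som = [[[rng.random(), rng.random()] for _ in range(4)] for _ in range(4)]
    print("quantization error before:", quantization_error(som, data))
    train(som, data)
    print("quantization error after: ", quantization_error(som, data))
```

The quantization error computed here is one common fidelity measure for a SOM. Swapping the distance in `find_bmu`, the kernel in `update`, or the schedule in `train` corresponds to the kinds of options the summary lists.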

Paper Structure

This paper contains 22 sections, 29 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Common architecture with an $m \times n$ grid. Each input feature is connected to every neuron $w_{ij} \in \mathbb{R}^k$ in the grid. For clarity, only the first feature $x_1$ and last feature $x_k$ are shown, with connections to the corner neurons ($w_{11}$, $w_{1n}$, $w_{m1}$, $w_{mn}$) and the central neuron $w_{rr}$. Dots represent additional neurons in each row and column.
  • Figure 2: Rectangular topology
  • Figure 3: Hexagonal topology
  • Figure 5: Learning curves showing convergence of (top) and (bottom) during unsupervised training, using the wine data set and hexagonal topology.
  • Figure 6: Distance maps (U-matrices) depicting inter-neuron distances and cluster boundaries under (a) rectangular and (b) hexagonal topologies, using the wine data set.
  • ...and 9 more figures
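
The distance maps (U-matrices) mentioned in the Figure 6 caption can be illustrated with a short sketch. The version below assumes a rectangular topology with a 4-neighborhood and Euclidean distance, and assigns each neuron the mean distance to its grid neighbors. The function name `u_matrix` is hypothetical, and this is not taken from torchsom's code. A hexagonal topology would use six neighbors per neuron instead.

```python
import math


def u_matrix(weights):
    """Mean Euclidean distance from each neuron to its up/down/left/right grid neighbors."""
    rows, cols = len(weights), len(weights[0])
    umat = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:  # skip positions outside the grid
                    dists.append(math.dist(weights[i][j], weights[ni][nj]))
            # A 1 x 1 grid has no neighbors; report 0.0 in that case.
            umat[i][j] = sum(dists) / len(dists) if dists else 0.0
    return umat


if __name__ == "__main__":
    # 1 x 3 grid of 1-D weights: two similar neurons, then a distant one.
    grid = [[[0.0], [0.0], [10.0]]]
    print(u_matrix(grid))
```

High values mark neurons whose weights differ sharply from their neighbors, which is why such maps are read as showing cluster boundaries.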