Computing $k$-means in mixed precision

Erin Carson, Xinye Chen, Xiaobo Liu

TL;DR

The paper studies how to accelerate $k$-means with mixed-precision arithmetic while preserving numerical stability. It establishes the stability of the standard distance formulation, proposes a two-level mixed-precision framework (initialization and distance computations in low precision, center updates in working precision), and derives error bounds for the mixed-precision distance computation. Experiments on synthetic data, real-world datasets, and image segmentation show that normalization improves tolerance to reduced precision and that a substantial portion of the distance computations can be performed in low precision without significantly harming clustering quality. The findings point to potential speedups and energy savings with mp-kmeans, especially when using FP16, subject to two limitations: overflow on nonnormalized data and underflow at very low precision. The approach may extend to distance-based learning algorithms beyond $k$-means.

Abstract

The $k$-means algorithm is one of the most popular and critical techniques in data mining and machine learning, and it has achieved significant success in numerous science and engineering domains. Computing $k$-means to a global optimum is NP-hard in Euclidean space, yet there are a variety of efficient heuristic algorithms, such as Lloyd's algorithm, that converge to a local optimum with superpolynomial complexity in the worst case. Motivated by the emergence and prominence of mixed precision capabilities in hardware, a current trend is to develop low and mixed precision variants of algorithms in order to improve the runtime and energy consumption. In this paper we study the numerical stability of Lloyd's $k$-means algorithm, and, in particular, we confirm the stability of the widely used distance computation formula. We propose a mixed-precision framework for $k$-means computation and investigate the effects of low-precision distance computation within the framework. Through extensive simulations on various data clustering and image segmentation tasks, we verify the applicability and robustness of the mixed precision $k$-means method. We find that, in $k$-means computation, normalized data is more tolerant to the reduction of precision in the distance computation, while for nonnormalized data more care is needed in the use of reduced precision, mainly to avoid overflow. Our study demonstrates the potential for the use of mixed precision to accelerate the $k$-means computation and offers some insights into other distance-based machine learning methods.
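The split between low-precision distance computation and working-precision center updates can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: the function names `pairwise_sq_dists` and `lloyd_mixed` are hypothetical, the expanded form $\|x\|^2 - 2x^Tc + \|c\|^2$ is assumed to be the "widely used" distance formula, and float16 stands in for the low precision. The toy data is standardized first, in line with the finding that normalized data tolerates reduced precision better.

```python
import numpy as np

def pairwise_sq_dists(X, C, dtype):
    """Squared Euclidean distances, evaluated entirely in `dtype`.

    Assumes the expanded form ||x||^2 - 2 x.c + ||c||^2.
    """
    Xl = X.astype(dtype)
    Cl = C.astype(dtype)
    x2 = np.sum(Xl * Xl, axis=1, dtype=dtype)[:, None]
    c2 = np.sum(Cl * Cl, axis=1, dtype=dtype)[None, :]
    D = x2 - dtype(2) * (Xl @ Cl.T) + c2
    # Rounding can push tiny distances slightly below zero; clamp them.
    return np.maximum(D, dtype(0))

def lloyd_mixed(X, C0, low=np.float16, work=np.float64, n_iter=20):
    """Lloyd iterations: assignment in `low`, center update in `work`."""
    C = C0.astype(work)
    for _ in range(n_iter):
        labels = np.argmin(pairwise_sq_dists(X, C, low), axis=1)
        new_C = C.copy()
        for j in range(C.shape[0]):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_C[j] = members.mean(axis=0, dtype=work)
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels, C

# Toy data (an assumption for illustration): three well-separated blobs.
rng = np.random.default_rng(0)
blob_means = np.array([[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]])
X = np.vstack([m + 0.3 * rng.standard_normal((100, 2)) for m in blob_means])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

C0 = X[[0, 100, 200]]  # one seed point per blob
labels_lo, C_lo = lloyd_mixed(X, C0, low=np.float16)
labels_hi, C_hi = lloyd_mixed(X, C0, low=np.float64)
print("assignments agree:", np.array_equal(labels_lo, labels_hi))
```

On data this well separated the float16 and float64 assignments coincide; the interesting regime is points close to a cluster boundary, where rounding error in the distance can change the nearest center.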

Paper Structure

This paper contains 16 sections, 6 theorems, 33 equations, 10 figures, 6 tables, and 5 algorithms.

Key Result

Lemma 4.1

Given an arbitrary data point $p'$ in the cluster $S$ whose mean center is denoted by $\mu$, we have

Figures (10)

  • Figure 5.1: The difference, measured as in \ref{eq:fro-diff}, between the two distance computing formulae \ref{eq:dist-eval-alternative} and \ref{eq:dist-eval} in double precision.
  • Figure 6.1: The triggered rate $\eta$ of low-precision computations as a function of $\delta$ on synthetic Gaussian blobs data of different deviation $\sigma$, where 2,000 Gaussian data points with 10 clusters (blobs) are generated.
  • Figure 6.2: The performance in terms of ARI and AMI of Algorithm \ref{alg:k-means-mp} with varying $\delta$ on synthetic Gaussian blobs data of different deviations.
  • Figure 7.1: Visualization of the S-sets (colors marked as ground truth clusters).
  • Figure 7.2: Tasks in images selected from ImageNet.
  • ...and 5 more figures
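
The comparison described in the caption of Figure 5.1 can be sketched as follows. The equations behind the labels `eq:dist-eval`, `eq:dist-eval-alternative`, and `eq:fro-diff` are not reproduced here, so this sketch makes assumptions: the two formulae are taken to be the direct form $\|x - c\|^2$ and the expanded form $\|x\|^2 - 2x^Tc + \|c\|^2$, and the difference is taken to be a relative Frobenius-norm difference. All function names are hypothetical.

```python
import numpy as np

def dist_direct(X, C):
    """Squared distances from explicit differences x - c."""
    diff = X[:, None, :] - C[None, :, :]
    return np.einsum("ijk,ijk->ij", diff, diff)

def dist_expanded(X, C):
    """Squared distances from ||x||^2 - 2 x.c + ||c||^2."""
    x2 = np.sum(X * X, axis=1)[:, None]
    c2 = np.sum(C * C, axis=1)[None, :]
    return x2 - 2.0 * (X @ C.T) + c2

def rel_fro_diff(A, B):
    """Relative Frobenius-norm difference (assumed form of the measure)."""
    return np.linalg.norm(A - B, "fro") / np.linalg.norm(A, "fro")

# Random double-precision data (an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 8))
C = rng.standard_normal((10, 8))

D_direct = dist_direct(X, C)
D_expanded = dist_expanded(X, C)
rel = rel_fro_diff(D_direct, D_expanded)
print(f"relative Frobenius difference: {rel:.2e}")
```

The expanded form replaces the explicit differences with a single matrix product, which is why it is usually preferred for speed; the difference between the two grows when points lie very close to a center, because the expanded form then subtracts nearly equal quantities.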

Theorems & Definitions (11)

  • Lemma 4.1: arva07
  • Theorem 5.1
  • proof
  • Theorem 5.2
  • proof
  • Lemma 5.1
  • proof
  • Theorem 5.3
  • proof
  • Theorem 6.1
  • ...and 1 more