K-Means as a Radial Basis function Network: a Variational and Gradient-based Equivalence

Felipe de Jesus Felix Arredondo, Alejandro Ucan-Puc, Carlos Astengo Noguez

TL;DR

The paper proves that the RBF objective $\Gamma$-converges to the K-Means objective as the temperature parameter $\sigma$ vanishes, so that its minimizers converge to K-Means solutions. This allows K-Means to be embedded directly into deep learning architectures for the joint optimization of representations and clusters.

Abstract

This work establishes a rigorous variational and gradient-based equivalence between the classical K-Means algorithm and differentiable Radial Basis Function (RBF) neural networks with smooth responsibilities. By reparameterizing the K-Means objective and embedding its distortion functional into a smooth weighted loss, we prove that the RBF objective $\Gamma$-converges to the K-Means solution as the temperature parameter $\sigma$ vanishes. We further demonstrate that the gradient-based updates of the RBF centers recover the exact K-Means centroid update rule and induce identical training trajectories in the limit. To address the numerical instability of the Softmax transformation in the low-temperature regime, we propose the integration of Entmax-1.5, which ensures stable polynomial convergence while preserving the underlying Voronoi partition structure. These results bridge the conceptual gap between discrete partitioning and continuous optimization, enabling K-Means to be embedded directly into deep learning architectures for the joint optimization of representations and clusters. Empirical validation across diverse synthetic geometries confirms a monotone collapse of soft RBF centroids toward K-Means fixed points, providing a unified framework for end-to-end differentiable clustering.
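The claim that a gradient step on the RBF centers recovers the K-Means centroid update can be checked on a toy example. The code below is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes softmax responsibilities $r_{ij} \propto \exp(-\|x_i-\mu_j\|^2/\sigma)$ and the weighted loss $\sum_{i,j} r_{ij}\|x_i-\mu_j\|^2$, with the responsibilities held fixed during the step and a per-center step size $\eta_j = 1/(2\sum_i r_{ij})$. All helper names are hypothetical. Under these assumptions the step lands exactly on the soft centroid $\sum_i r_{ij}x_i/\sum_i r_{ij}$. For small $\sigma$ the responsibilities become one-hot, so the step coincides with the Lloyd (K-Means) update.

```python
import numpy as np


def responsibilities(X, mu, sigma):
    """Softmax responsibilities r_ij proportional to exp(-||x_i - mu_j||^2 / sigma) (assumed form)."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (n, k) squared distances
    logits = -d2 / sigma
    logits -= logits.max(axis=1, keepdims=True)                # shift for a stable softmax
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def soft_centroid(X, r):
    """Responsibility-weighted mean of the data for each center."""
    return (r.T @ X) / r.sum(axis=0)[:, None]


def gradient_step(X, mu, r):
    """One gradient step on sum_ij r_ij ||x_i - mu_j||^2 with r held fixed."""
    N = r.sum(axis=0)                                # (k,) soft cluster masses
    grad = 2.0 * (N[:, None] * mu - r.T @ X)         # d/dmu_j = 2 (N_j mu_j - sum_i r_ij x_i)
    eta = 1.0 / (2.0 * N)                            # per-center step size
    return mu - eta[:, None] * grad


def lloyd_step(X, mu):
    """Hard K-Means update: each center moves to the mean of its assigned points."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    return np.stack([X[labels == j].mean(axis=0) for j in range(mu.shape[0])])


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
mu0 = np.array([[2.0, 2.0], [8.0, 8.0]])

r = responsibilities(X, mu0, sigma=1e-2)           # effectively one-hot on this data
mu_grad = gradient_step(X, mu0, r)
mu_lloyd = lloyd_step(X, mu0)
print(np.allclose(mu_grad, mu_lloyd))               # True
```

Holding the responsibilities fixed is an EM-style simplification made for this sketch. A full autodiff gradient of the smooth loss would also differentiate through $r_{ij}$.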

Paper Structure

This paper contains 36 sections, 8 theorems, 81 equations, 2 figures, 1 table, and 4 algorithms.

Key Result

Theorem 1

Let $X=\{x_i\}_{i=1}^n\subset\mathbb R^d$ be fixed and let $\mathcal{K}\subset\mathbb R^{k\times d}$ be compact. For $\sigma>0$ define the soft clustering functional and the hard clustering functional Then $\mathcal{L}_\sigma \xrightarrow{\Gamma} J$ on $\mathcal{K}$ as $\sigma\to0$. In particular, if $\sigma_m\to0$ and $\mu^*(\sigma_m)\in\arg\min_{\mu\in\mathcal{K}}\mathcal{L}_{\sigma_m}(\mu)$,
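The definitions of $\mathcal{L}_\sigma$ and $J$ are not reproduced above, so the sketch below assumes a common form. It takes $\mathcal{L}_\sigma(\mu)=\sum_i\sum_j r_{ij}(\sigma)\|x_i-\mu_j\|^2$ with softmax responsibilities $r_{ij}(\sigma)\propto\exp(-\|x_i-\mu_j\|^2/\sigma)$, and $J(\mu)=\sum_i\min_j\|x_i-\mu_j\|^2$. Under that assumption, each point contributes a Gibbs average of its squared distances. This gives $0\le\mathcal{L}_\sigma(\mu)-J(\mu)\le n\sigma\log k$ for every $\mu$, and uniform convergence of continuous functionals on the compact set $\mathcal{K}$ is one standard route to a $\Gamma$-limit. The code only checks this bound and the shrinking gap at a single fixed $\mu$. It illustrates the assumed setting and does not prove Theorem 1. The function names are hypothetical.

```python
import numpy as np


def soft_objective(X, mu, sigma):
    """Assumed soft functional: sum_i sum_j r_ij(sigma) * ||x_i - mu_j||^2."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)   # (n, k)
    logits = -d2 / sigma
    logits -= logits.max(axis=1, keepdims=True)                 # stable softmax
    r = np.exp(logits)
    r /= r.sum(axis=1, keepdims=True)
    return float((r * d2).sum())


def hard_objective(X, mu):
    """K-Means distortion J(mu) = sum_i min_j ||x_i - mu_j||^2."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).sum())


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
mu = rng.normal(size=(3, 2))                  # an arbitrary fixed set of centers
J = hard_objective(X, mu)
sigmas = [10.0, 1.0, 0.1, 0.01, 0.001]        # decreasing temperatures
gaps = [soft_objective(X, mu, s) - J for s in sigmas]
for s, g in zip(sigmas, gaps):
    bound = X.shape[0] * s * np.log(mu.shape[0])  # n * sigma * log k
    print(f"sigma={s:g}  gap={g:.3e}  bound={bound:.3e}")
```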

Figures (2)

  • Figure 1: Centroid discrepancies $d_j(\sigma)$ under two temperature-decay protocols. (Sigma-wise) A fixed centroid initialization is evaluated across a decreasing $\sigma$ schedule, measuring the deviation from the hard K-Means solution at each temperature. (Initialization-wise) Centroids are re-initialized independently over $M$ runs, and the discrepancy is averaged across initializations for each $\sigma$. In both settings, all datasets exhibit a monotone collapse of soft-RBF centroids toward the K-Means centroids as $\sigma \to 0$.
  • Figure 2: Convergence paths of the smoothed centroids as $\sigma$ decreases. Each curve shows the trajectory from initialization to its final position. As $\sigma \to 0$, all paths collapse onto the starred point, which corresponds to the true K-Means solution, demonstrating that the smoothed formulation converges to the hard K-Means optimum.
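The sigma-wise protocol of Figure 1 can be mimicked on a toy dataset. The sketch makes three assumptions, any of which may differ from the paper's datasets, schedule, and definitions:

- The soft-RBF centroids come from iterating the soft-centroid fixed point, that is, the weighted mean under softmax responsibilities.
- The hard reference comes from Lloyd iterations started at the same initialization.
- The discrepancy $d_j(\sigma)$ is the Euclidean distance between matched centroids.

All names are hypothetical.

```python
import numpy as np


def soft_kmeans(X, mu0, sigma, iters=200):
    """Iterate mu_j <- sum_i r_ij x_i / sum_i r_ij with softmax responsibilities (assumed form)."""
    mu = mu0.copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        logits = -d2 / sigma
        logits -= logits.max(axis=1, keepdims=True)     # stable softmax
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu


def hard_kmeans(X, mu0, iters=200):
    """Lloyd iterations from the same initialization (assumes no cluster becomes empty)."""
    mu = mu0.copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        mu = np.stack([X[labels == j].mean(axis=0) for j in range(len(mu))])
    return mu


rng = np.random.default_rng(0)
centers = np.array([[-5.0, 0.0], [5.0, 0.0], [0.0, 8.0]])
X = np.concatenate([c + 0.5 * rng.normal(size=(100, 2)) for c in centers])
mu0 = centers + np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]])   # fixed initialization

mu_km = hard_kmeans(X, mu0)
sigmas = [10.0, 3.0, 1.0, 0.3, 0.1]                 # decreasing temperature schedule
disc = []
for s in sigmas:
    d_j = np.linalg.norm(soft_kmeans(X, mu0, s) - mu_km, axis=1)   # per-centroid discrepancy
    disc.append(d_j)
    print(f"sigma={s:g}  max_j d_j={d_j.max():.3e}")
```

On this well-separated toy data the discrepancy falls to floating-point noise before $\sigma$ reaches the end of the schedule. The example is not meant to reproduce the paper's figures.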

Theorems & Definitions (15)

  • Theorem 1: $\Gamma$-limit and convergence of minimizers
  • proof
  • Corollary 1
  • Lemma 1: Soft centroid as fixed point
  • proof
  • Theorem 2: K-Means update as a gradient step
  • proof
  • Theorem 3: Exponential Convergence of RBF Centroids
  • proof
  • Theorem 4: Centroid Deviation for Entmax-1.5
  • ...and 5 more
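The Entmax-1.5 replacement for Softmax can be sketched with the exact sort-based threshold algorithm for $\alpha = 1.5$ entmax, ported here to NumPy. The open-source `entmax` PyTorch package provides a reference implementation (`entmax15`). Feeding the map the logits $-\|x_i-\mu_j\|^2/\sigma$ is an assumption about how the paper scales the temperature, and the helper names are hypothetical. Unlike Softmax, Entmax-1.5 returns exact zeros, so responsibilities become exactly one-hot at a finite $\sigma$ rather than only approaching one-hot values.

```python
import numpy as np


def entmax15(z):
    """Exact Entmax-1.5 along the last axis via the sort-based threshold algorithm."""
    z = np.asarray(z, dtype=float)
    z = (z - z.max(axis=-1, keepdims=True)) / 2.0          # shift, then halve (alpha = 1.5)
    zs = -np.sort(-z, axis=-1)                              # sort descending
    k = np.arange(1, z.shape[-1] + 1)
    mean = np.cumsum(zs, axis=-1) / k
    mean_sq = np.cumsum(zs ** 2, axis=-1) / k
    ss = k * (mean_sq - mean ** 2)
    delta = np.clip((1.0 - ss) / k, 0.0, None)
    tau = mean - np.sqrt(delta)                             # candidate threshold per support size
    support = (tau <= zs).sum(axis=-1, keepdims=True)       # size of the support
    tau_star = np.take_along_axis(tau, support - 1, axis=-1)
    return np.clip(z - tau_star, 0.0, None) ** 2


def entmax_responsibilities(X, mu, sigma):
    """Sparse responsibilities from logits -||x_i - mu_j||^2 / sigma (assumed scaling)."""
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    return entmax15(-d2 / sigma)


print(entmax15([0.0, 0.0]))   # [0.5 0.5]
print(entmax15([10.0, 0.0]))  # [1. 0.]
```

For two centers, working through the algorithm shows that, under this scaling, a point is assigned to a single center exactly once its squared-distance gap is at least $2\sigma$. Softmax, by contrast, never produces exact zeros at finite $\sigma$, apart from floating-point underflow.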