K-Means as a Radial Basis function Network: a Variational and Gradient-based Equivalence

Felipe de Jesus Felix Arredondo; Alejandro Ucan-Puc; Carlos Astengo Noguez

TL;DR

It is proved that the RBF objective $\Gamma$-converges to the K-Means solution as the temperature parameter $\sigma$ vanishes, enabling K-Means to be embedded directly into deep learning architectures for the joint optimization of representations and clusters.

Abstract

This work establishes a rigorous variational and gradient-based equivalence between the classical K-Means algorithm and differentiable Radial Basis Function (RBF) neural networks with smooth responsibilities. By reparameterizing the K-Means objective and embedding its distortion functional into a smooth weighted loss, we prove that the RBF objective $\Gamma$-converges to the K-Means solution as the temperature parameter $\sigma$ vanishes. We further demonstrate that the gradient-based updates of the RBF centers recover the exact K-Means centroid update rule and induce identical training trajectories in the limit. To address the numerical instability of the Softmax transformation in the low-temperature regime, we propose the integration of Entmax-1.5, which ensures stable polynomial convergence while preserving the underlying Voronoi partition structure. These results bridge the conceptual gap between discrete partitioning and continuous optimization, enabling K-Means to be embedded directly into deep learning architectures for the joint optimization of representations and clusters. Empirical validation across diverse synthetic geometries confirms a monotone collapse of soft RBF centroids toward K-Means fixed points, providing a unified framework for end-to-end differentiable clustering.
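The claim that gradient updates of the RBF centers recover the K-Means centroid rule can be illustrated with a minimal NumPy sketch. It assumes responsibilities of the form softmax over $-\|x_i-\mu_j\|^2/\sigma$ and a per-center step size $1/(2\sum_i r_{ij})$; the exact scaling and step size used in the paper are not shown here, and the function names are hypothetical.

```python
import numpy as np

def sq_dists(X, mu):
    # Pairwise squared Euclidean distances, shape (n, k).
    return ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)

def soft_responsibilities(X, mu, sigma):
    # Assumed form: softmax over -||x_i - mu_j||^2 / sigma (scaling is an assumption).
    logits = -sq_dists(X, mu) / sigma
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the exponentials
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def gradient_step(X, mu, R):
    # One gradient step on sum_ij R_ij * ||x_i - mu_j||^2 with R held fixed,
    # using the per-center step size 1 / (2 * sum_i R_ij).
    mass = R.sum(axis=0)                          # (k,)
    grad = 2.0 * (mass[:, None] * mu - R.T @ X)   # (k, d)
    return mu - grad / (2.0 * mass[:, None])      # equals the R-weighted mean

def lloyd_step(X, mu):
    # Classical K-Means centroid update (assumes no cluster ends up empty).
    labels = sq_dists(X, mu).argmin(axis=1)
    return np.stack([X[labels == j].mean(axis=0) for j in range(len(mu))])

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
mu0 = np.array([[-1.0, 0.0], [1.0, 0.0]])

soft = gradient_step(X, mu0, soft_responsibilities(X, mu0, sigma=1e-3))
hard = lloyd_step(X, mu0)
gap = np.abs(soft - hard).max()  # largest coordinate-wise difference
```

With this step size the gradient step lands exactly on the responsibility-weighted mean, so at small $\sigma$, where the responsibilities are nearly one-hot, it coincides with the hard centroid update up to floating-point error.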

Paper Structure (36 sections, 8 theorems, 81 equations, 2 figures, 1 table, 4 algorithms)

Introduction
Main Contribution.
Organization of the Paper.
Preliminaries and Theoretical Background
K-Means
Radial Basis Function Networks and Their Responsibilities
Formal Comparison Between the K-Means Update and the RBF Update
Variational Reparametrization via Responsibilities
Entropic Relaxation and Zero-Temperature Limit
Equivalence Between RBF Optimization and K-Means
Updating Process Equivalence
Remark (Step size and contractive dynamics).
Mean Error Between RBF Soft Clustering and K-Means
Computational Problems Along the Decision Process: Softmax Instability and the Entmax-1.5 Solution
Experimental Testing
...and 21 more sections

Key Result

Theorem 1

Let $X=\{x_i\}_{i=1}^n\subset\mathbb R^d$ be fixed and let $\mathcal{K}\subset\mathbb R^{k\times d}$ be compact. For $\sigma>0$ define the soft clustering functional $\mathcal{L}_\sigma$ and the hard clustering functional $J$. Then $\mathcal{L}_\sigma \xrightarrow{\Gamma} J$ on $\mathcal{K}$ as $\sigma\to0$. In particular, if $\sigma_m\to0$ and $\mu^*(\sigma_m)\in\arg\min_{\mu\in\mathcal{K}}\mathcal{L}_{\sigma_m}(\mu)$,
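A numerical sanity check of the zero-temperature limit can be sketched as follows. The defining formulas of the two functionals are not reproduced above, so this sketch assumes one standard entropic relaxation, $\mathcal{L}_\sigma(\mu)=\sum_i -\sigma\log\sum_j \exp(-\|x_i-\mu_j\|^2/\sigma)$, together with the usual distortion $J(\mu)=\sum_i \min_j \|x_i-\mu_j\|^2$; the paper's own definitions may differ. For this assumed pair the log-sum-exp bound gives $J(\mu)-n\sigma\log k \le \mathcal{L}_\sigma(\mu) \le J(\mu)$, so the gap shrinks uniformly in $\mu$ as $\sigma\to0$.

```python
import numpy as np

def hard_distortion(X, mu):
    # J(mu) = sum_i min_j ||x_i - mu_j||^2
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def soft_distortion(X, mu, sigma):
    # Hypothetical soft functional: sum_i -sigma * log sum_j exp(-d2_ij / sigma).
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    m = d2.min(axis=1, keepdims=True)
    # Factor out the row minimum so the largest exponent is exactly zero.
    lse = np.log(np.exp(-(d2 - m) / sigma).sum(axis=1))
    return (m[:, 0] - sigma * lse).sum()

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))    # synthetic data, n = 200
mu = rng.normal(size=(3, 2))     # arbitrary centers, k = 3
sigmas = (1.0, 0.1, 0.01, 0.001)

J = hard_distortion(X, mu)
gaps = [J - soft_distortion(X, mu, s) for s in sigmas]  # each lies in [0, n * s * log k]
```

Evaluating `gaps` at other choices of `mu` gives the same bound, which is what makes the convergence uniform on a compact set under the assumed form.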

Figures (2)

Figure 1: Centroid discrepancies $d_j(\sigma)$ under two temperature-decay protocols. (Sigma-wise) A fixed centroid initialization is evaluated across a decreasing $\sigma$ schedule, measuring the deviation from the hard K-Means solution at each temperature. (Initialization-wise) Centroids are re-initialized independently over $M$ runs, and the discrepancy is averaged across initializations for each $\sigma$. In both settings, all datasets exhibit a monotone collapse of soft-RBF centroids toward the K-Means centroids as $\sigma \to 0$.
Figure 2: Convergence paths of the smoothed centroids as $\sigma$ decreases. Each curve shows the trajectory from initialization to its final position. As $\sigma \to 0$, all paths collapse onto the starred point, which corresponds to the true K-Means solution, demonstrating that the smoothed formulation converges to the hard K-Means optimum.
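The sigma-wise protocol described in the Figure 1 caption can be sketched roughly as below. This is an illustration on toy data, not the authors' experimental code: the dataset, the $\sigma$ schedule, the softmax form of the responsibilities, and the use of the largest per-centroid Euclidean distance as the discrepancy are all assumptions.

```python
import numpy as np

def sq_dists(X, mu):
    # Pairwise squared Euclidean distances, shape (n, k).
    return ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)

def soft_fixed_point(X, mu, sigma, iters=300):
    # Iterate the soft centroid map: responsibilities, then weighted means.
    for _ in range(iters):
        logits = -sq_dists(X, mu) / sigma
        logits -= logits.max(axis=1, keepdims=True)
        R = np.exp(logits)
        R /= R.sum(axis=1, keepdims=True)
        mu = (R.T @ X) / R.sum(axis=0)[:, None]
    return mu

def kmeans(X, mu, iters=100):
    # Plain Lloyd iterations (assumes no cluster becomes empty).
    for _ in range(iters):
        labels = sq_dists(X, mu).argmin(axis=1)
        mu = np.stack([X[labels == j].mean(axis=0) for j in range(len(mu))])
    return mu

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-3, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
mu0 = np.array([[-1.0, 0.0], [1.0, 0.0]])   # fixed initialization, reused for every sigma
sigmas = [20.0, 5.0, 1.0, 0.1, 0.01]        # decreasing schedule (illustrative values)

mu_km = kmeans(X, mu0)
# Largest per-centroid distance to the K-Means solution at each temperature.
disc = [np.linalg.norm(soft_fixed_point(X, mu0, s) - mu_km, axis=1).max() for s in sigmas]
```

The initialization-wise protocol would wrap the last two lines in a loop over independently drawn `mu0` and average `disc` across runs.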

Theorems & Definitions (15)

Theorem 1: $\Gamma$-limit and convergence of minimizers
proof
Corollary 1
Lemma 1: Soft centroid as fixed point
proof
Theorem 2: K-Means update as a gradient step
proof
Theorem 3: Exponential Convergence of RBF Centroids
proof
Theorem 4: Centroid Deviation for Entmax-1.5
...and 5 more
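For readers unfamiliar with Entmax-1.5, the following sketch shows the mapping itself: $p_i = [z_i/2 - \tau]_+^2$ with $\tau$ chosen so that the entries sum to one. The threshold is found here by simple bisection, which is a swapped-in technique chosen for clarity; reference implementations typically use an exact sort-based solver. How the paper feeds distances into this mapping is not shown above, so the scores in the usage lines are an assumption.

```python
import numpy as np

def entmax15(z, iters=60):
    # p_i = max(z_i / 2 - tau, 0)^2, with tau solving sum(p) = 1.
    s = (np.asarray(z, dtype=float) - np.max(z)) / 2.0  # shift so that max(s) = 0
    lo, hi = -1.0, 0.0  # total mass is >= 1 at tau = -1 and 0 at tau = 0
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if (np.clip(s - tau, 0.0, None) ** 2).sum() >= 1.0:
            lo = tau   # too much mass: raise the threshold
        else:
            hi = tau   # too little mass: lower the threshold
    p = np.clip(s - lo, 0.0, None) ** 2
    return p / p.sum()  # remove the tiny residual left by bisection

# Assumed usage: scores are negative squared distances divided by sigma.
d2 = np.array([0.0, 1.0, 10.0])
p_warm = entmax15(-d2 / 1.0)   # moderate temperature
p_cold = entmax15(-d2 / 0.1)   # low temperature
```

Unlike Softmax, this mapping assigns exactly zero weight to sufficiently distant centers, so the low-temperature regime produces a hard assignment without relying on underflowing exponentials.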
