Silhouette-Driven Instance-Weighted $k$-means

Aggelos Semoglou; Aristidis Likas; John Pavlopoulos

TL;DR

K-Sil is a silhouette-driven $k$-means variant that, at each iteration, weights points using a centroid-margin proxy for the silhouette score, emphasizing confidently assigned instances while down-weighting borderline or noisy regions.

Abstract

Clustering is a fundamental unsupervised learning task with applications across a wide range of domains. Popular algorithms such as $k$-means are efficient and widely used, but can be sensitive to outliers, ambiguous boundary points, and heterogeneous cluster geometry, which may distort centroid estimates and yield suboptimal partitions. We introduce K-Sil, a silhouette-driven $k$-means variant that, at each iteration, weights points using a centroid-margin proxy for the silhouette score, emphasizing confidently assigned instances while down-weighting borderline or noisy regions. Centroid updates take the form of a softmax-weighted mean, and an adaptive temperature automatically calibrates the sharpness of the weight distribution using a cluster-balanced, macro-averaged silhouette criterion. Under standard separation conditions, we establish a local convergence result for the induced weighted centroid updates. Experiments on 15 real-world datasets spanning tabular, biomedical, text, and image representations show consistent gains in internal validation metrics and typical improvements in external validation metrics over $k$-means and competitive instance-weighted baselines.
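To make the weighting idea concrete, the following is a minimal sketch of one softmax-weighted centroid update, not the authors' implementation. Several details are assumptions, since the abstract does not specify them: the centroid-margin proxy is taken to be $(b - a)/\max(a, b)$, where $a$ is the distance to the assigned centroid and $b$ the distance to the nearest other centroid; the softmax is applied within each cluster; and the temperature is a fixed value rather than the adaptive one described above. The function names `centroid_margin_scores` and `softmax_weighted_update` are hypothetical.

```python
import numpy as np


def centroid_margin_scores(X, centroids, labels):
    """Silhouette-like score per point from centroid distances only (assumed form)."""
    # Euclidean distances from every point to every centroid, shape (n, k).
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    idx = np.arange(X.shape[0])
    a = dists[idx, labels]               # distance to the assigned centroid
    masked = dists.copy()
    masked[idx, labels] = np.inf         # exclude the assigned centroid
    b = masked.min(axis=1)               # distance to the nearest other centroid
    denom = np.maximum(np.maximum(a, b), 1e-12)  # guard against division by zero
    return (b - a) / denom               # in [-1, 1]; higher means more confident


def softmax_weighted_update(X, centroids, labels, temperature=0.5):
    """One weighted centroid update with a fixed temperature (an assumption)."""
    scores = centroid_margin_scores(X, centroids, labels)
    new_centroids = centroids.astype(float)      # astype returns a float copy
    for c in range(centroids.shape[0]):
        members = labels == c
        if not members.any():
            continue                     # keep the old centroid for an empty cluster
        z = scores[members] / temperature
        w = np.exp(z - z.max())          # numerically stable softmax within the cluster
        w /= w.sum()
        new_centroids[c] = w @ X[members]        # softmax-weighted mean of the members
    return new_centroids


# Toy data: the point (4, 0) sits between two groups but is assigned to cluster 0.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [4.0, 0.0],
              [10.0, 0.0], [10.0, 1.0], [11.0, 0.0]])
labels = np.array([0, 0, 0, 0, 1, 1, 1])
plain_means = np.stack([X[labels == c].mean(axis=0) for c in range(2)])
weighted = softmax_weighted_update(X, plain_means, labels, temperature=0.5)
print("plain means:\n", plain_means)
print("weighted centroids:\n", weighted)
```

In this toy example the borderline point receives the lowest score in its cluster, so the weighted centroid of cluster 0 moves toward the three tightly grouped points. In this sketch, a smaller temperature sharpens the weights, while a very large one recovers the ordinary $k$-means mean.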
