VertCoHiRF: Decentralized Vertical Clustering Beyond k-means
Bruno Belucci, Karim Lounici, Vladimir R. Kostic, Katia Meziani
TL;DR
VertCoHiRF introduces a fully decentralized vertical clustering framework that never shares feature data and instead relies on structural consensus across heterogeneous local views. Agents exchange only cluster codes and ordinal rankings, and they iteratively build a Cluster Fusion Hierarchy (CFH) through a veto-based two-phase protocol that selects representative medoids and shrinks the problem size at each round. The method provides identifier-level privacy guarantees and robustness to Byzantine behavior. It comes with theoretical bounds on communication complexity, and experiments show competitive clustering performance across synthetic and real-world VFL scenarios. This structure-aware, privacy-preserving approach enables flexible multi-view clustering beyond k-means and yields interpretable cross-view clusterings through the CFH. The work is practically relevant to privacy-conscious distributed collaborations in which feature spaces are fragmented and heterogeneous.
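The label-exchange step described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a fusion round groups the current samples whose cluster labels coincide across all agents, so that a sample's "code" is the tuple of its per-agent labels. The function name `fuse_codes` and its arguments are hypothetical, and the veto-based second phase of the protocol is not modeled. Only identifiers and labels enter the function, never feature values.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fuse_codes(
    ids: Sequence[int], agent_labels: Sequence[Sequence[int]]
) -> Dict[Tuple[int, ...], List[int]]:
    """Hypothetical fusion step: group sample ids whose labels agree for every agent.

    ids[i] is the identifier of the i-th current sample; agent_labels[a][i] is the
    cluster label that agent a assigned to that sample using its private features.
    Assumption: consensus means identical label tuples (no veto phase modeled).
    """
    n = len(ids)
    for labels in agent_labels:
        if len(labels) != n:
            raise ValueError("each agent must label every current sample")

    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for pos, sample_id in enumerate(ids):
        # The code of a sample is the tuple of labels proposed by all agents.
        code = tuple(labels[pos] for labels in agent_labels)
        groups[code].append(sample_id)
    return dict(groups)
```

With six samples, agent A proposing labels [0, 0, 1, 1, 2, 2] and agent B proposing [5, 5, 5, 7, 7, 7], the call returns four groups, {(0, 5): [0, 1], (1, 5): [2], (1, 7): [3], (2, 7): [4, 5]}. Keeping one representative per group would shrink the next round from six samples to four, which is the kind of problem-size reduction a hierarchy level records.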
Abstract
Vertical Federated Learning (VFL) enables collaborative analysis across parties holding complementary feature views of the same samples. Yet existing approaches are largely restricted to distributed variants of $k$-means, which require centralized coordination or the exchange of feature-dependent numerical statistics and show limited robustness under heterogeneous views or adversarial behavior. We introduce VertCoHiRF, a fully decentralized, peer-to-peer framework for vertical federated clustering based on structural consensus across heterogeneous views, in which each agent applies a base clustering method suited to its local feature space. Rather than exchanging feature-dependent statistics or relying on noise injection for privacy, agents cluster their local views independently and reconcile their proposals through identifier-level consensus. Consensus is reached via decentralized ordinal ranking, which selects representative medoids and progressively induces a shared hierarchical clustering across agents. Communication is limited to sample identifiers, cluster labels, and ordinal rankings, which provides privacy by design. The framework supports overlapping feature partitions and heterogeneous local clustering methods, and it yields an interpretable shared Cluster Fusion Hierarchy (CFH) that captures cross-view agreement at multiple resolutions. We analyze communication complexity and robustness, and our experiments demonstrate competitive clustering performance in vertical federated settings.
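Medoid selection from ordinal rankings alone can be framed as a rank-aggregation problem. The sketch below substitutes a Borda count, a standard voting rule that is an assumption here rather than the rule stated by the authors. The names `local_ranking` and `select_medoid`, and the sum-of-distances centrality criterion, are hypothetical. Each agent ranks a group's members using its private features, and only the resulting order is passed to the aggregation step.

```python
import math
from typing import Dict, List, Mapping, Sequence


def local_ranking(members: Sequence[int], features: Mapping[int, Sequence[float]]) -> List[int]:
    """Hypothetical agent-side step: rank members from most to least central.

    Centrality is assumed to be the sum of Euclidean distances to the other
    members (the classic medoid criterion); ties are broken by identifier.
    The distances stay local; only the returned order would be shared.
    """
    def total_dist(m: int) -> float:
        return sum(math.dist(features[m], features[o]) for o in members)

    return sorted(members, key=lambda m: (total_dist(m), m))


def select_medoid(members: Sequence[int], rankings: Sequence[Sequence[int]]) -> int:
    """Pick a representative by Borda count over per-agent rankings (assumed rule).

    A member in position p of a ranking over k members earns k - 1 - p points.
    """
    member_set = set(members)
    k = len(members)
    scores: Dict[int, int] = {m: 0 for m in members}
    for ranking in rankings:
        if len(ranking) != k or set(ranking) != member_set:
            raise ValueError("each ranking must be a permutation of the members")
        for position, sample_id in enumerate(ranking):
            scores[sample_id] += k - 1 - position
    # Highest total score wins; ties are broken by the smallest identifier.
    return min(members, key=lambda m: (-scores[m], m))
```

For members [3, 8, 9] and agent rankings [8, 3, 9], [8, 9, 3], and [3, 8, 9], the Borda totals are 5, 3, and 1 for samples 8, 3, and 9, so sample 8 is selected. Because each agent reveals only an ordering, the distances computed inside `local_ranking` never leave the agent.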
