Contextually Affinitive Neighborhood Refinery for Deep Clustering
Chunlin Yu, Ye Shi, Jingya Wang
TL;DR
This work tackles deep clustering in self-supervised learning by strengthening semantic grouping through neighborhood structure. It introduces CoNR, a plug-in framework that builds a Contextually Affinitive Neighborhood (ConAffN) via online re-ranking, enforcing cross-view neighborhood consistency with two-stage instance- and group-aware losses and a progressively relaxed boundary filtering strategy. A contextually refined feature space $H_f$, online reciprocal adjacency, and context-driven neighbor retrieval yield a robust loss $L_{sim}^{GA}$ and a total objective $L_{total}$ that smoothly transitions from instance- to group-level optimization. Empirically, CoNR achieves competitive or state-of-the-art results across five benchmarks with minimal overhead and strong robustness, while acknowledging limitations on unbalanced data and pointing to future work on long-tailed settings.
Abstract
Previous work in self-supervised learning has advanced research on deep clustering from an instance discrimination perspective. Building on this foundation, recent studies further highlight the importance of grouping semantically similar instances. One effective way to achieve this is to promote the semantic structure preserved by neighborhood consistency. However, samples in a local neighborhood lie close to one another, so they may not provide sufficiently rich and diverse supervision signals. Inspired by the versatile re-ranking methods used in image retrieval, we propose an efficient online re-ranking process that mines more informative neighbors in a Contextually Affinitive (ConAff) Neighborhood, and we then encourage cross-view neighborhood consistency. To further mitigate the intrinsic neighborhood noise near cluster boundaries, we propose a progressively relaxed boundary filtering strategy that circumvents the issues caused by noisy neighbors. Our method can be easily integrated into generic self-supervised frameworks and outperforms state-of-the-art methods on several popular benchmarks.
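To make the idea of re-ranking-based neighbor mining more concrete, the following is a minimal NumPy sketch of one generic way to retrieve neighbors by context rather than by raw feature distance. It is an illustration under stated assumptions, not the exact ConAffN construction: the function names (`knn_indices`, `reciprocal_adjacency`, `contextual_neighbors`), the use of cosine similarity, and the Jaccard overlap of reciprocal-neighbor sets as the contextual affinity are all choices made here for illustration.

```python
import numpy as np


def knn_indices(features, k):
    """Indices of the k most cosine-similar rows for each row, excluding itself."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbor
    return np.argsort(-sim, axis=1, kind="stable")[:, :k]


def reciprocal_adjacency(features, k):
    """Boolean matrix: adj[i, j] is True only if i and j are in each other's k-NN lists."""
    n = features.shape[0]
    nn = knn_indices(features, k)
    directed = np.zeros((n, n), dtype=bool)
    directed[np.repeat(np.arange(n), k), nn.ravel()] = True
    return directed & directed.T  # keep mutual (reciprocal) edges only


def contextual_neighbors(features, k):
    """Re-rank candidates by Jaccard overlap of their reciprocal-neighbor sets.

    Assumption for illustration: the 'context' of a sample is itself plus its
    reciprocal neighbors; two samples are contextually close when these sets overlap.
    """
    n = features.shape[0]
    context = reciprocal_adjacency(features, k) | np.eye(n, dtype=bool)
    ctx = context.astype(float)
    intersection = ctx @ ctx.T
    sizes = ctx.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - intersection  # >= 1 because self is included
    jaccard = intersection / union
    np.fill_diagonal(jaccard, -np.inf)  # exclude self from the retrieved neighbors
    return np.argsort(-jaccard, axis=1, kind="stable")[:, :k]
```

In a training loop, a routine of this kind would be applied to the features of the current batch, and the retrieved indices would select the positives for a cross-view consistency loss. How the affinity is actually defined and how boundary samples are filtered are specified by the method itself rather than by this sketch.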
