Contextually Affinitive Neighborhood Refinery for Deep Clustering

Chunlin Yu, Ye Shi, Jingya Wang

TL;DR

This work tackles deep clustering in self-supervised learning by strengthening semantic grouping through neighborhood structure. It introduces CoNR, a plug-in framework that builds a Contextually Affinitive Neighborhood (ConAffN) via online re-ranking. CoNR enforces cross-view neighborhood consistency through a two-stage scheme of instance-aware and group-aware losses, combined with a progressively relaxed boundary filtering strategy. A contextually refined feature space $H_f$, an online reciprocal adjacency, and context-driven neighbor retrieval together yield a robust loss $L_{sim}^{GA}$ and a total objective $L_{total}$ that transitions smoothly from instance-level to group-level optimization. Empirically, CoNR achieves competitive or state-of-the-art results on five benchmarks with minimal overhead and strong robustness. The authors acknowledge limitations on unbalanced data and point to long-tailed settings as future work.
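The TL;DR does not give the form of $L_{total}$, so the following is only a minimal sketch of one way a cross-view consistency objective could move from instance-level to group-level optimization. It assumes a BYOL-style per-pair loss $2 - 2\cos(p, z)$, treats the instance-level term as the special case where the neighbor set is the sample itself, and blends the two terms with a linear ramp. The function names, the ramp, and `ramp_epochs` are hypothetical and are not the paper's $L_{sim}^{GA}$ or $L_{total}$.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def consistency_loss(pred_i, targets, neighbors):
    """Hypothetical BYOL-style loss (2 - 2*cos) averaged over a neighbor set.

    pred_i:    online-branch prediction for sample i (view 1)
    targets:   target-branch embeddings of all samples (view 2)
    neighbors: indices whose view-2 embeddings pred_i should agree with
    """
    return sum(2.0 - 2.0 * cosine(pred_i, targets[j]) for j in neighbors) / len(neighbors)


def total_loss(pred_i, targets, i, neighbors, epoch, ramp_epochs):
    """Hypothetical blend: instance-level term (neighbor set {i}) fading into a group-level term."""
    w = min(1.0, epoch / ramp_epochs)  # linear ramp, an assumption
    l_inst = consistency_loss(pred_i, targets, [i])
    l_group = consistency_loss(pred_i, targets, neighbors)
    return (1.0 - w) * l_inst + w * l_group


if __name__ == "__main__":
    targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for epoch in (0, 5, 10):
        print(epoch, total_loss([1.0, 0.0], targets, 0, [0, 1], epoch, ramp_epochs=10))
```

Early in training the objective behaves like plain instance discrimination; as `w` grows, the prediction is increasingly pulled toward its neighbors' cross-view embeddings as well.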

Abstract

Previous endeavors in self-supervised learning have informed research on deep clustering from an instance discrimination perspective. Building on this foundation, recent studies further highlight the importance of grouping semantically similar instances. One effective way to achieve this is to promote the semantic structure preserved by neighborhood consistency. However, the samples in a local neighborhood may be limited because they lie so close to each other, and thus may not provide substantial and diverse supervision signals. Inspired by the versatile re-ranking methods used in image retrieval, we propose an efficient online re-ranking process to mine more informative neighbors in a Contextually Affinitive (ConAff) Neighborhood, and then encourage cross-view neighborhood consistency. To further mitigate the intrinsic neighborhood noise near cluster boundaries, we propose a progressively relaxed boundary filtering strategy that circumvents the issues brought by noisy neighbors. Our method can be easily integrated into generic self-supervised frameworks and outperforms state-of-the-art methods on several popular benchmarks.
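The abstract does not say how boundary samples are identified or how the filter is relaxed. As a minimal sketch under stated assumptions, the schedule below keeps only an initial fraction of samples (the "initial fraction ratio" varied in Figure 2(a) is the analogous knob) and linearly relaxes that fraction to 1.0 over training. The per-sample `agreement` score (for example, the share of a sample's neighbors that agree with its cluster assignment) and the function names are hypothetical.

```python
def kept_fraction(epoch, total_epochs, init_ratio):
    """Hypothetical linear relaxation from init_ratio to 1.0 over training."""
    progress = min(1.0, epoch / total_epochs)
    return init_ratio + (1.0 - init_ratio) * progress


def filter_boundary(agreement, epoch, total_epochs, init_ratio=0.5):
    """Return sorted indices of samples treated as non-boundary at this epoch.

    agreement[i] is an assumed score; a lower value means sample i is
    more likely to lie near a cluster boundary and is filtered out first.
    """
    n_keep = max(1, round(kept_fraction(epoch, total_epochs, init_ratio) * len(agreement)))
    # Most confident samples first; ties keep their original order.
    order = sorted(range(len(agreement)), key=lambda i: agreement[i], reverse=True)
    return sorted(order[:n_keep])


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.6, 0.4]
    for epoch in (0, 5, 10):
        print(epoch, filter_boundary(scores, epoch, total_epochs=10))
```

Only the kept samples would contribute to the neighborhood-level loss; by the end of the schedule, every sample does.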

Paper Structure

This paper contains 29 sections, 17 equations, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: (a) Comparison between the conventional Euclidean Neighborhood and the ConAff Neighborhood, both with cross-view consistency. The Euclidean distance retrieves the Euclidean neighborhood, while the contextual distance retrieves the ConAff neighborhood. In reciprocal relations, "1" means two samples are in each other's top-$k$ neighbors, and "0" means they are not. The goal is to use reciprocal relations as a contextual distance metric for the ConAff neighborhood. For instance, a distant pair A and B may be contextually similar because their reciprocal relations are similar. (b) Images in the first column and their 10 nearest neighbors in the other columns, retrieved using the Euclidean Neighborhood and the proposed ConAff Neighborhood. Wrong neighbors are marked in red and hard positives in green; unmarked neighbors are regarded as true neighbors.
  • Figure 2: (a) Performance comparison with different initial fraction ratios on CIFAR-10. (b) Performance with different choices of $k_1$ and $k_2$ on ImageNet-Dogs. (c) Clustering performance comparison between CoNR and BYOL on ImageNet-Dogs.
  • Figure 3: (a) t-SNE visualizations of all samples in CIFAR-10, where boundary samples are shown as small dots and non-boundary samples as large dots. (b) t-SNE visualizations of boundary samples in CIFAR-10, where boundary samples are shown as small dots.
  • Figure 4: More visualizations of top-10 neighborhoods on ImageNet-10.
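The reciprocal relations described for Figure 1 can be illustrated with a small pure-Python sketch in the spirit of k-reciprocal re-ranking from image retrieval. It assumes cosine similarity as the base metric. It uses `k1` for the reciprocal top-k lists and `k2` for an optional local expansion; this is our reading of the $k_1$, $k_2$ in Figure 2(b) and is not confirmed by the source. Contextual similarity is scored as a generalized Jaccard overlap between reciprocal-relation vectors. All function names are hypothetical, and the paper's online procedure may differ in every detail.

```python
import math


def cosine_matrix(feats):
    """Pairwise cosine similarities of non-zero feature vectors."""
    norms = [math.sqrt(sum(x * x for x in f)) for f in feats]
    return [[sum(a * b for a, b in zip(fi, fj)) / (ni * nj) for fj, nj in zip(feats, norms)]
            for fi, ni in zip(feats, norms)]


def top_k(row, k):
    """Indices of the k largest entries of row (a sample ranks itself first)."""
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]


def reciprocal_relations(sim, k1):
    """R[i][j] = 1.0 if i and j are in each other's top-k1 lists, else 0.0."""
    n = len(sim)
    knn = [set(top_k(sim[i], k1)) for i in range(n)]
    return [[1.0 if (j in knn[i] and i in knn[j]) else 0.0 for j in range(n)] for i in range(n)]


def expand(rel, sim, k2):
    """Hypothetical local expansion: average each R row over its top-k2 neighbors."""
    if k2 <= 1:
        return rel
    n = len(rel)
    out = []
    for i in range(n):
        nbrs = top_k(sim[i], k2)
        out.append([sum(rel[m][j] for m in nbrs) / len(nbrs) for j in range(n)])
    return out


def contextual_similarity(rel):
    """Generalized Jaccard similarity between reciprocal-relation vectors."""
    n = len(rel)
    ctx = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            inter = sum(min(a, b) for a, b in zip(rel[i], rel[j]))
            union = sum(max(a, b) for a, b in zip(rel[i], rel[j]))
            ctx[i][j] = inter / union if union > 0 else 0.0
    return ctx


def conaff_neighbors(feats, k1, k2, k):
    """Retrieve k neighbors per sample by contextual (not Euclidean) similarity."""
    sim = cosine_matrix(feats)
    ctx = contextual_similarity(expand(reciprocal_relations(sim, k1), sim, k2))
    return [[j for j in top_k(ctx[i], k + 1) if j != i][:k] for i in range(len(feats))]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
    print(conaff_neighbors(feats, k1=3, k2=2, k=2))
```

Because two samples are compared through their reciprocal-relation vectors rather than their raw distance, a pair such as A and B in Figure 1 can score highly even when far apart, provided they share mutual neighbors. The O(n²) loops here are for readability only.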