
Rethinking Self-Supervised Learning Within the Framework of Partial Information Decomposition

Salman Mohamadi, Gianfranco Doretto, Donald A. Adjeroh

TL;DR

The paper addresses whether mutual information should be increased or decreased in SSL by recasting SSL within the Partial Information Decomposition (PID) framework and using joint mutual information decomposed into unique, redundant, and synergistic components. It introduces a general progressive supervision pipeline that combines initial SSL training, clustering-based pseudo-labeling, and iterative, two-tier supervision to capture all PID components, especially the unique information from individual views. Integrated with four strong baselines, the approach yields consistent improvements across CIFAR-10/100, Tiny ImageNet, and ImageNet transfer settings, highlighting enhanced task-relevant and cluster-level representations. This PID-guided framework advances SSL by enabling joint local and global clustering and offers a versatile path toward higher-level supervision in self-supervised learning, with potential extensions to segmentation and detection.
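For reference, the standard PID bookkeeping introduced by Williams and Beer is written below in the notation of Figure 2 (sources $S_1$, $S_2$, target $T$). The shorthand $R$ (redundant), $U_1$, $U_2$ (unique) and $\mathrm{Syn}$ (synergistic) is ours and may differ from the paper's symbols. Pairwise mutual information only sees $R + U_i$, and synergy appears only in the joint term. Subtracting the pairwise terms from the joint one leaves $\mathrm{Syn} - R$, which can be positive or negative depending on whether synergy or redundancy dominates:

$$
\begin{aligned}
I(S_1, S_2; T) &= R + U_1 + U_2 + \mathrm{Syn}, \\
I(S_1; T) &= R + U_1, \qquad I(S_2; T) = R + U_2, \\
I(S_1, S_2; T) - I(S_1; T) - I(S_2; T) &= \mathrm{Syn} - R.
\end{aligned}
$$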

Abstract

Self-supervised learning (SSL) has demonstrated its effectiveness in learning features from unlabeled data. Alongside this success, there has been debate about the role that mutual information plays within the SSL framework. Some works argue for increasing the mutual information between the representations of augmented views; others suggest decreasing it while increasing task-relevant information. We take up this debate and propose to revisit the core idea of SSL within the framework of partial information decomposition (PID). Casting SSL under PID, we propose to replace traditional mutual information with the more general concept of joint mutual information to resolve the argument. Our investigation of instantiating SSL within the PID framework leads us to upgrade existing pipelines by accounting for the components of PID in SSL models, for improved representation learning. Accordingly, we propose a general pipeline that can be applied to improve existing baselines. Our pipeline focuses on extracting the unique-information component of PID to build on lower-level supervision for generic feature learning, and on developing higher-level supervisory signals for task-related feature learning. In essence, this can be interpreted as a joint use of local and global clustering. Experiments with four baselines on four datasets show the effectiveness and generality of our approach in improving existing SSL frameworks.
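The two-tier objective described here (and in the Figure 1 caption) can be illustrated with a toy, plain-Python sketch. It is not the authors' implementation: every name (`kmeans`, `prototype_logits`, `two_tier_loss`, `weight`) is hypothetical, `z1` and `z2` stand for representations of two augmented views produced by an already pretrained encoder, the squared-distance term stands in for whatever SSL loss the chosen baseline uses, and the nearest-centroid "classifier" stands in for a trained classification head. Following the caption, the joint sample-level term targets the redundant and synergistic components, while supervising each view alone on its pseudo-label targets the unique component.

```python
import math


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation.

    Returns (labels, centers); a stand-in for the clustering-based labeling step.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def cross_entropy(logits, label):
    """-log softmax(logits)[label], computed with the max-shift trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def prototype_logits(z, centers):
    """Hypothetical classifier: negative squared distance to each cluster centre."""
    return [-sq_dist(z, c) for c in centers]


def two_tier_loss(z1, z2, pseudo, centers, weight=1.0):
    # Tier 1 (sample level): both views jointly; pull paired embeddings together.
    sample = sum(sq_dist(a, b) for a, b in zip(z1, z2)) / len(z1)
    # Tier 2 (cluster level): each view on its own must predict the pseudo-label.
    cluster = sum(
        cross_entropy(prototype_logits(a, centers), y)
        + cross_entropy(prototype_logits(b, centers), y)
        for a, b, y in zip(z1, z2, pseudo)
    ) / (2 * len(pseudo))
    return sample + weight * cluster


# Toy embeddings of two augmented views for four samples.
z1 = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
z2 = [(0.1, 0.0), (0.1, 0.1), (4.9, 5.0), (5.0, 5.1)]
labels, centers = kmeans(z1, 2)
print(labels, two_tier_loss(z1, z2, labels, centers))
```

In an actual pipeline the pseudo-labels would be refreshed between rounds of training (the "iterative refinement" phase), and both terms would be minimized by gradient descent on the encoder; the sketch only evaluates the objective once.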

Paper Structure

This paper contains 17 sections, 5 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: A given SSL framework undergoes initial training and clustering-based labeling before the main phase of training, iterative refinement. The three-variable information system under the PID framework is instantiated to have all three components of PID, namely the unique, redundant, and synergistic components. The detailed derivation supporting this design is given in the Supplementary Material. Our framework differs from "paradigms including clustering in SSL" by jointly enforcing invariance across augmented representations, toward learning associations between views at both the sample level and the cluster level. Two components of PID, synergistic and redundant information, are instantiated via iterative sample-level representation learning in which the two views are jointly involved, whereas the third component, unique information, is instantiated via iterative supervised training using pseudo-labels for each individual view. In essence, this framework allows for joint local and global clustering.
  • Figure 2: PID in the case of three variables. PID presents the structure of multivariate information in a system consisting of two source variables $S_1$ and $S_2$ and a target variable $T$.
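To make the three components in Figure 2 concrete, the sketch below computes a PID for small discrete distributions over $(S_1, S_2, T)$. The redundancy measure is an assumption: it uses the simple minimum-mutual-information (MMI) rule $R = \min(I(S_1;T), I(S_2;T))$, which is only one of several proposed redundancy measures and not necessarily the one used in the paper; the function names are hypothetical. Once $R$ is fixed, the unique and synergistic parts follow from the standard PID identities. The three toy systems each isolate one component: XOR is purely synergistic, copying $S_1$ is purely unique to $S_1$, and identical sources are purely redundant.

```python
from collections import defaultdict
from math import log2


def mutual_information(joint, a_idx, b_idx):
    """I(A;B) in bits; joint maps outcome tuples (s1, s2, t) to probabilities."""
    pa, pb, pab = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in a_idx)
        b = tuple(outcome[i] for i in b_idx)
        pa[a] += p
        pb[b] += p
        pab[(a, b)] += p
    return sum(p * log2(p / (pa[a] * pb[b])) for (a, b), p in pab.items() if p > 0)


def pid_mmi(joint):
    """Split I(S1,S2;T) into redundant, unique and synergistic parts.

    Assumption: redundancy is the MMI measure min(I(S1;T), I(S2;T)).
    """
    i1 = mutual_information(joint, (0,), (2,))
    i2 = mutual_information(joint, (1,), (2,))
    i12 = mutual_information(joint, (0, 1), (2,))
    red = min(i1, i2)
    u1, u2 = i1 - red, i2 - red
    syn = i12 - red - u1 - u2
    return {"redundant": red, "unique_s1": u1, "unique_s2": u2, "synergistic": syn}


# Uniform, independent binary sources unless stated otherwise.
xor = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}  # T = S1 XOR S2
copy_s1 = {(a, b, a): 0.25 for a in (0, 1) for b in (0, 1)}  # T = S1, S2 irrelevant
same = {(a, a, a): 0.5 for a in (0, 1)}  # S1 = S2 = T

for name, joint in [("xor", xor), ("copy_s1", copy_s1), ("same", same)]:
    print(name, pid_mmi(joint))
```

The MMI rule is chosen here only because it is short and gives the intuitive answer on these textbook cases; on richer distributions different redundancy measures can disagree, so the split should be read as illustrative.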