Contrastive Factor Analysis

Zhibin Duan, Tiansheng Wen, Yifei Wang, Chen Zhu, Bo Chen, Mingyuan Zhou

TL;DR

This work identifies a gap between traditional factor analysis and modern contrastive learning and proposes Contrastive Factor Analysis (CFA) to merge their strengths by factorizing a normalized co-occurrence matrix with latent Gaussian factors. It generalizes to a non-negative version (CNFA) using Gamma latents for disentangled representations, and leverages variational inference with Gaussian and Weibull posteriors to learn posterior distributions efficiently. A CL-inspired reformulation yields a tractable objective suitable for gradient-based optimization, while explicit uncertainty measures are derived from posterior entropies. Across in-distribution and out-of-distribution tasks, CFA/CNFA demonstrate improved expressiveness, robustness, interpretability, and calibrated uncertainty estimation, highlighting their potential for robust unsupervised representation learning in the deep-learning era.
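To make the posterior families and the entropy-based uncertainty mentioned above concrete, here is a minimal NumPy sketch. It uses only standard textbook results: the inverse-CDF reparameterization of the Weibull distribution and the closed-form differential entropies of the Weibull and Gaussian distributions. The function names and the choice to sum entropies over latent dimensions are illustrative assumptions, not necessarily the exact formulation used in the paper.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def sample_weibull(k, lam, rng):
    """Reparameterized Weibull(shape=k, scale=lam) draw via the inverse CDF.

    z = lam * (-log(1 - u))**(1/k) with u ~ Uniform(0, 1), so z is a
    deterministic, differentiable function of (k, lam) given the noise u.
    """
    k = np.asarray(k, dtype=float)
    lam = np.asarray(lam, dtype=float)
    u = rng.uniform(size=k.shape)
    return lam * (-np.log1p(-u)) ** (1.0 / k)


def weibull_entropy(k, lam):
    """Closed-form differential entropy of Weibull(shape=k, scale=lam)."""
    k = np.asarray(k, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return EULER_GAMMA * (1.0 - 1.0 / k) + np.log(lam / k) + 1.0


def gaussian_entropy(sigma):
    """Closed-form differential entropy of a univariate Gaussian with std sigma."""
    sigma = np.asarray(sigma, dtype=float)
    return 0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)


def uncertainty_score(per_dim_entropy):
    """Assumed aggregation: sum the per-dimension entropies of one sample."""
    return np.sum(per_dim_entropy, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = np.full(8, 2.0)      # hypothetical Weibull shape parameters
    lam = np.full(8, 1.5)    # hypothetical Weibull scale parameters
    z = sample_weibull(k, lam, rng)  # non-negative latent draw
    print(z.min() >= 0.0, uncertainty_score(weibull_entropy(k, lam)))
```

A wider posterior gives a larger entropy, so under this assumed aggregation a sample whose latent factors are poorly determined receives a higher uncertainty score.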

Abstract

Factor analysis, often regarded as a Bayesian variant of matrix factorization, offers superior capabilities in capturing uncertainty, modeling complex dependencies, and ensuring robustness. In the deep learning era, however, factor analysis has received less and less attention because of its limited expressive ability. In contrast, contrastive learning has emerged as a potent technique with demonstrated efficacy in unsupervised representation learning. Although the two methods belong to different paradigms, recent theoretical analysis has revealed a mathematical equivalence between contrastive learning and matrix factorization, which opens the possibility of combining factor analysis with contrastive learning. Motivated by the interconnectedness of contrastive learning, matrix factorization, and factor analysis, this paper introduces a novel Contrastive Factor Analysis framework that aims to bring factor analysis's advantageous properties into the realm of contrastive learning. To further exploit the interpretability of non-negative factor analysis, which can learn disentangled representations, contrastive factor analysis is extended to a non-negative version. Finally, extensive experiments validate the proposed contrastive (non-negative) factor analysis methodology across multiple key properties, including expressiveness, robustness, interpretability, and accurate uncertainty estimation.

Paper Structure (23 sections, 12 equations, 4 figures, 4 tables)

This paper contains 23 sections, 12 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Relationship between different learning paradigms discussed in this work.
  • Figure 2: Graphical models of contrastive learning (CL), the generative model of contrastive factor analysis, and variational inference (VI) for contrastive factor analysis. Circles are stochastic variables, and squares are deterministic variables.
  • Figure 3: 20 images from ImageNet-100 with the highest and lowest uncertainty predicted by CNFA. Images with the highest uncertainty tend to have noisy backgrounds, whereas images with the lowest uncertainty tend to have clean, high-contrast backgrounds. This distinction indicates that CNFA identifies factors that contribute to uncertainty in image predictions.
  • Figure 4: Test-set samples are organized into five subsets according to their entropy, and the histograms show the average entropy and accuracy of each subset. The histograms show a negative correlation between entropy and accuracy: as entropy increases (uncertainty becomes greater), accuracy decreases.
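
The analysis described for Figure 4 can be sketched as follows. This is a minimal illustration, assuming that the five subsets are equal-sized groups obtained by sorting samples by entropy; the caption does not say whether the paper uses equal-sized or equal-width bins. The function name `entropy_bins` and the toy data are hypothetical.

```python
import numpy as np


def entropy_bins(entropy, correct, n_bins=5):
    """Group samples by entropy and report mean entropy and accuracy per group.

    entropy: per-sample uncertainty scores, shape (N,)
    correct: per-sample 0/1 indicators of a correct prediction, shape (N,)
    Assumption: groups are equal-sized, formed after sorting by entropy.
    """
    entropy = np.asarray(entropy, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(entropy)              # indices from lowest to highest entropy
    groups = np.array_split(order, n_bins)   # near-equal-sized index groups
    mean_entropy = np.array([entropy[g].mean() for g in groups])
    accuracy = np.array([correct[g].mean() for g in groups])
    return mean_entropy, accuracy


if __name__ == "__main__":
    # Toy data (fabricated for illustration only, not results from the paper).
    toy_entropy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    toy_correct = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
    mean_entropy, accuracy = entropy_bins(toy_entropy, toy_correct)
    for h, a in zip(mean_entropy, accuracy):
        print(f"mean entropy {h:.2f}  accuracy {a:.2f}")
```

If the uncertainty estimate is well behaved, the accuracy array should decrease as the mean entropy increases, which is the trend the caption reports.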