Dependence Induced Representations

Xiangxiang Xu, Lizhong Zheng

TL;DR

We characterize a large family of loss functions that can learn dependence induced representations, including cross entropy, hinge loss, and their regularized variants. The features learned from this family can be expressed as the composition of a loss-dependent function and the maximal correlation function, which reveals a key connection between representations learned from different losses.

Abstract

We study the problem of learning feature representations from a pair of random variables, focusing on the representations that are induced by their dependence. We provide necessary and sufficient conditions for such dependence induced representations, and illustrate their connections to Hirschfeld–Gebelein–Rényi (HGR) maximal correlation functions and minimal sufficient statistics. We characterize a large family of loss functions that can learn dependence induced representations, including cross entropy, hinge loss, and their regularized variants. In particular, we show that the features learned from this family can be expressed as the composition of a loss-dependent function and the maximal correlation function, which reveals a key connection between representations learned from different losses. Our development also gives a statistical interpretation of the neural collapse phenomenon observed in deep classifiers. Finally, we present a learning design based on feature separation, which allows hyperparameter tuning during inference.
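
For a concrete view of the HGR maximal correlation functions mentioned above, the following is a minimal sketch for finite alphabets, assuming a joint pmf with strictly positive marginals. It uses the standard singular value decomposition characterization of maximal correlation rather than any procedure taken from the paper; the function name `hgr_maximal_correlation` and the example distribution are illustrative assumptions.

```python
import numpy as np


def hgr_maximal_correlation(p_xy):
    """Return (rho, f, g) for a finite joint pmf p_xy[x, y].

    Assumption: every row and column of p_xy has positive mass.
    rho is the HGR maximal correlation; f and g are zero-mean,
    unit-variance functions of x and y that attain it.
    """
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)  # marginal of X
    p_y = p_xy.sum(axis=0)  # marginal of Y

    # Normalized joint matrix: B[x, y] = P(x, y) / sqrt(P(x) P(y)).
    b = p_xy / np.sqrt(np.outer(p_x, p_y))

    # Singular values come back in descending order. The largest is 1,
    # with singular vectors sqrt(p_x) and sqrt(p_y) (constant functions).
    u, s, vt = np.linalg.svd(b)

    # The second singular triple gives the maximal correlation pair.
    rho = s[1]
    f = u[:, 1] / np.sqrt(p_x)
    g = vt[1, :] / np.sqrt(p_y)
    return rho, f, g


# Illustrative joint pmf of two correlated binary variables.
p_example = np.array([[0.4, 0.1],
                      [0.1, 0.4]])
rho, f, g = hgr_maximal_correlation(p_example)
print("maximal correlation:", rho)
print("f(x):", f)
print("g(y):", g)
```

The returned `f` and `g` satisfy $\mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0$, $\mathbb{E}[f^2(X)] = \mathbb{E}[g^2(Y)] = 1$, and $\mathbb{E}[f(X)\,g(Y)] = \rho$. They are determined only up to a joint sign flip.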

Paper Structure

This paper contains 24 sections, 10 theorems, 29 equations, 2 figures.

Key Result

Proposition 1

Given $(X, Y)$, let $\hat{X} \in \hat{\mathcal{X}}$, $\hat{Y} \in \hat{\mathcal{Y}}$ be the variables obtained from dependence preserving transformations $\xi, \eta$ (cf. the definition of dependence preserving transformations). Then, for all $(\hat{x}, \hat{y}) \in \hat{\mathcal{X}} \times \hat{\mathcal{Y}}$, we have $\mathfrak{i}_{\h

Figures (2)

  • Figure 2: Feature learning by training adapters $\phi, \psi$. The feature extractors $f^*$, $g^*$ can be frozen pre-trained networks or non-trainable modules.
  • Figure 3: The feature adapters $\phi^{(\lambda)}, \psi^{(\lambda)}$ are parameterized by the hyperparameter $\lambda$, which is tunable during inference.

Theorems & Definitions (26)

  • Definition 1
  • Remark 1
  • Remark 2
  • Definition 2
  • Remark 3
  • Proposition 1
  • Definition 3
  • Theorem 1
  • proof
  • Remark 4
  • ...and 16 more