A representation-learning game for classes of prediction tasks

Neria Uzan; Nir Weinberger

TL;DR

The paper proposes a game-based formulation for learning dimensionality-reducing representations of feature vectors when only prior knowledge about future prediction tasks is available, together with an efficient algorithm for optimizing a randomized representation.

Abstract

We propose a game-based formulation for learning dimensionality-reducing representations of feature vectors, when only prior knowledge about future prediction tasks is available. In this game, the first player chooses a representation, and then the second player adversarially chooses a prediction task from a given class, which represents the prior knowledge. The first player aims to minimize, and the second player to maximize, the regret: the minimal prediction loss using the representation, compared to the same loss using the original features. For the canonical setting in which the representation, the response to predict, and the predictors are all linear functions, and under the mean squared error loss function, we derive the theoretically optimal representation in pure strategies, which shows the effectiveness of the prior knowledge, and the optimal regret in mixed strategies, which shows the usefulness of randomizing the representation. For general representations and loss functions, we propose an efficient algorithm to optimize a randomized representation. The algorithm requires only the gradients of the loss function, and is based on incrementally adding a representation rule to a mixture of such rules.
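To make the regret in the abstract concrete, the following is a minimal sketch for the linear MSE case. It is an illustration under stated assumptions, not the paper's notation or algorithm: the features have covariance `Sigma`, the representation is `z = R.T @ x`, the response is `y = f @ x` plus independent noise, and the adversary's class is taken to be the Euclidean unit ball of weight vectors `f`. All function and variable names are hypothetical. Under these assumptions the regret is `f' (Sigma - Sigma R (R' Sigma R)^{-1} R' Sigma) f`, since the noise variance cancels between the two minimal losses.

```python
import numpy as np

def linear_mse_regret(R, f, Sigma):
    """Regret of the representation z = R^T x for the response y = f^T x + noise.

    Equals (best linear MSE from z) minus (best linear MSE from x).
    """
    Sf = Sigma @ f
    c = R.T @ Sf                  # cross-covariance between z and f^T x
    G = R.T @ Sigma @ R           # covariance of z (r x r, assumed invertible)
    return float(f @ Sf - c @ np.linalg.solve(G, c))

def worst_case_regret_unit_ball(R, Sigma):
    """Largest regret over all f with Euclidean norm at most 1 (assumed class)."""
    SR = Sigma @ R
    M = Sigma - SR @ np.linalg.solve(R.T @ SR, SR.T)
    # Symmetrize against round-off, then take the largest eigenvalue.
    return float(np.linalg.eigvalsh(0.5 * (M + M.T))[-1])

# Toy example: diagonal covariance, representation keeps the top two coordinates.
Sigma = np.diag([4.0, 3.0, 2.0, 1.0])
R_pca = np.eye(4)[:, :2]

regret_kept = linear_mse_regret(R_pca, np.array([1.0, 0.0, 0.0, 0.0]), Sigma)
regret_dropped = linear_mse_regret(R_pca, np.array([0.0, 0.0, 1.0, 0.0]), Sigma)
worst = worst_case_regret_unit_ball(R_pca, Sigma)
```

In this toy case a response that depends only on a retained coordinate incurs no regret, while the adversary's best choice is the largest-variance coordinate that the representation discards.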

Paper Structure (24 sections, 14 theorems, 133 equations, 7 figures, 4 tables)


Introduction
Problem formulation
The linear setting under MSE loss
An algorithm for general classes and loss functions
Conclusion
Classes of response functions
Additional related work
Notation conventions
Useful mathematical results
The linear MSE setting: additions and proofs
The standard principal component setting
Proofs of pure and mixed minimax representations
The Hilbert space MSE setting
Proofs
Iterative algorithms for the Phase 1 and Phase 2 problems
...and 9 more sections

Key Result

Theorem 2

For the linear MSE setting (Definition 1), the theorem characterizes a minimax representation matrix and the corresponding worst-case response function.

Figures (7)

Figure 1: Left: Pure and mixed minimax regret and $\ell_{*}$ for the example on identity of response weights, for $d=50$, $r=25$, with $\lambda_{i}=\sigma_{i}^{2}\propto i^{-\alpha}$. Right: Pure and mixed minimax regret and $\ell_{*}$ for the diagonal-case example, for $d=50$, $r=25$, with $\sigma_{i}^{2}\propto i^{-\alpha}$ and $s_{i}\propto i^{2}$. The trend of $\ell_{*}$ is reversed for $\alpha>2$.
Figure 2: Results of the iterative algorithm. Left: $r=5$, varying $d$. The ratio between the regret achieved by the iterative algorithm and the theoretical regret in the linear MSE setting. Right: $r=3$, varying $d$. The regret achieved by the iterative algorithm in the linear cross-entropy setting, for various $m$.
Figure 3: Results on the dataset of images. Comparison of the optimized minimax representation (a simplified version of the iterative algorithm) with PCA. Worst-case function in blue, and average-case function in orange. Left: Cross-entropy loss. Right: Accuracy.
Figure 4: The learning curve for the iterative algorithm in the linear MSE setting: $d=20$, $r=3$, $\sigma=1$.
Figure 5: The ratio between the regret achieved by the iterative algorithm and the theoretical regret in the linear MSE setting. Left: $d=20$, $\sigma_{0}=1$, varying $r$. Right: $r=5$, $d=20$, varying $\sigma_{0}$.
...and 2 more figures
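Figure 1 contrasts pure and mixed minimax regret. As a toy illustration of why randomizing the representation can help, the sketch below compares a single one-dimensional representation with an equal mixture of two, for identity feature covariance in two dimensions. It rests on assumptions that are not taken from the paper: the adversary's class is the Euclidean unit ball of linear response weights, the adversary commits to a response before the random representation is drawn, and all names are hypothetical. Under these assumptions the expected regret is a quadratic form in the averaged residual matrix, so its worst case is that matrix's largest eigenvalue.

```python
import numpy as np

def residual_matrix(R, Sigma):
    """Matrix M with regret(f) = f^T M f for the representation z = R^T x."""
    SR = Sigma @ R
    M = Sigma - SR @ np.linalg.solve(R.T @ SR, SR.T)
    return 0.5 * (M + M.T)        # symmetrize against round-off

def worst_case_expected_regret(Rs, probs, Sigma):
    """Worst case over unit-norm f of the regret averaged over the mixture."""
    M_avg = sum(p * residual_matrix(R, Sigma) for p, R in zip(probs, Rs))
    return float(np.linalg.eigvalsh(M_avg)[-1])

Sigma = np.eye(2)
e1 = np.array([[1.0], [0.0]])     # keep only the first coordinate
e2 = np.array([[0.0], [1.0]])     # keep only the second coordinate

pure = worst_case_expected_regret([e1], [1.0], Sigma)
mixed = worst_case_expected_regret([e1, e2], [0.5, 0.5], Sigma)
```

Here any single representation leaves one direction fully unexplained, whereas the equal mixture halves the worst-case expected regret, because the adversary can no longer target the discarded direction with certainty.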

Theorems & Definitions (32)

Definition 1: The linear MSE setting
Theorem 2
Theorem 3
Example 4
Example 5
Example 6
Definition 7: The linear cross-entropy setting
Example 8
Example 9: Comparison with PCA for multi-label Classification
Theorem 10: Eckart-Young-Mirsky (Wainwright, 2019; Vershynin, 2018)
...and 22 more
