A representation-learning game for classes of prediction tasks

Neria Uzan, Nir Weinberger

TL;DR

A game-based formulation is proposed for learning dimensionality-reducing representations of feature vectors when only prior knowledge of future prediction tasks is available, together with an efficient algorithm for optimizing a randomized representation.

Abstract

We propose a game-based formulation for learning dimensionality-reducing representations of feature vectors when only prior knowledge of future prediction tasks is available. In this game, the first player chooses a representation, and then the second player adversarially chooses a prediction task from a given class, which represents the prior knowledge. The first player aims to minimize, and the second player to maximize, the regret: the minimal prediction loss using the representation, compared to the same loss using the original features. For the canonical setting in which the representation, the response to be predicted and the predictors are all linear functions, and under the mean squared error loss function, we derive the theoretically optimal representation in pure strategies, which shows the effectiveness of the prior knowledge, and the optimal regret in mixed strategies, which shows the usefulness of randomizing the representation. For general representations and loss functions, we propose an efficient algorithm to optimize a randomized representation. The algorithm requires only the gradients of the loss function and is based on incrementally adding a representation rule to a mixture of such rules.
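To make the regret in the linear MSE setting concrete, here is a minimal sketch under stated assumptions (not the paper's code). It assumes zero-mean features $x$ with covariance $\Sigma$, a response $y = f^\top x + \text{noise}$, a representation $z = R^\top x$ with $R \in \mathbb{R}^{d\times r}$, and a linear predictor refit on $z$. The best predictor on $z$ leaves an excess MSE of $f^\top\Sigma f - f^\top\Sigma R(R^\top\Sigma R)^{-1}R^\top\Sigma f$ over the best predictor on $x$. The task class used for the worst case, the unit Euclidean ball of response vectors $f$, is a hypothetical stand-in for the paper's class; other classes would change which directions matter. The function names `regret` and `worst_case_regret` are illustrative.

```python
import numpy as np


def regret(sigma, rep, f):
    """Regret of the representation z = rep.T @ x for the task y = f @ x + noise.

    Minimal MSE using z minus minimal MSE using x. The noise variance appears
    in both terms and cancels, so it is omitted.
    """
    cross = rep.T @ sigma @ f   # Cov(z, f^T x), shape (r,)
    gram = rep.T @ sigma @ rep  # Cov(z), shape (r, r)
    explained = cross @ np.linalg.solve(gram, cross)
    return float(f @ sigma @ f - explained)


def worst_case_regret(sigma, rep):
    """Max regret over the assumed task class {f : ||f||_2 <= 1}.

    regret(f) = f^T M f with M = Sigma - Sigma R (R^T Sigma R)^{-1} R^T Sigma.
    M is positive semidefinite, so the max is its largest eigenvalue.
    """
    sr = sigma @ rep
    m = sigma - sr @ np.linalg.solve(rep.T @ sr, sr.T)
    return float(np.linalg.eigvalsh(m).max())


# Diagonal covariance with variances 4 > 3 > 2 > 1, so d = 4; keep r = 2.
sigma = np.diag([4.0, 3.0, 2.0, 1.0])
top2 = np.eye(4)[:, :2]     # keep the two highest-variance coordinates
bottom2 = np.eye(4)[:, 2:]  # keep the two lowest-variance coordinates
print(round(worst_case_regret(sigma, top2), 6),
      round(worst_case_regret(sigma, bottom2), 6))  # prints 2.0 4.0
```

With this isotropic class the adversary simply targets the largest discarded variance, so keeping the top-variance coordinates wins; that is exactly the case where no prior knowledge about the tasks is being used.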

Paper Structure (24 sections, 14 theorems, 133 equations, 7 figures, 4 tables)

Key Result

Theorem 2

For the linear MSE setting (Definition 1), Theorem 2 gives a minimax representation matrix and the corresponding worst-case response function.

Figures (7)

  • Figure 1: Left: Pure and mixed minimax regret and $\ell_{*}$ for the identity response-weights example, for $d=50$, $r=25$, with $\lambda_{i}=\sigma_{i}^{2}\propto i^{-\alpha}$. Right: Pure and mixed minimax regret and $\ell_{*}$ for the diagonal-case example, for $d=50$, $r=25$, with $\sigma_{i}^{2}\propto i^{-\alpha}$ and $s_{i}\propto i^{2}$. The trend of $\ell_{*}$ is reversed for $\alpha>2$.
  • Figure 2: Results of the iterative algorithm. Left: $r=5$, varying $d$: the ratio between the regret achieved by the iterative algorithm and the theoretical regret in the linear MSE setting. Right: $r=3$, varying $d$: the regret achieved by the iterative algorithm in the linear cross-entropy setting, for various values of $m$.
  • Figure 3: Results on the image dataset. Comparison of the optimized minimax representation (a simplified version of the iterative algorithm) with PCA. Worst-case function in blue, average-case function in orange. Left: cross-entropy loss. Right: accuracy.
  • Figure 4: The learning curve of the iterative algorithm in the linear MSE setting: $d=20$, $r=3$, $\sigma=1$.
  • Figure 5: The ratio between the regret achieved by the iterative algorithm and the theoretical regret in the linear MSE setting. Left: $d=20$, $\sigma_{0}=1$, varying $r$. Right: $r=5$, $d=20$, varying $\sigma_{0}$.
  • ...and 2 more figures
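Figure 1 contrasts pure and mixed minimax regret. The toy computation below illustrates why randomizing the representation can help, under assumptions that are not taken from the paper. Features $x$ are zero-mean with covariance $\Sigma$, and tasks are $y = f^\top x + \text{noise}$ with $f$ in the unit Euclidean ball (a hypothetical class). The representation $R_k$ is drawn with probability $p_k$ before the adversary's task is scored, and a linear predictor is refit on the realized $z = R_k^\top x$. The expected regret is then $f^\top(\Sigma - \sum_k p_k G_k)f$ with $G_k = \Sigma R_k(R_k^\top\Sigma R_k)^{-1}R_k^\top\Sigma$. The helper `add_rule` is a hypothetical grid-search stand-in for "adding a representation rule to a mixture"; it is not the paper's gradient-based algorithm.

```python
import numpy as np


def projection_gain(sigma, rep):
    """G = Sigma R (R^T Sigma R)^{-1} R^T Sigma: the part of f^T Sigma f recoverable from z."""
    sr = sigma @ rep
    return sr @ np.linalg.solve(rep.T @ sr, sr.T)


def mixed_worst_case_regret(sigma, reps, probs):
    """Max over ||f||_2 <= 1 of the expected regret of a randomized representation.

    The expected regret for task f is f^T (Sigma - sum_k p_k G_k) f, so the
    worst case over the unit ball is the top eigenvalue of that matrix.
    """
    gain = sum(p * projection_gain(sigma, r) for p, r in zip(probs, reps))
    return float(np.linalg.eigvalsh(sigma - gain).max())


def add_rule(sigma, reps, probs, new_rep, grid=np.linspace(0.0, 1.0, 101)):
    """Toy incremental step (hypothetical helper).

    Mixes `new_rep` in with weight t, rescales the old weights by (1 - t),
    and keeps the t on the grid with the lowest worst-case regret.
    """
    best_value, best_probs = np.inf, list(probs)
    for t in grid:
        cand = [(1.0 - t) * p for p in probs] + [t]
        value = mixed_worst_case_regret(sigma, reps + [new_rep], cand)
        if value < best_value:
            best_value, best_probs = value, cand
    return reps + [new_rep], best_probs, best_value


# Isotropic features in d = 2, representation dimension r = 1.
sigma = np.eye(2)
e1, e2 = np.eye(2)[:, :1], np.eye(2)[:, 1:]
pure = mixed_worst_case_regret(sigma, [e1], [1.0])
reps, probs, mixed = add_rule(sigma, [e1], [1.0], e2)
print(round(pure, 6), round(mixed, 6), round(probs[-1], 2))  # prints 1.0 0.5 0.5
```

Any single deterministic choice here leaves one direction entirely unrepresented, so the adversary attains regret 1. Mixing the two coordinate projections equally halves the worst case.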

Theorems & Definitions (32)

  • Definition 1: The linear MSE setting
  • Theorem 2
  • Theorem 3
  • Example 4
  • Example 5
  • Example 6
  • Definition 7: The linear cross-entropy setting
  • Example 8
  • Example 9: Comparison with PCA for multi-label classification
  • Theorem 10: Eckart-Young-Mirsky (wainwright2019high; vershynin2018high)
  • ...and 22 more
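Theorem 10 in the list is the Eckart-Young-Mirsky theorem. It states that truncating a matrix's singular value decomposition to its $r$ largest singular values gives the best rank-$r$ approximation in Frobenius norm. The squared error equals the sum of the squared discarded singular values. The sketch below checks this numerically. The helper name `best_rank_r`, the matrix size and the random seed are arbitrary choices for illustration, and nothing here reflects how the paper applies the theorem.

```python
import numpy as np


def best_rank_r(a, r):
    """Truncated SVD: keep the r largest singular triplets of `a`."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return (u[:, :r] * s[:r]) @ vt[:r]


rng = np.random.default_rng(0)  # arbitrary seed for a reproducible check
a = rng.standard_normal((6, 4))
r = 2
approx = best_rank_r(a, r)

# Eckart-Young-Mirsky: the Frobenius error equals the root-sum-square of the
# discarded singular values.
s = np.linalg.svd(a, compute_uv=False)
err = np.linalg.norm(a - approx, "fro")
print(np.isclose(err, np.sqrt(np.sum(s[r:] ** 2))))  # prints True

# Any other rank-r matrix (here a random one) does no better.
other = rng.standard_normal((6, r)) @ rng.standard_normal((r, 4))
print(np.linalg.norm(a - other, "fro") >= err)  # prints True
```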