TaskMet: Task-Driven Metric Learning for Model Learning

Dishank Bansal, Ricky T. Q. Chen, Mustafa Mukadam, Brandon Amos

TL;DR

This work proposes to take the task loss signal one level deeper than the parameters of the model and use it to learn the parameters of the loss function the model is trained on, which can be done by learning a metric in the prediction space.

Abstract

Deep learning models are often deployed in downstream tasks that the training procedure may not be aware of. For example, models solely trained to achieve accurate predictions may struggle to perform well on downstream tasks because seemingly small prediction errors may incur drastic task errors. The standard end-to-end learning approach is to make the task loss differentiable or to introduce a differentiable surrogate that the model can be trained on. In these settings, the task loss needs to be carefully balanced with the prediction loss because they may have conflicting objectives. We propose to take the task loss signal one level deeper than the parameters of the model and use it to learn the parameters of the loss function the model is trained on, which can be done by learning a metric in the prediction space. This approach does not alter the optimal prediction model itself, but rather changes the model learning to emphasize the information important for the downstream task. This enables us to achieve the best of both worlds: a prediction model trained in the original prediction space while also being valuable for the desired downstream task. We validate our approach through experiments conducted in two main settings: 1) decision-focused model learning scenarios involving portfolio optimization and budget allocation, and 2) reinforcement learning in noisy environments with distracting states. The source code to reproduce our experiments is available at https://github.com/facebookresearch/taskmet
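
As a rough sketch of what "one level deeper" means, the idea in the abstract can be written as a bilevel problem. The notation below ($f_\theta$ for the prediction model, $\Lambda_\phi(x)$ for the learned metric, $\mathcal{L}_{\text{task}}$ for the task loss, $\mathcal{D}$ for the training data) is assumed here for illustration and follows the Mahalanobis form mentioned in the Figure 2 caption; it is not necessarily the paper's exact formulation.

```latex
\begin{aligned}
\phi^\star &\in \arg\min_{\phi}\; \mathcal{L}_{\text{task}}\big(\theta^\star(\phi)\big) \\
\text{s.t.}\quad \theta^\star(\phi) &\in \arg\min_{\theta}\;
  \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\big(f_\theta(x)-y\big)^\top \Lambda_\phi(x)\,\big(f_\theta(x)-y\big)\Big]
\end{aligned}
```

Under this reading, for any positive-definite $\Lambda_\phi(x)$ the inner loss is zero exactly when $f_\theta(x) = y$, which is consistent with the claim that the optimal prediction model is unchanged. The metric only changes which prediction errors a model of limited capacity is pushed to reduce first.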

Paper Structure (18 sections, 1 theorem, 12 equations, 10 figures, 5 tables, 1 algorithm)

Key Result

Theorem 1

Let $f: \mathbb{R}^n \times \mathbb{R}^m \rightarrow \mathbb{R}^n$ be a continuously differentiable function, and let $x^\star, y^\star$ be a point satisfying $f(x^\star, y^\star)= 0$. If the Jacobian $\frac{\partial f(x^\star, y^\star)}{\partial y}$ is non-singular, then there exists an open set arou
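
The practical use of the implicit function theorem is that the derivative of a solution $y^\star(x)$ can be obtained without differentiating through the solver, via the standard identity $\frac{dy^\star}{dx} = -\left(\frac{\partial f}{\partial y}\right)^{-1}\frac{\partial f}{\partial x}$. The scalar example below is a minimal sketch of that identity. The residual `f`, the Newton solver `solve_y`, and the helper `implicit_grad` are hypothetical names chosen for illustration and are not taken from the paper or its code.

```python
def f(x, y):
    # Hypothetical residual: the solution y*(x) satisfies y^3 + y - x = 0.
    return y**3 + y - x


def df_dy(x, y):
    # Partial derivative of f with respect to y; always >= 1, so never singular.
    return 3.0 * y**2 + 1.0


def df_dx(x, y):
    # Partial derivative of f with respect to x.
    return -1.0


def solve_y(x, y0=0.0, iters=50):
    # Newton's method in y for a fixed x.
    y = y0
    for _ in range(iters):
        y = y - f(x, y) / df_dy(x, y)
    return y


def implicit_grad(x):
    # Implicit function theorem: dy*/dx = -(df/dy)^{-1} * df/dx at the solution.
    y = solve_y(x)
    return -df_dx(x, y) / df_dy(x, y)


x = 2.0
y_star = solve_y(x)

# Central finite difference through the solver, used only as a sanity check.
eps = 1e-6
fd_grad = (solve_y(x + eps) - solve_y(x - eps)) / (2.0 * eps)

print(y_star, implicit_grad(x), fd_grad)
```

At `x = 2.0` the root is `y = 1`, so the implicit gradient is `1 / (3 + 1) = 0.25`, and the finite-difference estimate should agree to several decimal places. In a vector-valued setting the division would become a linear solve against the Jacobian.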

Figures (10)

  • Figure 1: The MSE results in a model close to the true model in the prediction space, but may give poor task performance. Decision-focused learning (DFL) methods optimize the task loss, but may deviate from the prediction space. TaskMet optimizes the task loss while retaining the prediction task.
  • Figure 2: Examples of the Mahalanobis loss from Equation \ref{eq:implicit_theta} in a 2-dimensional prediction task. The model's loss is zero only when $\hat{y}=y^\star$. Here, the metric $\Lambda_\phi(x)$ increases the weighting on the $y_0$ component of the loss and thus emphasizes the predictions along this dimension.
  • Figure 3: TaskMet learns a metric for predictions with the gradient from a downstream task loss.
  • Figure 4: (Cubic problem) TaskMet learns a metric that prioritizes the points that are most important to the downstream task. The Euclidean metric (MSE) puts equal weight on all points and leads to a bad model with respect to the downstream task.
  • Figure 5: OMD (Nikishin et al., 2022) uses the planning task loss to learn the model parameters using implicit gradients. TaskMet adds one more optimization step over OMD: instead of learning the model parameters with the task loss, it learns the metric, which is then used to learn the model parameters.
  • ...and 5 more figures
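
The Mahalanobis loss described in the Figure 2 caption can be illustrated numerically. The snippet below is a minimal sketch that assumes the quadratic form $(\hat{y}-y^\star)^\top \Lambda (\hat{y}-y^\star)$ with a fixed diagonal metric. The function name `mahalanobis_loss` and the weights `[4, 1]` are hypothetical choices for illustration; in TaskMet the metric $\Lambda_\phi(x)$ is learned and depends on the input.

```python
import numpy as np


def mahalanobis_loss(y_hat, y_true, metric):
    # Quadratic form (y_hat - y_true)^T @ metric @ (y_hat - y_true).
    residual = y_hat - y_true
    return float(residual @ metric @ residual)


y_true = np.array([0.0, 0.0])

# Identity metric: recovers the plain squared Euclidean error.
euclidean = np.eye(2)
# Hypothetical metric that up-weights errors along the y_0 dimension.
weighted = np.diag([4.0, 1.0])

err_along_y0 = np.array([1.0, 0.0])  # unit error in y_0 only
err_along_y1 = np.array([0.0, 1.0])  # unit error in y_1 only

loss_euclid_y0 = mahalanobis_loss(err_along_y0, y_true, euclidean)
loss_euclid_y1 = mahalanobis_loss(err_along_y1, y_true, euclidean)
loss_weighted_y0 = mahalanobis_loss(err_along_y0, y_true, weighted)
loss_weighted_y1 = mahalanobis_loss(err_along_y1, y_true, weighted)
loss_at_truth = mahalanobis_loss(y_true, y_true, weighted)

print(loss_euclid_y0, loss_euclid_y1)      # equal under the identity metric
print(loss_weighted_y0, loss_weighted_y1)  # y_0 errors cost more under the weighted metric
print(loss_at_truth)                       # zero at the true target
```

Under the identity metric both unit errors cost the same, while under the weighted metric the error along $y_0$ costs four times as much. The loss is still zero only at the true target, so the metric changes the emphasis of training and not the location of the optimum.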

Theorems & Definitions (1)

  • Theorem 1: Implicit Function Theorem