Hyperparameter Trajectory Inference with Conditional Lagrangian Optimal Transport

Harry Amad, Mihaela van der Schaar

TL;DR

This work proposes an approach based on conditional Lagrangian optimal transport, jointly learning the Lagrangian function governing hyperparameter-induced dynamics along with the associated optimal transport maps and geodesics between observed marginals, which form the surrogate model.

Abstract

Neural networks (NNs) often have critical behavioural trade-offs that are set at design time with hyperparameters, such as reward weights in reinforcement learning or quantile targets in regression. Post-deployment, however, user preferences can evolve, making the initial settings undesirable and necessitating potentially expensive retraining. To circumvent this, we introduce the task of Hyperparameter Trajectory Inference (HTI): to learn, from observed data, how a NN's conditional output distribution changes with its hyperparameters, and to construct a surrogate model that approximates the NN at unobserved hyperparameter settings. HTI requires extending existing trajectory inference approaches to incorporate conditions, which exacerbates the challenge of ensuring that inferred paths are feasible. We propose an approach based on conditional Lagrangian optimal transport, jointly learning the Lagrangian function governing hyperparameter-induced dynamics along with the associated optimal transport maps and geodesics between observed marginals, which together form the surrogate model. We incorporate inductive biases based on the manifold hypothesis and least-action principles into the learned Lagrangian, improving surrogate model feasibility. We empirically demonstrate that our approach reconstructs NN outputs across various hyperparameter spectra better than alternative approaches.
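
For readers unfamiliar with the term, the standard (unconditional) Lagrangian optimal transport problem can be written as below. This is a textbook formulation given as a sketch only, not the paper's exact objective: the symbols $c_L$, $W_L$, $\gamma$, $\pi$, $y_0$, and $y_1$ are notation assumed here for illustration and may differ from the paper's.

```latex
% Standard Lagrangian optimal transport (illustrative notation, not the paper's).
% c_L: least-action cost of moving a point from y_0 to y_1 under Lagrangian L.
% W_L: optimal transport cost between marginals \mu_0 and \mu_1 under c_L.
c_L(y_0, y_1) = \inf_{\gamma:\ \gamma(0)=y_0,\ \gamma(1)=y_1}
    \int_0^1 L\big(\gamma(t), \dot{\gamma}(t)\big)\, dt,
\qquad
W_L(\mu_0, \mu_1) = \inf_{\pi \in \Pi(\mu_0, \mu_1)}
    \int c_L(y_0, y_1)\, d\pi(y_0, y_1).
```

Here $\Pi(\mu_0, \mu_1)$ is the set of couplings with marginals $\mu_0$ and $\mu_1$. With $L(q, v) = \tfrac{1}{2}\lVert v \rVert^2$ the cost reduces to $\tfrac{1}{2}\lVert y_1 - y_0 \rVert^2$ and the minimising paths are straight lines; a learned Lagrangian allows curved geodesics instead. In a conditional setting one would presumably take $\mu_0$ and $\mu_1$ to be the NN's output distributions for the same input at two adjacent observed hyperparameter values, though that indexing is an assumption here.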

Paper Structure (53 sections, 37 equations, 6 figures, 15 tables, 2 algorithms)


Figures (6)

  • Figure 1: Dots represent true samples from the temporal process across $t\in[0,1]$, lines represent model estimated trajectories from $t=0$ to $t=1$. Each condition has a distinct colour.
  • Figure 2: Average surrogate Cancer reward across $\lambda_\text{nk} \in \{1,2,3,4,6,7,8,9\}$.
  • Figure 3: Central $80\%$ prediction intervals from HTI surrogates compared with the true intervals on randomly selected ETTm2 samples, for direct (top), CFM (second row), MFM (third row), and our (bottom) approach.
  • Figure 4: Example inference-time adjustment enabled by HTI. We illustrate disparate user preferences affecting desired NN behaviour (desired $\lambda$ level) for different users in this abstract example. Having a fixed number of trained NNs ($p_{\theta_{\lambda_i}}$) only allows partial exploration of the full hyperparameter trajectory, while an HTI surrogate model ($(\hat{p}(\cdot | x_i, \lambda))_{\lambda \in \Lambda}$) can estimate outputs across the entire spectrum of hyperparameters (estimated conditional probability paths represented by solid blue/red lines). Crucially, hyperparameter-induced dynamics can differ amongst input conditions ($x_i$), as the true conditional distributions move along their own respective manifolds ($\mathcal{M}_{x_i}$), so an effective HTI model must learn conditional dynamics.
  • Figure 5: MSE of surrogate ETTm2 forecasts compared to NNs trained across quantiles $\tau \in \{0.1,0.25,0.5,0.75,0.9\}$.
  • ...and 1 more figure