Ensuring Reliability via Hyperparameter Selection: Review and Advances

Amirmohammad Farzaneh, Osvaldo Simeone

TL;DR

The work addresses reliable hyperparameter selection by casting it as a multiple hypothesis testing problem and formalizing the Learn-Then-Test (LTT) framework as a two-stage process that yields a subset of reliable hyperparameters with statistical guarantees. It surveys various risk measures, including $R_{\text{avg}}$ and $R_q$, and extends LTT to handle multi-objective criteria, side information via reliability graphs, and adaptive, sequential testing with e-values. Guarantees are provided through FWER control (Bonferroni, Fixed Sequence) and FDR control (DAGGER-based RG-PT), as well as Pareto-front based multi-objective testing. The framework is motivated by practical deployments in engineering domains such as communication systems, where formal risk bounds enable safer, more trustworthy hyperparameter choices.
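As a rough illustration of the testing stage of LTT with Bonferroni-based FWER control, the sketch below assumes a loss bounded in $[0,1]$ and uses a Hoeffding-style p-value for each null hypothesis "the risk of $\lambda$ exceeds $\alpha$". The function names, the candidate grid, and the empirical risk values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def hoeffding_p_value(empirical_risk: float, n: int, alpha: float) -> float:
    """p-value for the null H: R(lambda) > alpha, assuming losses in [0, 1]."""
    gap = max(alpha - empirical_risk, 0.0)
    return math.exp(-2.0 * n * gap ** 2)


def ltt_bonferroni(empirical_risks: dict, n: int, alpha: float, delta: float) -> list:
    """Return the hyperparameters whose null is rejected at FWER level delta."""
    threshold = delta / len(empirical_risks)  # Bonferroni split of delta
    return [
        lam
        for lam, r_hat in empirical_risks.items()
        if hoeffding_p_value(r_hat, n, alpha) <= threshold
    ]


# Hypothetical empirical risks, each estimated on n = 500 calibration points.
risks = {0.1: 0.02, 0.2: 0.05, 0.3: 0.18, 0.4: 0.30}
reliable = ltt_bonferroni(risks, n=500, alpha=0.2, delta=0.1)
print(reliable)
```

Under these assumptions, only candidates whose empirical risk sits well below $\alpha$ are retained; a candidate with empirical risk just under $\alpha$ is not declared reliable, because its p-value is too large to survive the Bonferroni correction.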

Abstract

Hyperparameter selection is a critical step in the deployment of artificial intelligence (AI) models, particularly in the current era of foundational, pre-trained models. By framing hyperparameter selection as a multiple hypothesis testing problem, recent research has shown that it is possible to provide statistical guarantees on population risk measures attained by the selected hyperparameter. This paper reviews the Learn-Then-Test (LTT) framework, which formalizes this approach, and explores several extensions tailored to engineering-relevant scenarios. These extensions encompass different risk measures and statistical guarantees, multi-objective optimization, the incorporation of prior knowledge and dependency structures into the hyperparameter selection process, as well as adaptivity. The paper also includes illustrative applications for communication systems.

Paper Structure

This paper contains 8 sections, 13 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Packet delay distribution for LTT and QLTT in a resource allocation problem (adapted from farzaneh2024quantile by using a different calibration data set $\mathcal{Z}$).
  • Figure 2: Illustration of the operation of PT for one reliability risk function $R_1(\lambda)$ to be controlled below the threshold $\alpha$ and one auxiliary risk function $R_2(\lambda)$ to be minimized. (a) Find the subset $\Lambda_\text{OPT}$ lying on the estimated Pareto front; (b) Perform MHT via FST to identify the subset $\hat{\Lambda}_\mathcal{Z}$ of reliable hyperparameters.
  • Figure 3: An example of prior information on the relative reliability of hyperparameters. Hyperparameters corresponding to a higher energy consumption are considered more reliable, and put at a higher level in the RG. Only if these configurations are deemed reliable do we proceed to test hyperparameters with better energy efficiency, which may also improve other performance criteria such as fairness.
  • Figure 4: Comparison of the (a) Bonferroni, (b) FST, and (c) DAGGER testing methods.
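
To make the contrast between the Bonferroni method and FST in Figure 4 concrete, the following minimal sketch applies both procedures to the same list of p-values. It assumes the p-values are already ordered from the hyperparameter expected to be most reliable to the one expected to be least reliable; the p-values themselves are hypothetical. DAGGER, which operates on a graph of hypotheses, is not covered here.

```python
def bonferroni(p_values: list, delta: float) -> list:
    """Test every hypothesis at level delta / m; the order is irrelevant."""
    threshold = delta / len(p_values)
    return [i for i, p in enumerate(p_values) if p <= threshold]


def fixed_sequence(p_values: list, delta: float) -> list:
    """Test in the given order at full level delta; stop at the first failure."""
    selected = []
    for i, p in enumerate(p_values):
        if p > delta:
            break  # every later hypothesis is left untested
        selected.append(i)
    return selected


# Hypothetical p-values, listed in the assumed testing order.
p_values = [0.001, 0.04, 0.03, 0.6, 0.002]
delta = 0.05

print(bonferroni(p_values, delta))
print(fixed_sequence(p_values, delta))
```

In this example, FST accepts the first three hyperparameters because each is tested at the full level $\delta$, but it never reaches the last one; Bonferroni uses the stricter threshold $\delta/5$ and therefore keeps only the first and last candidates.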