Hyperparameter Optimization as a Service on INFN Cloud

Matteo Barbetti, Lucio Anderlini

TL;DR

Hyperparameter optimization (HPO) for machine learning often requires coordinating many trials across heterogeneous computing resources. Hopaas is a lightweight service that exposes REST APIs to orchestrate Bayesian optimization campaigns based on Optuna, so that trials, with pruning support, can be distributed across multi-site HPC and cloud resources. The reference implementation runs on INFN Cloud, ships with a Python client, and has been used on Marconi 100 to tune GAN-based parameterizations for the ultra-fast simulation of the LHCb experiment. The campaigns combined private, INFN Cloud, and CINECA resources, showing that a single service can coordinate trials across several providers.

Abstract

The simplest and often most effective way of parallelizing the training of complex machine learning models is to execute several training instances on multiple machines, scanning the hyperparameter space to optimize the underlying statistical model and the learning procedure. Often, such a meta-learning procedure is limited by the ability to securely access a common database that organizes the knowledge of previous and ongoing trials. Exploiting opportunistic GPUs provided in different environments represents a further challenge when designing such optimization campaigns. In this contribution, we discuss how a set of REST APIs can be used to access a dedicated service based on INFN Cloud to monitor and coordinate multiple training instances, with gradient-less optimization techniques, via simple HTTP requests. The service, called Hopaas (Hyperparameter OPtimization As A Service), consists of a web interface and sets of APIs implemented with a FastAPI backend running through Uvicorn and NGINX in a virtual instance of INFN Cloud. The optimization algorithms are currently based on Bayesian techniques as provided by Optuna. A Python frontend is also available for quick prototyping. We present applications to hyperparameter optimization campaigns performed by combining private, INFN Cloud, and CINECA resources. Such multi-node, multi-site optimization studies have given a significant boost to the development of a set of parameterizations for the ultra-fast simulation of the LHCb experiment.
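
The client-server pattern described in the abstract can be illustrated with a small, self-contained sketch. It is not the Hopaas API: the `/ask` and `/tell` endpoints, the JSON fields, and the helper names are hypothetical, the server is a standard-library stand-in for the FastAPI backend, and plain random search replaces the Bayesian sampler provided by Optuna. Authentication and pruning are omitted. The point is only to show several workers sharing one trial store through simple HTTP requests.

```python
import json
import math
import random
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical in-memory trial store; a real service would use a database.
TRIALS = []  # each entry: {"params": {...}, "loss": float or None}
LOCK = threading.Lock()
RNG = random.Random(0)
OPENER = build_opener(ProxyHandler({}))  # ignore any proxy settings for localhost


class Handler(BaseHTTPRequestHandler):
    """Toy server with two hypothetical endpoints: POST /ask and POST /tell."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        with LOCK:
            if self.path == "/ask":
                # Assumption: random search stands in for a Bayesian sampler.
                params = {"lr": 10 ** RNG.uniform(-5, -1)}
                TRIALS.append({"params": params, "loss": None})
                reply = {"trial_id": len(TRIALS) - 1, "params": params}
            elif self.path == "/tell":
                TRIALS[body["trial_id"]]["loss"] = body["loss"]
                reply = {"ok": True}
            else:
                self.send_error(404)
                return
        payload = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def post(base_url, path, data):
    """Send a JSON POST request and decode the JSON reply."""
    req = Request(base_url + path, data=json.dumps(data).encode(),
                  headers={"Content-Type": "application/json"}, method="POST")
    with OPENER.open(req) as resp:
        return json.loads(resp.read())


def objective(params):
    """Toy loss standing in for a training run; minimal at lr = 1e-3."""
    return (math.log10(params["lr"]) + 3.0) ** 2


def run_worker(base_url, n_trials):
    """One computing node: ask for parameters, evaluate, report the loss."""
    for _ in range(n_trials):
        trial = post(base_url, "/ask", {})
        loss = objective(trial["params"])
        post(base_url, "/tell", {"trial_id": trial["trial_id"], "loss": loss})


def run_demo(n_workers=3, n_trials=5):
    """Start the toy server, run the workers in parallel, return the best trial."""
    TRIALS.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_port}"
    workers = [threading.Thread(target=run_worker, args=(base_url, n_trials))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    server.shutdown()
    server.server_close()
    return min(TRIALS, key=lambda t: t["loss"])


if __name__ == "__main__":
    best = run_demo()
    print(f"{len(TRIALS)} trials, best lr = {best['params']['lr']:.2e}")
```

Because the workers only need outbound HTTP access to the server, the same loop could run unchanged on nodes at different sites, which is the property the abstract relies on for multi-site campaigns.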

Paper Structure (5 sections, 1 figure, 1 table)

This paper contains 5 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Workflow of an optimization study with a client-server approach based on REST APIs.