Hyperparameter Optimization for Large Language Model Instruction-Tuning

Christophe Tribes; Sacha Benarroch-Lelong; Peng Lu; Ivan Kobyzev

TL;DR

This paper investigates automatic hyperparameter optimization (HPO) for LoRA-based instruction-tuning of a moderate-sized LLM (LLaMA-2 7B) under a fixed compute budget. It compares two black-box optimization approaches, MADS/NOMAD and Bayesian TPE (NNI-TPE), over a search space comprising the LoRA rank $r$, scaling $\alpha$, dropout $d$, and learning rate $lr$, with validation loss as the objective. The results show that NOMAD often yields the best-performing models on downstream tasks and human preferences, while NNI-TPE explores the space more broadly and finds strong low-rank configurations. However, lower validation loss does not always translate to higher scores on every benchmark. The findings argue for careful, automated HPO as a practical outer loop in instruction-tuning, and suggest future work on multiobjective or constraint-aware formulations that better balance the optimization objective with real-world evaluation metrics.
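To make the "outer loop" idea concrete, here is a minimal sketch of treating fine-tuning plus validation as a black box over the four hyperparameters named above. Everything in it is an assumption for illustration: the bounds in `SEARCH_SPACE` are hypothetical, `toy_validation_loss` is a synthetic stand-in for an actual fine-tuning run, and plain random search is used in place of NOMAD or NNI-TPE, whose APIs are not reproduced here. The budget of 50 evaluations mirrors the first NOMAD round mentioned in the Figure 1 caption.

```python
import math
import random

# Hypothetical search space; the bounds are illustrative, not the paper's settings.
SEARCH_SPACE = {
    "rank": (1, 128),          # LoRA rank r (integer)
    "alpha": (1.0, 64.0),      # LoRA scaling alpha
    "dropout": (0.0, 0.3),     # LoRA dropout d
    "log10_lr": (-6.0, -3.0),  # learning rate, sampled on a log scale
}


def sample_config(rng):
    """Draw one hyperparameter configuration uniformly from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["rank"]
    return {
        "rank": rng.randint(lo, hi),
        "alpha": rng.uniform(*SEARCH_SPACE["alpha"]),
        "dropout": rng.uniform(*SEARCH_SPACE["dropout"]),
        "lr": 10 ** rng.uniform(*SEARCH_SPACE["log10_lr"]),
    }


def toy_validation_loss(cfg):
    """Synthetic stand-in for 'fine-tune with cfg, then return validation loss'."""
    return (
        1.0
        + 0.1 * (math.log10(cfg["lr"]) + 4.0) ** 2
        + abs(cfg["dropout"] - 0.1)
        + 1.0 / cfg["rank"]
    )


def random_search(blackbox, budget, seed=0):
    """Evaluate `budget` random configurations and keep the best one.

    This is a baseline stand-in, not MADS and not TPE.
    """
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    history = []
    for _ in range(budget):
        cfg = sample_config(rng)
        loss = blackbox(cfg)
        history.append(loss)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss, history


best_cfg, best_loss, history = random_search(toy_validation_loss, budget=50)
```

In a real setting the `blackbox` callable would launch a fine-tuning job and return its validation loss, so each evaluation is expensive; that cost is what motivates replacing random sampling with a solver that uses the evaluation history.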

Abstract

Fine-tuning has recently enabled Large Language Models (LLMs) to achieve milestones in natural language processing applications. The emergence of ever larger LLMs has paved the way for more efficient fine-tuning methods. Among these, the Low-Rank Adaptation (LoRA) method keeps most of the weights of the pre-trained LLM frozen while introducing a low-rank decomposition of the weight matrix, so that only a very small proportion of the network is tuned. The downstream performance of models fine-tuned with LoRA relies heavily on a set of hyperparameters, including the rank of the decomposition. In this work, we investigate the choice of these hyperparameters through two main blackbox optimization (BBO) techniques. We treat the whole pipeline of fine-tuning and validating a pre-trained LLM as a blackbox and efficiently explore the space of hyperparameters with the NOMAD algorithm, achieving a boost in performance and human alignment of the tuned model.
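As a rough illustration of the low-rank decomposition described above, the sketch below builds the effective weight $W_0 + (\alpha / r)\,BA$ and counts how few parameters are trainable. The $\alpha / r$ scaling follows the original LoRA formulation and is an assumption here, since the abstract does not spell it out; the helper names and the 4096 x 4096 matrix size are likewise illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w0, b, a, alpha):
    """Return W0 + (alpha / r) * B @ A, where r is the number of rows of A."""
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


def trainable_fraction(d_out, d_in, r):
    """Share of trainable parameters for one adapted d_out x d_in matrix."""
    lora_params = r * (d_in + d_out)  # B is d_out x r, A is r x d_in
    return lora_params / (d_out * d_in + lora_params)


# Tiny example: a frozen 3 x 2 weight with a rank-1 adapter.
w0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # frozen pre-trained weight
b = [[1.0], [0.0], [2.0]]                  # trainable, d_out x r
a = [[0.5, -0.5]]                          # trainable, r x d_in
w_eff = lora_effective_weight(w0, b, a, alpha=2.0)

# Illustrative size only: a square 4096 x 4096 projection with rank 8.
fraction = trainable_fraction(4096, 4096, 8)
```

With these illustrative numbers, the adapter accounts for well under one percent of the parameters of the adapted matrix, which is why the rank $r$ is such a consequential hyperparameter: it directly sets the adapter's capacity and cost.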

Paper Structure (24 sections, 1 equation, 3 figures, 4 tables)

Introduction
Instruction-tuning Large Language Model
Parameter-Efficient Fine-Tuning (PEFT)
Hyperparameters Optimization
The Mads algorithm and NOMAD
Neural Network Intelligence (NNI) toolkit
Experimental Setup
Instruction-tuning Settings
Backbone Model
Datasets
Training Details
BBO Settings
Experimental Results
First optimization round
Second optimization round
...and 9 more sections

Figures (3)

Figure 1: Objective value history. First NOMAD optimization with 50 evaluations and 3-epoch fine-tuning.
Figure 2: Parallel plots showing hyperparameter values and validation losses. Darker lines indicate lower validation losses.
Figure 3: Human evaluation on the Vicuna human preference dataset.
