L3Ms -- Lagrange Large Language Models

Guneet S. Dhillon, Xingjian Shi, Yee Whye Teh, Alex Smola

TL;DR

L3Ms reframe SFT and alignment as a unified constrained optimization problem, enabling application-specific guarantees on preference properties through average constraints instead of heuristic reward weighting. By incorporating a relaxed logarithmic barrier, the method gradually enforces constraints during fine-tuning, linking barrier gradients to Lagrange multipliers and avoiding deviations from an SFT anchor. The approach demonstrates that length-constrained and safety-oriented preferences (e.g., Helpful/Harmless) can be satisfied without sacrificing task performance, and with improved efficiency relative to saddle-point methods that rely on separate SFT models. Empirical results on instruction-following with UltraChat data show that L3Ms can tailor responses (e.g., concise vs. verbose) while maintaining competitive perplexities, highlighting the practical impact of principled constraint-driven customization for LLM deployment.

Abstract

Supervised fine-tuning (SFT) and alignment of large language models (LLMs) are key steps in providing a good user experience. However, the concept of an appropriate alignment is inherently application-dependent, and current methods often rely on heuristic choices to drive optimization. In this work, we formulate SFT and alignment as a constrained optimization problem: the LLM is fine-tuned on a task while being required to meet application-specific requirements, without resorting to heuristics. To solve this, we propose Lagrange Large Language Models (L3Ms), which employ logarithmic barriers to enforce the constraints. This approach allows for the customization of L3Ms across diverse applications while avoiding heuristic-driven processes. We experimentally demonstrate the versatility and efficacy of L3Ms in achieving tailored alignments for various applications.

Paper Structure

This paper contains 34 sections, 17 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: The relaxed logarithmic barrier. We depict the convergence of the relaxed logarithmic barrier $\mathcal{B}_{\mu , \mu^{2}} ( z )$ to the characteristic function $\chi \{ z \leq 0 \}$ as $\mu \rightarrow 0$. We gradually decrease $\mu$ from 1 (blue) to 0.01 (red). Consequently, $\mathcal{B}_{\mu , \mu^{2}} ( z )$ gets closer to 0 for $z \leq 0$ and increases to $\infty$ otherwise.
  • Figure 2: Length constrained L3Ms. We report the response lengths (in tokens) and task perplexities of the SFT model and the L3Ms with varying length constraints. Left: The mean response length with the mean and standard deviation of the task perplexities. Right: The distribution of the response lengths. The notches indicate the medians and their 95% confidence intervals, the boxes show the $\pm$25% quantiles, and the whiskers denote the 1.5$\times$ interquartile ranges. The white circles mark the means, and the black dashed lines depict the constraints imposed on the different L3Ms.
  • Figure 3: Helpful and harmless L3Ms. We report the helpful-harmless rewards and task perplexities achieved by the different LLMs. Left: The helpful-harmless rewards attained by the LLM at initialization (at the bottom-left in blue), the SFT model (at the top-left in orange), the MMs (in green), and the L3Ms (in red). We depict the imposed constraints in black, with the dotted gray lines connecting LLMs to their corresponding constraints. Note that constraints are satisfied if the obtained reward point is at the top-right of its corresponding constraint point. For example, the shaded region denotes the feasible region for the constraint point (3, 3), with the shade gradient denoting the distance from the constraint boundary (light to dark shows an increase in distance). Right: The mean and standard deviation of the task perplexities for MMs and L3Ms, along with their corresponding constraints; the task perplexity at initialization is 1.316$\pm$0.4 and that of the SFT model is 0.805$\pm$0.3.
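
The limiting behavior described in the Figure 1 caption can be illustrated with a small Python sketch. This page does not give the closed form of $\mathcal{B}_{\mu , s} ( z )$, so the code assumes a common relaxed logarithmic barrier: $-\mu \log ( -z )$ for $z \leq -s$, continued by its tangent line for $z > -s$. The function name `relaxed_log_barrier` and the sample points are illustrative choices, not taken from the paper.

```python
import math


def relaxed_log_barrier(z: float, mu: float, s: float) -> float:
    """Assumed form of the relaxed log barrier B_{mu,s}(z).

    For z <= -s this is the usual log barrier -mu * log(-z).
    For z > -s it is the tangent line at z = -s, so the function
    stays finite and differentiable for infeasible z > 0.
    """
    if mu <= 0 or s <= 0:
        raise ValueError("mu and s must be positive")
    if z <= -s:
        return -mu * math.log(-z)
    # Tangent-line extension: slope mu / s, matching value -mu * log(s) at z = -s.
    return (mu / s) * z + mu - mu * math.log(s)


if __name__ == "__main__":
    # Mirror the caption: shrink mu from 1 to 0.01 with s = mu**2.
    for mu in (1.0, 0.1, 0.01):
        feasible = relaxed_log_barrier(-0.5, mu, mu**2)
        infeasible = relaxed_log_barrier(0.5, mu, mu**2)
        print(f"mu={mu}: B(-0.5)={feasible:.4f}, B(0.5)={infeasible:.4f}")
```

Under this assumed form with $s = \mu^{2}$, the slope of the linear piece is $1 / \mu$, so the value at a fixed $z > 0$ grows without bound as $\mu \rightarrow 0$, while the value at a fixed $z < 0$ shrinks toward 0, which matches the convergence to $\chi \{ z \leq 0 \}$ described in the caption.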