Efficient Tail-Aware Generative Optimization via Flow Model Fine-Tuning

Zifan Wang, Riccardo De Santi, Xiaoyu Mo, Michael M. Zavlanos, Andreas Krause, Karl H. Johansson

TL;DR

This work presents Tail-aware Flow Fine-Tuning (TFFT), a principled and efficient distributional fine-tuning algorithm based on the Conditional Value-at-Risk (CVaR), and addresses two distinct tail-shaping goals: right-CVaR for seeking novel samples in the high-reward tail and left-CVaR for controlling worst-case samples in the low-reward tail.

Abstract

Fine-tuning pre-trained diffusion and flow models to optimize downstream utilities is central to real-world deployment. Existing entropy-regularized methods primarily maximize expected reward, providing no mechanism to shape tail behavior. However, tail control is often essential: the lower tail determines reliability by limiting low-reward failures, while the upper tail enables discovery by prioritizing rare, high-reward outcomes. In this work, we present Tail-aware Flow Fine-Tuning (TFFT), a principled and efficient distributional fine-tuning algorithm based on the Conditional Value-at-Risk (CVaR). We address two distinct tail-shaping goals: right-CVaR for seeking novel samples in the high-reward tail and left-CVaR for controlling worst-case samples in the low-reward tail. Unlike prior approaches that rely on non-linear optimization, we leverage the variational dual formulation of CVaR to decompose it into a decoupled two-stage procedure: a lightweight one-dimensional threshold optimization step, and a single entropy-regularized fine-tuning process via a specific pseudo-reward. This decomposition achieves CVaR fine-tuning efficiently with computational cost comparable to standard expected fine-tuning methods. We demonstrate the effectiveness of TFFT across illustrative experiments, high-dimensional text-to-image generation, and molecular design.
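The variational dual form of CVaR mentioned in the abstract can be checked numerically on a fixed set of rewards. The following is a minimal sketch, assuming the standard Rockafellar-Uryasev identities (upper tail: $\min_t\, t + \mathbb{E}[(r-t)_+]/(1-\beta)$; lower tail: $\max_t\, t - \mathbb{E}[(t-r)_+]/\beta$). All function names are hypothetical, and the hinge-shaped pseudo-reward is an assumed illustrative form, not the one defined in the paper. In TFFT the threshold is tied to the fine-tuned distribution rather than to a fixed sample set, so this only illustrates the one-dimensional threshold search, not the full algorithm.

```python
import numpy as np

def right_cvar_dual(rewards, beta):
    """Upper-tail CVaR via the dual form: min_t  t + E[(r - t)^+] / (1 - beta)."""
    r = np.asarray(rewards, dtype=float)
    # The objective is convex and piecewise linear in t, so a minimizer
    # can always be found among the sample values themselves.
    t = np.sort(r)
    hinge = np.maximum(r[None, :] - t[:, None], 0.0).mean(axis=1)
    objective = t + hinge / (1.0 - beta)
    i = int(np.argmin(objective))
    return float(t[i]), float(objective[i])

def left_cvar_dual(rewards, beta):
    """Lower-tail CVaR via the dual form: max_t  t - E[(t - r)^+] / beta."""
    r = np.asarray(rewards, dtype=float)
    t = np.sort(r)
    hinge = np.maximum(t[:, None] - r[None, :], 0.0).mean(axis=1)
    objective = t - hinge / beta
    i = int(np.argmax(objective))
    return float(t[i]), float(objective[i])

def tail_mean(rewards, frac, upper):
    """Direct estimate: average of the top (or bottom) `frac` share of samples."""
    r = np.sort(np.asarray(rewards, dtype=float))
    k = int(round(frac * r.size))
    return float(r[-k:].mean() if upper else r[:k].mean())

def hinge_pseudo_reward(rewards, t, beta):
    """ASSUMED pseudo-reward shape for the right tail (not taken from the paper):
    zero below the threshold t, linear above it."""
    return np.maximum(np.asarray(rewards, dtype=float) - t, 0.0) / (1.0 - beta)

rng = np.random.default_rng(0)
rewards = rng.normal(size=2000)  # synthetic stand-in for reward-model scores

t_right, cvar_right = right_cvar_dual(rewards, beta=0.8)
t_left, cvar_left = left_cvar_dual(rewards, beta=0.2)
direct_right = tail_mean(rewards, 0.2, upper=True)
direct_left = tail_mean(rewards, 0.2, upper=False)
pseudo = hinge_pseudo_reward(rewards, t_right, beta=0.8)

print("right-CVaR (dual, direct):", cvar_right, direct_right)
print("left-CVaR  (dual, direct):", cvar_left, direct_left)
print("share of samples with non-zero pseudo-reward:", float((pseudo > 0).mean()))
```

On a fixed sample set, the dual optimum matches the plain tail average, and the optimal threshold sits at the empirical $\beta$-quantile. This is why the threshold search is cheap: it is a scalar problem regardless of the dimension of the generated samples.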

Paper Structure (65 sections, 8 theorems, 95 equations, 20 figures, 5 tables, 2 algorithms)


Key Result

Theorem 4.1

Let $\beta \in (0,1)$ and $\alpha > 0$, and assume that the reward $r(x)$ is bounded. Then the right-CVaR fine-tuning problem admits the equivalent reformulation: Moreover, let $t^\star$ be the optimal threshold minimizing (eq:cvar_dual_form). The optimal distribution maximizing (eq:cvar_dual_form_2) is given by Finally, it holds that $\mathrm{VaR}_\beta(p_R^\star)=t^\star$.
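As background for the threshold $t$ in this theorem, the textbook Rockafellar-Uryasev variational identities for the two tails of a reward distribution are sketched below. The superscripts $R$ and $L$ and the notation $(\cdot)_+ = \max(\cdot, 0)$ are assumptions introduced here. These are the unregularized identities only; the paper's own reformulation may differ in form, for example through the entropy regularization weighted by $\alpha$.

$$
\mathrm{CVaR}^{R}_\beta(p) = \min_{t \in \mathbb{R}} \Big\{ t + \tfrac{1}{1-\beta}\, \mathbb{E}_{x \sim p}\big[(r(x) - t)_+\big] \Big\},
\qquad
\mathrm{CVaR}^{L}_\beta(p) = \max_{t \in \mathbb{R}} \Big\{ t - \tfrac{1}{\beta}\, \mathbb{E}_{x \sim p}\big[(t - r(x))_+\big] \Big\}
$$

In both identities an optimal $t$ is the $\beta$-quantile $\mathrm{VaR}_\beta(p)$ of the reward under $p$, which is consistent with the theorem's final claim that the optimal threshold equals the VaR of the optimal distribution.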

Figures (20)

  • Figure 1: Illustration of right-CVaR and left-CVaR.
  • Figure 2: Evolution of distributions $p^k$ under the FDC update rule (eq:update_final) in the 2D example. The sequence $p^k$ converges to our characterized target distribution $p_R^\star$ defined in (eq:optimal_distribution), which empirically validates our theory and confirms that TFFT bypasses the expensive iterative loop.
  • Figure 3: Probability Density Functions (PDFs) in the 2D example comparing Pre-trained, EXP-FT, and TFFT. (Top, right-CVaR, $\beta=0.8$): R-TFFT concentrates probability mass in the high-reward tail, while EXP-FT shifts the mean. (Bottom, left-CVaR, $\beta=0.2$): L-TFFT successfully truncates the lower tail, removing worst-case failures that persist in the EXP-FT distribution.
  • Figure 4: Qualitative comparison of generated samples for the prompt "A tree with purple leaves in a green forest". Each image is labeled with its ImageReward score. EXP-FT and FDC suffer from high variance and occasional failures. In contrast, L-TFFT produces consistently high-quality samples with a minimum score of 0.75, validating its ability to control the worst-case tail.
  • Figure 5: Visualization of the top-3 molecules with the highest reward found by each method. R-TFFT discovers candidates with significantly higher stability (scores up to 5059) compared to the best candidates from EXP-FT (1069) and FDC (2012), validating its ability to explore the extreme high-reward tail.
  • ...and 15 more figures

Theorems & Definitions (10)

  • Theorem 4.1
  • Theorem 4.2
  • Theorem 5.1
  • Proposition 5.2
  • Theorem 6.1: Convergence of Stage 1, Informal
  • Theorem 6.2: Sensitivity to Threshold Estimation
  • Lemma 1.1: Bias and Variance of Ratio Estimators
  • proof
  • Lemma 1.2
  • proof