Efficient Tail-Aware Generative Optimization via Flow Model Fine-Tuning

Zifan Wang; Riccardo De Santi; Xiaoyu Mo; Michael M. Zavlanos; Andreas Krause; Karl H. Johansson

TL;DR

This work presents Tail-aware Flow Fine-Tuning (TFFT), a principled and efficient distributional fine-tuning algorithm based on the Conditional Value-at-Risk (CVaR), and addresses two distinct tail-shaping goals: right-CVaR for seeking novel samples in the high-reward tail and left-CVaR for controlling worst-case samples in the low-reward tail.

Abstract

Fine-tuning pre-trained diffusion and flow models to optimize downstream utilities is central to real-world deployment. Existing entropy-regularized methods primarily maximize expected reward, providing no mechanism to shape tail behavior. However, tail control is often essential: the lower tail determines reliability by limiting low-reward failures, while the upper tail enables discovery by prioritizing rare, high-reward outcomes. In this work, we present Tail-aware Flow Fine-Tuning (TFFT), a principled and efficient distributional fine-tuning algorithm based on the Conditional Value-at-Risk (CVaR). We address two distinct tail-shaping goals: right-CVaR for seeking novel samples in the high-reward tail and left-CVaR for controlling worst-case samples in the low-reward tail. Unlike prior approaches that rely on non-linear optimization, we leverage the variational dual formulation of CVaR to decompose it into a decoupled two-stage procedure: a lightweight one-dimensional threshold optimization step, and a single entropy-regularized fine-tuning process via a specific pseudo-reward. This decomposition achieves CVaR fine-tuning efficiently, with computational cost comparable to standard expected-reward fine-tuning methods. We demonstrate the effectiveness of TFFT across illustrative experiments, high-dimensional text-to-image generation, and molecular design.
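The two-stage idea described in the abstract can be illustrated on plain reward samples. The following is a minimal sketch, not the paper's implementation: it assumes the textbook Rockafellar-Uryasev form of right-CVaR (mean of the top $1-\beta$ fraction of rewards), and the names `right_cvar_objective`, `fit_threshold`, `tail_mean`, and the exact form of `pseudo_rewards` are hypothetical choices made for illustration. No flow model or entropy regularization is involved; the sketch only shows that the one-dimensional threshold step recovers the tail mean.

```python
import random


def right_cvar_objective(rewards, t, beta):
    """Empirical Rockafellar-Uryasev objective: t + E[(r - t)_+] / (1 - beta)."""
    excess = sum(max(r - t, 0.0) for r in rewards) / len(rewards)
    return t + excess / (1.0 - beta)


def fit_threshold(rewards, beta):
    """One-dimensional threshold search over candidate values of t.

    The empirical objective is convex and piecewise linear in t, so a
    minimizer lies at one of the sample values; scanning them is exact.
    """
    candidates = sorted(set(rewards))
    return min(candidates, key=lambda t: right_cvar_objective(rewards, t, beta))


def tail_mean(rewards, beta):
    """Direct estimate: mean of the top (1 - beta) fraction of rewards."""
    k = round(len(rewards) * (1.0 - beta))
    return sum(sorted(rewards)[-k:]) / k


random.seed(0)
rewards = [random.gauss(0.0, 1.0) for _ in range(1000)]
beta = 0.8

# Stage 1 (sketch): lightweight one-dimensional threshold optimization.
t_star = fit_threshold(rewards, beta)
cvar_dual = right_cvar_objective(rewards, t_star, beta)
cvar_direct = tail_mean(rewards, beta)

# Stage 2 input (assumption): a thresholded pseudo-reward that an
# entropy-regularized fine-tuner would maximize in expectation.
pseudo_rewards = [max(r - t_star, 0.0) / (1.0 - beta) for r in rewards]
```

In this sketch `cvar_dual` and `cvar_direct` agree up to floating-point error, and only samples above `t_star` receive a nonzero pseudo-reward, which is the sense in which a fixed threshold turns a tail objective into an ordinary expected-reward objective.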

Paper Structure (65 sections, 8 theorems, 95 equations, 20 figures, 5 tables, 2 algorithms)

Introduction
Preliminaries and Notation
Generative Flow and Diffusion Models
Entropy-Regularized Flow Fine-Tuning
Problem Statement: Tail-Aware Generative Optimization via Flow Model Fine-Tuning
Variational Dual Formulation of CVaR for Efficient Tail-Aware Flow Fine-Tuning
Algorithm: Tail-aware Flow Fine-Tuning
Comparison to non-linear generative optimization
Theoretical Analysis
Analysis of Stage 1
Analysis of Stage 2
Experiments
Illustrative Settings
Efficiency Discussion
Image Generation
...and 50 more sections

Key Result

Theorem 4.1

Let $\beta \in (0,1)$ and $\alpha > 0$. Assume that the reward $r(x)$ is bounded. Then the right-CVaR fine-tuning problem admits an equivalent dual reformulation. Moreover, let $t^\star$ be the optimal threshold minimizing \ref{eq:cvar_dual_form}. The optimal distribution $p_R^\star$ maximizing \ref{eq:cvar_dual_form_2} is then given explicitly, and it satisfies $\mathrm{VaR}_\beta(p_R^\star)=t^\star$.
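For orientation, the textbook Rockafellar-Uryasev identity that variational CVaR duals of this kind build on can be written as below. This is a sketch under stated assumptions, not a reproduction of the theorem's own equations: $X = r(x)$ with $x \sim p$, right-CVaR at level $\beta$ is taken to be the mean of the top $(1-\beta)$ fraction of rewards, and left-CVaR at level $\beta$ the mean of the bottom $\beta$ fraction. The superscripts $R$ and $L$ are notation introduced here for illustration.

```latex
% Assumption: X = r(x), x ~ p; (z)_+ = max(z, 0).
\mathrm{CVaR}^{R}_{\beta}(X)
  = \min_{t \in \mathbb{R}} \Big\{ t + \tfrac{1}{1-\beta}\, \mathbb{E}\big[(X - t)_+\big] \Big\},
\qquad
\mathrm{CVaR}^{L}_{\beta}(X)
  = \max_{t \in \mathbb{R}} \Big\{ t - \tfrac{1}{\beta}\, \mathbb{E}\big[(t - X)_+\big] \Big\}.
```

In both identities the optimal $t$ is a $\beta$-quantile of $X$, that is, $\mathrm{VaR}_\beta$ under the distribution of $X$, which matches the role the threshold plays in the theorem's final claim.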

Figures (20)

Figure 1: Illustration of right-CVaR and left-CVaR.
Figure 2: Evolution of distributions $p^k$ under the FDC update rule \ref{eq:update_final} in the 2D example. The sequence $p^k$ converges to our characterized target distribution $p_R^*$ defined in \ref{eq:optimal_distribution}, which empirically validates our theory and confirms that TFFT bypasses the expensive iterative loop.
Figure 3: Probability Density Functions (PDFs) in the 2D example comparing Pre-trained, EXP-FT, and TFFT. (Top, right-CVaR, $\beta=0.8$): R-TFFT concentrates probability mass in the high-reward tail, while EXP-FT shifts the mean. (Bottom, left-CVaR, $\beta=0.2$): L-TFFT successfully truncates the lower tail, removing worst-case failures that persist in the EXP-FT distribution.
Figure 4: Qualitative comparison of generated samples for the prompt "A tree with purple leaves in a green forest". Each image is labeled with its ImageReward score. EXP-FT and FDC suffer from high variance and occasional failures. In contrast, L-TFFT produces consistently high-quality samples with a minimum score of 0.75, validating its ability to control the worst-case tail.
Figure 5: Visualization of the top-3 molecules with the highest reward found by each method. R-TFFT discovers candidates with significantly higher stability (scores up to 5059) compared to the best candidates from EXP-FT (1069) and FDC (2012), validating its ability to explore the extreme high-reward tail.
...and 15 more figures

Theorems & Definitions (10)

Theorem 4.1
Theorem 4.2
Theorem 5.1
Proposition 5.2
Theorem 6.1: Convergence of Stage 1, Informal
Theorem 6.2: Sensitivity to Threshold Estimation
Lemma 1.1: Bias and Variance of Ratio Estimators
proof
Lemma 1.2
proof
