Reweighted Flow Matching via Unbalanced OT for Label-free Long-tailed Generation

Hyunsoo Song, Minjung Gim, Jaewoong Choi

TL;DR

This work addresses majority bias in flow matching trained on long-tailed, label-free data by integrating Unbalanced Optimal Transport (UOT) into the training objective. The core idea is to build a mini-batch UOT coupling, derive a majority score from it, and reweight the flow-matching objective by an inverse power of that score: a first-order correction ($k=1$) recovers the target distribution in theory, and higher-order corrections ($k > 1$) empirically improve tail generation. The theoretical analysis shows that standard UOT-based flow matching is biased and that the proposed reweighting corrects this bias. Experiments on the CIFAR-10-LT and CIFAR-100-LT datasets show better tail fidelity and class-proportion recovery than flow matching baselines, with competitive performance on balanced data. The approach is a practical, label-free way to mitigate majority bias in continuous-time generative modeling with minimal training overhead. A potential extension is to combine UOT-RFM with test-time guidance for further improvements without retraining.

Abstract

Flow matching has recently emerged as a powerful framework for continuous-time generative modeling. However, when applied to long-tailed distributions, standard flow matching suffers from majority bias, producing minority modes with low fidelity and failing to match the true class proportions. In this work, we propose Unbalanced Optimal Transport Reweighted Flow Matching (UOT-RFM), a novel framework for generative modeling under class-imbalanced (long-tailed) distributions that operates without any class label information. Our method constructs the conditional vector field using mini-batch Unbalanced Optimal Transport (UOT) and mitigates majority bias through a principled inverse reweighting strategy. The reweighting relies on a label-free majority score, defined as the density ratio between the target distribution and the UOT marginal. This score quantifies the degree of majority based on the geometric structure of the data, without requiring class labels. By incorporating this score into the training objective, UOT-RFM theoretically recovers the target distribution with first-order correction ($k=1$) and empirically improves tail-class generation through higher-order corrections ($k > 1$). Our model outperforms existing flow matching baselines on long-tailed benchmarks, while maintaining competitive performance on balanced datasets.
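
The pipeline described in the abstract (mini-batch UOT coupling, majority score, inverse reweighting) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the entropic Sinkhorn-style solver, the parameter names `eps` and `tau`, the helper `source_fixed_uot`, and the toy two-cluster batch are all hypothetical choices made here. The score is taken as the UOT target marginal divided by the empirical target weight, the orientation under which an inverse first power restores the target weights.

```python
import numpy as np

def source_fixed_uot(x0, x1, eps=0.05, tau=1.0, n_iter=200):
    """Entropic UOT plan: source marginal fixed, target marginal relaxed.

    Assumption: the target relaxation is a KL penalty of strength tau,
    solved with Sinkhorn-style scaling updates (hypothetical solver choice).
    """
    n, m = len(x0), len(x1)
    a = np.full(n, 1.0 / n)            # uniform source weights
    b = np.full(m, 1.0 / m)            # uniform empirical target weights
    cost = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)
    cost = cost / cost.max()           # normalise for numerical stability
    K = np.exp(-cost / eps)            # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    power = tau / (tau + eps)          # exponent < 1 softens the target constraint
    for _ in range(n_iter):
        v = (b / (K.T @ u)) ** power   # relaxed target update
        u = a / (K @ v)                # exact source update (source is fixed)
    plan = u[:, None] * K * v[None, :]
    return plan, a, b

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 2))                            # source batch
x1 = np.concatenate([rng.normal([2.0, 0.0], 0.3, (56, 2)),   # larger cluster
                     rng.normal([-2.0, 0.0], 0.3, (8, 2))])  # smaller cluster

plan, a, b = source_fixed_uot(x0, x1)
uot_marginal = plan.sum(axis=0)        # relaxed target marginal of the plan
score = uot_marginal / b               # label-free majority score per target sample

k = 1                                  # correction order
pair_weights = plan * score[None, :] ** (-k)   # weights for the pairwise FM loss terms
corrected_marginal = pair_weights.sum(axis=0)  # equals b when k = 1
```

In a training loop, `pair_weights[i, j]` would multiply the squared error between the model's velocity and the conditional target for the pair `(x0[i], x1[j])`. With `k = 1` the reweighted target marginal equals the empirical weights `b` by construction; larger `k` shifts further mass away from high-score samples.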

Paper Structure

This paper contains 38 sections, 6 theorems, 24 equations, 17 figures, 13 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $\pi_{\tau}^{u}$ be the optimal source-fixed UOT coupling between $\mu$ and $\nu$ with $\tau_{2} = \tau > 0$, and assume that its target marginal satisfies $\nu \ll \pi_{\tau, 1}^{u}$, i.e., $\nu$ is absolutely continuous w.r.t. $\pi^u_{\tau,1}$. Training a flow matching model with $\pi_{\tau}^{u}$ ... More generally, UOT-RFM with correction order $k$ generates $p_1 \propto s_{\tau}^{-k} \pi^{u}_{\tau, 1}$ ...
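
The first-order case can be checked in one line. The sketch below assumes the majority score is the reciprocal of the density $r_{\tau} = d\nu / d\pi^{u}_{\tau,1}$, which exists under the stated assumption $\nu \ll \pi^{u}_{\tau,1}$. This orientation is inferred from the form $p_1 \propto s_{\tau}^{-k} \pi^{u}_{\tau,1}$ and is not quoted from the theorem.

```latex
r_{\tau} := \frac{d\nu}{d\pi^{u}_{\tau,1}}, \qquad s_{\tau} := r_{\tau}^{-1}
\quad\Longrightarrow\quad
s_{\tau}^{-k}\,\pi^{u}_{\tau,1}
  = r_{\tau}^{\,k}\,\pi^{u}_{\tau,1}
  = r_{\tau}^{\,k-1}\,\nu
  = s_{\tau}^{\,1-k}\,\nu .
```

Under this reading, $k = 1$ gives $p_1 = \nu$, while $k > 1$ multiplies $\nu$ by $s_{\tau}^{1-k}$, which shrinks mass where the score is high and so shifts mass toward low-score (minority) regions.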

Figures (17)

  • Figure 1: Comparison of the target data distribution $\nu$ and the reweighted marginals $s_{\tau}^{-k_{1}} \pi_{\tau, 1}^{u}$ from UOT-RFM with correction order $k$ (where $1 < k_{1} < k_{2}$). The UOT marginal $\pi_{\tau, 1}^{u}$ downweights the minority classes. UOT-RFM adaptively upweights minority modes via the majority score $s_{\tau}$.
  • Figure 2: Example of majority score $s_{\tau}$ computed via mini-batch UOT. The source distribution is standard Gaussian $\mathcal{N}(0, I)$, and the target distribution is a Gaussian mixture (top). The majority scores (bottom) are higher in majority regions and lower in minority regions.
  • Figure 3: Qualitative comparison of generated samples from flow matching models trained on CIFAR-10-LT with imbalance ratio $\mathcal{I} = 0.01$. UOT-RFM produces more diverse images compared to other baselines.
  • Figure 4: Generated class distribution on CIFAR-10-LT with $\mathcal{I}=0.01$. The average Normalized Class Ratio Errors (NCREs) are: I-CFM = 0.84, OT-CFM = 1.02, and ours = 0.40.
  • Figure 5: Qualitative comparison of generated tail samples from flow matching models trained on CIFAR-10-LT with $\mathcal{I} = 0.01$. Samples with the highest confidence scores (as predicted by a pretrained classifier) are visualized.
  • ...and 12 more figures
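
To make the imbalance ratio and the class-ratio error in the captions above concrete, here is a small sketch. Both parts are assumptions: the exponential class-count profile is the construction commonly used for long-tailed CIFAR benchmarks and may differ from the paper's, and `relative_class_ratio_error` is only one plausible reading of NCRE, whose exact definition is not given in this section. The function names and defaults are hypothetical.

```python
import numpy as np

def long_tail_counts(n_max=5000, n_classes=10, imbalance=0.01):
    """Per-class counts decaying exponentially from n_max to n_max * imbalance.

    Assumption: standard exponential long-tail profile, not confirmed by the source.
    """
    exponents = np.arange(n_classes) / (n_classes - 1)
    return np.round(n_max * imbalance ** exponents).astype(int)

def relative_class_ratio_error(generated_counts, true_counts):
    """|generated ratio - true ratio| / true ratio, per class.

    Assumption: a hypothetical stand-in for the Normalized Class Ratio Error.
    """
    gen = np.asarray(generated_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    gen_ratio = gen / gen.sum()
    true_ratio = true / true.sum()
    return np.abs(gen_ratio - true_ratio) / true_ratio

counts = long_tail_counts()                                   # head class first, tail class last
perfect = relative_class_ratio_error(counts, counts)          # generator matches the proportions
uniform = relative_class_ratio_error(np.ones(10), counts)     # generator ignores the imbalance
```

Under this definition a generator that reproduces the training proportions scores zero for every class, and errors above 1 are possible, which is compatible with the averages reported for Figure 4.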

Theorems & Definitions (7)

  • Theorem 3.1
  • Theorem B.1: uotm, semi-dual3, uot-semidual
  • Lemma C.1: tong2024improving, Theorem 3.1
  • Lemma C.2: tong2024improving, Theorem 3.2
  • Lemma C.3: tong2024improving, Proposition 3.4
  • Theorem C.4: Theorem 3.1
  • proof