Importance-Weighted Non-IID Sampling for Flow Matching Models
Xinshuang Liu, Runfa Blark Li, Shaoxiu Wei, Truong Nguyen
TL;DR
This work tackles reliable estimation of expectations of functions of flow-matching model outputs when sampling budgets are tight and IID sampling yields high-variance estimates. It introduces an importance-weighted non-IID joint sampling framework with two components. The first is a score-based diversity velocity that jointly evolves multiple trajectories to cover diverse, high-density regions while staying on the data manifold. The second is a learned residual velocity $r_\phi$ that models the marginal distribution of the non-IID samples, so that the importance-weighted estimates remain unbiased. The key contributions are (i) a score-based regularization that preserves on-manifold diversity, (ii) the first method to learn a residual flow for unbiased importance weighting of non-IID samples, and (iii) empirical validation showing improved diversity and sample quality, along with accurate weight and expectation estimates, on Gaussian mixtures and on downstream tasks such as text-to-image generation and image inpainting. The approach enables more reliable characterization of flow-matching model outputs under fixed budgets, with practical implications for downstream AI systems that rely on accurate distributional expectations.
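The densities needed for such weights can be obtained with the instantaneous change-of-variables formula for continuous flows, which is one standard approach. This is a sketch under assumptions not stated above and is not claimed to be the paper's exact procedure. Assume (hypothetically) that the pretrained flow's velocity is written $v_\theta$, that time runs from a base distribution $p_0$ at $t=0$ to data at $t=1$, and that the non-IID marginal is modeled by integrating $v_\theta + r_\phi$ from the same base. Each log-density then follows from integrating the divergence of its velocity along the trajectory that ends at $x$. The importance weight is the exponentiated difference of the two log-densities:

```latex
\begin{aligned}
\log p_\theta(x) &= \log p_0(x_0) - \int_0^1 \nabla\cdot v_\theta(x_t, t)\,dt,
  && \dot{x}_t = v_\theta(x_t, t),\quad x_1 = x,\\
\log q_\phi(x) &= \log p_0(\tilde{x}_0) - \int_0^1 \nabla\cdot\big(v_\theta + r_\phi\big)(\tilde{x}_t, t)\,dt,
  && \dot{\tilde{x}}_t = \big(v_\theta + r_\phi\big)(\tilde{x}_t, t),\quad \tilde{x}_1 = x,\\
w(x) &= \exp\big(\log p_\theta(x) - \log q_\phi(x)\big),
  && \hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} w\big(x^{(i)}\big)\, f\big(x^{(i)}\big).
\end{aligned}
```

In high dimensions, the divergence terms are commonly estimated with Hutchinson's stochastic trace estimator rather than computed exactly.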
Abstract
Flow-matching models effectively represent complex distributions, yet estimating expectations of functions of their outputs remains challenging under limited sampling budgets. Independent sampling often yields high-variance estimates, especially when rare but high-impact outcomes dominate the expectation. We propose an importance-weighted non-IID sampling framework that jointly draws multiple samples to cover diverse, salient regions of a flow's distribution while maintaining unbiased estimation via estimated importance weights. To balance diversity and quality, we introduce a score-based regularization for the diversity mechanism. It uses the score function, i.e., the gradient of the log-density, to keep samples within high-density regions of the data manifold while pushing them apart, mitigating off-manifold drift. We further develop the first approach for importance weighting of non-IID flow samples, which learns a residual velocity field that reproduces the marginal distribution of the non-IID samples. Empirically, our method produces diverse, high-quality samples and accurate estimates of both importance weights and expectations, advancing the reliable characterization of flow-matching model outputs. Our code will be publicly available on GitHub.
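The toy sketch below makes the variance argument concrete. It compares a plain IID Monte Carlo estimate with an importance-weighted one on a hypothetical 1D Gaussian mixture. The mixture's rare component (weight 0.01) carries most of the mass of the tail indicator f(x) = 1[x > 3]. This is an illustration under stated assumptions, not the authors' method. The proposal q is a fixed equal-weight mixture with a known density, standing in for the marginal of a diversity-promoting sampler, and its samples are drawn IID. The estimator (1/N) Σ w(x_i) f(x_i) with w = p/q stays unbiased whenever each x_i has marginal q, even if the x_i are dependent, by linearity of expectation. That is why knowing (or learning) the marginal of non-IID samples suffices for unbiased weighting.

```python
import numpy as np
from math import erfc, sqrt

# Hypothetical target p: 1D Gaussian mixture with a rare component (weight 0.01) at x = 6.
P_WEIGHTS = np.array([0.99, 0.01])
# Hypothetical proposal q: same components, equal weights. It stands in for the
# marginal of a diversity-promoting (non-IID) sampler; here its density is known.
Q_WEIGHTS = np.array([0.5, 0.5])
MEANS = np.array([0.0, 6.0])
STD = 1.0
THRESHOLD = 3.0


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * np.pi))


def mixture_pdf(x, weights):
    return sum(w * normal_pdf(x, m, STD) for w, m in zip(weights, MEANS))


def sample_mixture(n, weights, rng):
    comp = rng.choice(len(weights), size=n, p=weights)  # pick a component per sample
    return rng.normal(MEANS[comp], STD)


def f(x):
    # Rare, high-impact outcome: indicator of landing beyond THRESHOLD.
    return (x > THRESHOLD).astype(float)


def true_value():
    # E_p[f] = sum_k w_k * P(N(mu_k, STD^2) > THRESHOLD), in closed form.
    tails = [0.5 * erfc((THRESHOLD - m) / (STD * sqrt(2.0))) for m in MEANS]
    return float(np.dot(P_WEIGHTS, tails))


def iid_estimate(n, rng):
    # Plain Monte Carlo: average f over n samples drawn directly from p.
    return f(sample_mixture(n, P_WEIGHTS, rng)).mean()


def weighted_estimate(n, rng):
    # Importance weighting: draw from q, reweight each sample by w = p / q.
    x = sample_mixture(n, Q_WEIGHTS, rng)
    w = mixture_pdf(x, P_WEIGHTS) / mixture_pdf(x, Q_WEIGHTS)
    return np.mean(w * f(x))


def run_trials(n_trials=2000, n=32, seed=0):
    # Repeat each estimator n_trials times with a budget of n samples per estimate.
    rng = np.random.default_rng(seed)
    iid = np.array([iid_estimate(n, rng) for _ in range(n_trials)])
    weighted = np.array([weighted_estimate(n, rng) for _ in range(n_trials)])
    return iid, weighted


if __name__ == "__main__":
    iid, weighted = run_trials()
    print(f"true value        : {true_value():.5f}")
    print(f"IID      mean/std : {iid.mean():.5f} / {iid.std():.5f}")
    print(f"weighted mean/std : {weighted.mean():.5f} / {weighted.std():.5f}")
```

Both estimators center on the same value. The weighted one has a much smaller spread across trials because q places about half its samples near the rare mode. Each of those samples contributes a small weight of about 0.02 instead of a rare 0/1 jump.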
