Graph Flow Matching: Enhancing Image Generation with Neighbor-Aware Flow Fields

Md Shahriar Rahim Siddiqui; Moshe Eliasof; Eldad Haber

TL;DR

Problem: standard flow matching networks predict each point's velocity from its own position and time alone, ignoring neighboring points. Approach: Graph Flow Matching (GFM) decomposes the velocity field as $\mathbf{v}(\mathbf{x}, t) = \mathbf{v}_{\text{react}}(\mathbf{x}, t) + \mathbf{v}_{\text{diff}}(\mathbf{x}, t; \mathcal{N}(\mathbf{x}, t))$, where $\mathbf{v}_{\text{react}}$ is any standard flow matching network, $\mathcal{N}(\mathbf{x}, t)$ is the neighborhood of $\mathbf{x}$ at time $t$, and the diffusion term $\mathbf{v}_{\text{diff}}$ is implemented by a graph neural module operating on latent codes. Contributions: a modular, backbone-agnostic diffusion term instantiated via MPNN or GPS, validated in latent space on five datasets, with consistent FID and recall gains at modest parameter overhead and no changes to training losses or solvers. Findings: on LSUN Church, LSUN Bedroom, FFHQ, AFHQ-Cat, and CelebA-HQ, GFM improves generation quality while preserving sampling efficiency; ablations indicate that the gains come from the graph structure rather than from extra capacity alone. Significance: integrating local geometric priors into continuous-time generative modeling yields robust, scalable improvements in high-fidelity image synthesis.

Abstract

Flow matching casts sample generation as learning a continuous-time velocity field that transports noise to data. Existing flow matching networks typically predict each point's velocity independently, considering only its location and time along its flow trajectory, and ignoring neighboring points. However, this pointwise approach may overlook correlations between points along the generation trajectory that could enhance velocity predictions, thereby improving downstream generation quality. To address this, we propose Graph Flow Matching (GFM), a lightweight enhancement that decomposes the learned velocity into a reaction term -- any standard flow matching network -- and a diffusion term that aggregates neighbor information via a graph neural module. This reaction-diffusion formulation retains the scalability of deep flow models while enriching velocity predictions with local context, all at minimal additional computational cost. Operating in the latent space of a pretrained variational autoencoder, GFM consistently improves Fréchet Inception Distance (FID) and recall across five image generation benchmarks (LSUN Church, LSUN Bedroom, FFHQ, AFHQ-Cat, and CelebA-HQ at $256\times256$), demonstrating its effectiveness as a modular enhancement to existing flow matching architectures.
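
To make the reaction-diffusion decomposition concrete, the following is a minimal sketch, not the authors' implementation. It assumes that neighbors are the k nearest other points in a batch (the text above does not say how the neighborhood is built), replaces the learned graph neural module (MPNN or GPS) with a fixed mean over neighbor offsets, and uses a toy reaction term in place of a trained network. The function names, the `weight` parameter, the forward Euler solver, and the convention that t runs from 0 (noise) to 1 (data) are all illustrative assumptions.

```python
import math


def reaction_velocity(x, t):
    """Hypothetical stand-in for a pointwise network v_react(x, t).

    Toy choice: pull the point toward the origin with strength (1 - t).
    """
    return [-(1.0 - t) * xi for xi in x]


def k_nearest_neighbors(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def diffusion_velocity(points, i, k, weight=0.1):
    """Toy neighbor term: scaled mean of (x_j - x_i) over the k nearest neighbors.

    A learned graph module would replace this fixed aggregation.
    """
    dim = len(points[i])
    nbrs = k_nearest_neighbors(points, i, k)
    if not nbrs:
        return [0.0] * dim
    return [
        weight * sum(points[j][d] - points[i][d] for j in nbrs) / len(nbrs)
        for d in range(dim)
    ]


def gfm_velocity(points, t, k=2):
    """v(x, t) = v_react(x, t) + v_diff(x, t; N(x, t)) for every point in the batch."""
    velocities = []
    for i, x in enumerate(points):
        react = reaction_velocity(x, t)
        diff = diffusion_velocity(points, i, k)
        velocities.append([r + d for r, d in zip(react, diff)])
    return velocities


def euler_sample(points, steps=10, k=2):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with forward Euler (assumed solver)."""
    dt = 1.0 / steps
    for s in range(steps):
        v = gfm_velocity(points, s * dt, k)
        points = [[xi + dt * vi for xi, vi in zip(x, vx)] for x, vx in zip(points, v)]
    return points
```

Setting `k=0` removes the neighbor term and recovers the purely pointwise velocity, which mirrors the claim that the diffusion term is an add-on to an otherwise unchanged flow matching backbone.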
