Symmetrical Flow Matching: Unified Image Generation, Segmentation, and Classification with Score-Based Generative Models

Francisco Caetano, Christiaan Viviers, Peter H. N. De With, Fons van der Sommen

TL;DR

This work addresses the challenge of unifying semantic segmentation, classification, and image generation within a single model. It introduces Symmetrical Flow Matching (SymmFlow), a bidirectional, symmetric flow framework that jointly models forward and reverse transformations between images and semantic representations, while preserving entropy for diverse generation. The approach enables one-pass segmentation and classification conditioned on flexible semantic inputs, and demonstrates state-of-the-art semantic image synthesis with only 25 inference steps, alongside competitive segmentation and promising classification results. Overall, SymmFlow offers a practical, versatile framework that narrows the gap between discriminative and generative modeling in vision tasks, with potential for extensions to depth estimation and broader conditioning modalities.

Abstract

Flow Matching has emerged as a powerful framework for learning continuous transformations between distributions, enabling high-fidelity generative modeling. This work introduces Symmetrical Flow Matching (SymmFlow), a new formulation that unifies semantic segmentation, classification, and image generation within a single model. Using a symmetric learning objective, SymmFlow models forward and reverse transformations jointly, ensuring bi-directional consistency, while preserving sufficient entropy for generative diversity. A new training objective is introduced to explicitly retain semantic information across flows, featuring efficient sampling while preserving semantic structure, allowing for one-step segmentation and classification without iterative refinement. Unlike previous approaches that impose strict one-to-one mapping between masks and images, SymmFlow generalizes to flexible conditioning, supporting both pixel-level and image-level class labels. Experimental results on various benchmarks demonstrate that SymmFlow achieves state-of-the-art performance on semantic image synthesis, obtaining FID scores of 11.9 on CelebAMask-HQ and 7.0 on COCO-Stuff with only 25 inference steps. Additionally, it delivers competitive results on semantic segmentation and shows promising capabilities in classification tasks.
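As a rough illustration of the opposing flows described above, the sketch below builds training targets for a pair of straight-line interpolation paths: the image variable moves from noise to data while the semantic variable moves from label to noise. The linear paths, the function name `symmetric_flow_targets`, and the array shapes are assumptions made for illustration only; the paper's actual objective may differ.

```python
import numpy as np

def symmetric_flow_targets(x, y, t, rng):
    """Interpolated states and velocity targets for two opposing linear flows.

    Assumption (not taken from the paper): straight-line paths between
    data and standard Gaussian noise.
    x: image batch, y: semantic batch (same batch size), t: times in [0, 1].
    """
    eps_x = rng.standard_normal(x.shape)  # noise endpoint of the image flow
    eps_y = rng.standard_normal(y.shape)  # noise endpoint of the semantic flow
    # Reshape t so it broadcasts over the non-batch axes of each array.
    tx = t.reshape(-1, *([1] * (x.ndim - 1)))
    ty = t.reshape(-1, *([1] * (y.ndim - 1)))
    x_t = (1 - tx) * eps_x + tx * x   # noise -> image as t goes 0 -> 1
    y_t = (1 - ty) * y + ty * eps_y   # label -> noise as t goes 0 -> 1
    v_x = x - eps_x                   # constant velocity of the image path
    v_y = eps_y - y                   # constant velocity of the semantic path
    return x_t, y_t, v_x, v_y
```

Under these assumptions, a single network predicting both velocities could be trained with a mean-squared error against `v_x` and `v_y`; integrating the learned field forward in time would generate an image, and integrating it backward would recover the semantic variable.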

Paper Structure

This paper contains 32 sections, 9 equations, 10 figures, and 11 tables.

Figures (10)

  • Figure 1: Symmetrical Flow Matching jointly models semantic segmentation and generation as opposing flows. Noise transitions into an image while a label evolves into noise and vice versa. This symmetry maintains entropy for generation while enforcing semantic consistency. Image Y can represent semantic content of any type, from dense masks to global labels, enabling applications like classification and segmentation.
  • Figure 2: Illustration of the optimal transport between the data distributions X and Y, and the intermediate Gaussian distribution.
  • Figure 3: Visualization of the spiral dataset and samples generated by our model.
  • Figure 4: Non-curated samples generated by the model trained on CelebAMask-HQ (left) and COCO-Stuff (right). The top row shows the semantic mask used to condition the model. The bottom row shows the samples after 25 integration steps with the Euler ODE solver.
  • Figure 5: Non-curated segmentation masks generated by the model trained on CelebAMask-HQ (left) and COCO-Stuff (right). The top row shows the ground-truth segmentation mask. The middle row shows the image used to condition the model. The bottom row shows the segmentations after 25 integration steps with the Euler ODE solver.
  • ...and 5 more figures
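
The captions for Figures 4 and 5 mention 25 integration steps with the Euler ODE solver. A minimal sketch of fixed-step explicit Euler integration is given below. The name `euler_integrate` and the toy constant velocity field are hypothetical stand-ins; in practice a trained network would take the place of `velocity_fn`.

```python
import numpy as np

def euler_integrate(velocity_fn, z0, num_steps=25, t0=0.0, t1=1.0):
    """Integrate dz/dt = velocity_fn(z, t) from t0 to t1 with explicit Euler."""
    z = np.array(z0, dtype=float)
    dt = (t1 - t0) / num_steps  # negative when integrating backward in time
    for k in range(num_steps):
        t = t0 + k * dt
        z = z + dt * velocity_fn(z, t)
    return z

# Toy check: a constant velocity field carries z0 to a known target
# along a straight line (a stand-in for a learned velocity network).
z0 = np.zeros(3)
target = np.array([1.0, -2.0, 0.5])
z1 = euler_integrate(lambda z, t: target - z0, z0)
# Swapping t0 and t1 integrates the same field backward in time.
z_back = euler_integrate(lambda z, t: target - z0, target, t0=1.0, t1=0.0)
```

For a constant field the Euler result is exact up to floating-point rounding, so `z1` lands on `target` and `z_back` returns to `z0`. For a learned, time-varying field the step count trades accuracy against cost.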