Symmetrical Flow Matching: Unified Image Generation, Segmentation, and Classification with Score-Based Generative Models
Francisco Caetano, Christiaan Viviers, Peter H. N. De With, Fons van der Sommen
TL;DR
This work addresses the challenge of unifying semantic segmentation, classification, and image generation within a single model. It introduces Symmetrical Flow Matching (SymmFlow), a bidirectional, symmetric flow framework that jointly models the forward and reverse transformations between images and semantic representations while preserving sufficient entropy for diverse generation. The approach enables one-step segmentation and classification and supports flexible conditioning on both pixel-level and image-level class labels. It achieves state-of-the-art semantic image synthesis with only 25 inference steps, alongside competitive segmentation and promising classification results. Overall, SymmFlow offers a practical, versatile framework that narrows the gap between discriminative and generative modeling in vision tasks, with potential extensions to depth estimation and broader conditioning modalities.
Abstract
Flow Matching has emerged as a powerful framework for learning continuous transformations between distributions, enabling high-fidelity generative modeling. This work introduces Symmetrical Flow Matching (SymmFlow), a new formulation that unifies semantic segmentation, classification, and image generation within a single model. Using a symmetric learning objective, SymmFlow models forward and reverse transformations jointly, ensuring bidirectional consistency while preserving sufficient entropy for generative diversity. A new training objective explicitly retains semantic information across flows, enabling efficient sampling that preserves semantic structure and allowing one-step segmentation and classification without iterative refinement. Unlike previous approaches that impose a strict one-to-one mapping between masks and images, SymmFlow generalizes to flexible conditioning, supporting both pixel-level and image-level class labels. Experimental results on various benchmarks show that SymmFlow achieves state-of-the-art performance on semantic image synthesis, obtaining FID scores of 11.9 on CelebAMask-HQ and 7.0 on COCO-Stuff with only 25 inference steps. Additionally, it delivers competitive results on semantic segmentation and shows promising capabilities in classification tasks.
