SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation
Oindrila Saha, Vojtech Krs, Radomir Mech, Subhransu Maji, Kevin Blackburn-Matzen, Matheus Gadelha
TL;DR
SIGMA-GEN addresses the need for simultaneous identity and structure control in multi-subject image generation with a single diffusion-based model that uses both subject identity cues and a unified spatial control representation. The paper also introduces SIGMA-SET27K, a large synthetic dataset with multiple identities per image and per-subject annotations, which is used to train the model. The approach combines a two-part spatial conditioning (routing and depth) with per-subject identity conditioning via identity crops. It achieves state-of-the-art performance in identity preservation, image fidelity, and generation speed, especially in scenes with five or more subjects. The framework supports applications such as subject insertion and reposing, and it generalizes across coarse to fine control modalities.
Abstract
We present SIGMA-GEN, a unified framework for multi-identity preserving image generation. SIGMA-GEN is the first approach to enable single-pass, multi-subject, identity-preserved generation guided by both structural and spatial constraints. A key strength of our method is that a single model supports user guidance at various levels of precision, from coarse 2D or 3D boxes to pixel-level segmentations and depth. To enable this, we introduce SIGMA-SET27K, a novel synthetic dataset that provides identity, structure, and spatial information for over 100k unique subjects across 27k images. Through extensive evaluation, we demonstrate that SIGMA-GEN achieves state-of-the-art performance in identity preservation, image generation quality, and speed. Code and visualizations are available at https://oindrilasaha.github.io/SIGMA-Gen/
