Consistent Story Generation: Unlocking the Potential of Zigzag Sampling
Mingxiao Li, Mang Ning, Marie-Francine Moens
TL;DR
This work tackles the problem of keeping subject identity consistent across the images of a visual story generated with diffusion models. It introduces Asymmetry Zigzag Sampling (AZS), which combines Zig Visual Sharing (ZVS) and Asymmetric Prompt Zigzag Inference (APZI) to inject subject information into the latent representations while preserving textual alignment, all without fine-tuning. Each sampling step is split into three sub-steps (zig, zag, generation), and asymmetric guidance across them balances identity fidelity against prompt fidelity. The method improves results on both SDXL and FLUX backbones, including in human preference. It offers a scalable, training-free path to coherent long-form visual narratives at the cost of higher inference time, and is validated through quantitative, qualitative, and user studies.
Abstract
Text-to-image generation models have made significant progress in producing high-quality images from textual descriptions, yet they continue to struggle with maintaining subject consistency across multiple images, a fundamental requirement for visual storytelling. Existing methods address this either by fine-tuning models on large-scale story visualization datasets, which is resource-intensive, or by using training-free techniques that share information across generations, which still achieve only limited success. In this paper, we introduce a novel training-free sampling strategy called Zigzag Sampling with Asymmetric Prompts and Visual Sharing to enhance subject consistency in visual story generation. Our approach uses a zigzag sampling mechanism that alternates between asymmetric prompts to retain subject characteristics, while a visual sharing module transfers visual cues across generated images to further enforce consistency. Experimental results, based on both quantitative metrics and qualitative evaluations, demonstrate that our method significantly outperforms previous approaches in generating coherent and consistent visual stories. The code is available at https://github.com/Mingxiao-Li/Asymmetry-Zigzag-StoryDiffusion.
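
To make the zig, zag, and generation sub-steps concrete, the following is a minimal toy sketch and not the released implementation. Everything in it is an assumption made for illustration: the names `guided_eps`, `zigzag_step`, and `toy_predict`, the simple additive update rule (a real sampler would follow a DDIM-style schedule), the specific guidance scales, and the choice of which prompt is used in which sub-step. The visual sharing module is omitted entirely. The sketch only illustrates why a guidance gap between the zig and zag sub-steps can leave subject information in the latent.

```python
from typing import Callable, List

Latent = List[float]
# Hypothetical noise predictor: (latent, timestep, prompt) -> predicted noise.
NoisePredictor = Callable[[Latent, int, str], Latent]


def guided_eps(predict: NoisePredictor, x: Latent, t: int,
               prompt: str, scale: float) -> Latent:
    """Classifier-free guidance: eps_uncond + scale * (eps_cond - eps_uncond)."""
    e_uncond = predict(x, t, "")
    e_cond = predict(x, t, prompt)
    return [u + scale * (c - u) for u, c in zip(e_uncond, e_cond)]


def zigzag_step(predict: NoisePredictor, x: Latent, t: int,
                subject_prompt: str, frame_prompt: str,
                strong: float = 5.0, weak: float = 0.0,
                step_size: float = 0.1) -> Latent:
    """One toy sampling step split into zig, zag, and generation sub-steps."""
    # Zig (assumed): denoise t -> t-1 under the subject prompt, strong guidance.
    eps = guided_eps(predict, x, t, subject_prompt, strong)
    x_zig = [xi - step_size * ei for xi, ei in zip(x, eps)]

    # Zag (assumed): invert t-1 -> t with weak guidance, so the zig update
    # is only partially undone.
    eps = guided_eps(predict, x_zig, t - 1, subject_prompt, weak)
    x_zag = [xi + step_size * ei for xi, ei in zip(x_zig, eps)]

    # Generation (assumed): denoise t -> t-1 under the frame-specific prompt.
    eps = guided_eps(predict, x_zag, t, frame_prompt, strong)
    return [xi - step_size * ei for xi, ei in zip(x_zag, eps)]


def toy_predict(x: Latent, t: int, prompt: str) -> Latent:
    """Toy stand-in for a diffusion model: noise 1.0 if prompted, else 0.0."""
    return [1.0 if prompt else 0.0 for _ in x]


if __name__ == "__main__":
    x0 = [0.0]
    # Symmetric guidance: zig and zag cancel, leaving one plain denoise step.
    print(zigzag_step(toy_predict, x0, 10, "a cat", "a cat on a beach",
                      strong=2.0, weak=2.0, step_size=0.5))
    # Asymmetric guidance: the zig update survives the zag inversion.
    print(zigzag_step(toy_predict, x0, 10, "a cat", "a cat on a beach",
                      strong=2.0, weak=0.0, step_size=0.5))
```

In this toy setting, equal guidance in the zig and zag sub-steps makes them cancel exactly, whereas a weaker zag leaves a residual of the subject-conditioned update in the latent before the generation sub-step runs. Each step costs three model passes rather than one, which is consistent with the higher inference time noted above.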
