SD-$\pi$XL: Generating Low-Resolution Quantized Imagery via Score Distillation
Alexandre Binninger, Olga Sorkine-Hornung
TL;DR
SD-$\pi$XL generates low-resolution, color-quantized imagery under hard palette and resolution constraints by combining a differentiable image generator with score distillation sampling from pretrained diffusion models. It parameterizes the output as an $H \times W \times n$ logits tensor and uses Gumbel-softmax reparameterization to sample discrete palette elements, which yields crisp pixel art while keeping the pipeline differentiable. The method accepts a text prompt and, optionally, spatial conditioning via ControlNet (edges and depth). It strictly enforces palette adherence, supports arbitrary output resolutions, and uses a loss that combines semantic guidance with FFT-based smoothness. Empirically, SD-$\pi$XL achieves state-of-the-art performance in quantized image generation and demonstrates practical fabrication applications such as embroidery, fuse beads, and interlocking-brick designs. Limitations include optimization speed and per-pixel independence; future work focuses on faster convergence, image-only conditioning, and improved frame-to-frame consistency for animation.
Abstract
Low-resolution quantized imagery, such as pixel art, is seeing a revival in modern applications ranging from video game graphics to digital design and fabrication, where creativity is often bound by a limited palette of elemental units. Despite this growing popularity, the automated generation of quantized images from raw inputs remains a significant challenge, often necessitating intensive manual input. We introduce SD-$\pi$XL, an approach for producing quantized images that employs score distillation sampling in conjunction with a differentiable image generator. Our method enables users to input a prompt and optionally an image for spatial conditioning, set any desired output size $H \times W$, and choose a palette of $n$ colors or elements. Each color corresponds to a distinct class for our generator, which operates on an $H \times W \times n$ tensor. We adopt a softmax approach, computing a convex sum of elements, thus rendering the process differentiable and amenable to backpropagation. We show that employing Gumbel-softmax reparameterization allows for crisp pixel art effects. Unique to our method is the ability to transform input images into low-resolution, quantized versions while retaining their key semantic features. Our experiments validate SD-$\pi$XL's performance in creating visually pleasing and faithful representations, consistently outperforming the current state-of-the-art. Furthermore, we showcase SD-$\pi$XL's practical utility in fabrication through its applications in interlocking brick mosaics, beading, and embroidery design.
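To make the generator described above concrete, the following is a minimal sketch of how per-pixel logits could be turned into colors: a softmax over the $n$ palette classes gives convex weights, the weighted sum of palette colors gives a soft pixel, and adding Gumbel noise before a low-temperature softmax pushes the result toward a single palette entry. It is an illustration under stated assumptions, not the authors' implementation: all function names, the three-color palette, and the temperature values are hypothetical, and the score distillation loss and the diffusion model are omitted. It uses only the Python standard library so that it runs without dependencies.

```python
import math
import random

def softmax(logits, tau=1.0):
    """Numerically stable softmax with temperature tau."""
    m = max(logits)
    exps = [math.exp((z - m) / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_noise(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def mix_palette(weights, palette):
    """Convex sum of RGB palette colors, one weight per color."""
    return tuple(
        sum(w * color[c] for w, color in zip(weights, palette))
        for c in range(3)
    )

def render_pixel(logits, palette, tau=1.0, rng=None):
    """Soft color for one pixel; if rng is given, apply Gumbel-softmax."""
    if rng is not None:
        logits = [z + gumbel_noise(rng) for z in logits]
    return mix_palette(softmax(logits, tau), palette)

def quantize_pixel(logits, palette):
    """Hard assignment: the palette color with the largest logit."""
    best = max(range(len(logits)), key=lambda k: logits[k])
    return palette[best]

# Hypothetical palette with n = 3 colors (RGB in [0, 1]).
palette = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)]
H, W = 2, 2
rng = random.Random(0)

# H x W x n logits; here random, in practice the optimized parameters.
logits = [[[rng.gauss(0.0, 1.0) for _ in palette] for _ in range(W)]
          for _ in range(H)]

# Plain softmax mixing (blended colors).
soft = [[render_pixel(logits[i][j], palette) for j in range(W)]
        for i in range(H)]
# Gumbel-softmax at an assumed low temperature (closer to one color).
crisp = [[render_pixel(logits[i][j], palette, tau=0.05, rng=rng)
          for j in range(W)] for i in range(H)]
# Final quantized image: every pixel is exactly a palette color.
hard = [[quantize_pixel(logits[i][j], palette) for j in range(W)]
        for i in range(H)]
```

In this sketch each pixel is processed independently, which mirrors the per-pixel independence noted in the TL;DR. The final argmax step is what guarantees that every output pixel is exactly one palette element.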
