Learning Images Across Scales Using Adversarial Training
Krzysztof Wolski, Adarsh Djeacoumar, Alireza Javanmardi, Hans-Peter Seidel, Christian Theobalt, Guillaume Cordonnier, Karol Myszkowski, George Drettakis, Xingang Pan, Thomas Leimkühler
TL;DR
This work addresses learning a coherent, continuous scale-space representation from unstructured, low-resolution image patches, enabling exploration of content across orders of magnitude in scale. It introduces a multiscale generator based on an alias-free StyleGAN3 augmented with progressively distributed Fourier features, coupled with a scale-consistency loss and a progressive patch-sampling strategy to stabilize training across large scale spans. The methodology supports two modes: multiscale pseudo-reconstruction of a single underlying scale space and multiscale generation across environments, achieving up to 256× zoom with high scale coherence and competitive perceptual quality. The approach yields substantial data compression advantages and enables interactive rendering at around 20 frames per second, offering a new direction for efficient, scalable image representations and synthesis across wide scale ranges.
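To make the notion of scale consistency concrete, the following is a minimal sketch and not the paper's actual loss. It assumes two square patches of equal resolution centered on the same point, where the finer patch covers half the side length of the coarser one. Consistency is then measured by comparing the center crop of the coarse patch with a 2×2 average-pooled version of the fine patch. All function names (`sample_patch`, `avg_pool2`, `center_crop`, `scale_consistency`) and the toy `scene` are hypothetical and chosen for illustration only.

```python
def sample_patch(scene, cx, cy, width, res):
    """Point-sample scene(x, y) at the pixel centers of a res x res patch
    of side length `width` centered at (cx, cy)."""
    step = width / res
    x0 = cx - width / 2
    y0 = cy - width / 2
    return [[scene(x0 + (i + 0.5) * step, y0 + (j + 0.5) * step)
             for i in range(res)]
            for j in range(res)]


def avg_pool2(patch):
    """Downsample a square patch by 2 using 2x2 average pooling."""
    n = len(patch) // 2
    return [[(patch[2 * r][2 * c] + patch[2 * r][2 * c + 1]
              + patch[2 * r + 1][2 * c] + patch[2 * r + 1][2 * c + 1]) / 4
             for c in range(n)]
            for r in range(n)]


def center_crop(patch):
    """Keep the central half (per axis) of a square patch.
    Assumes the resolution is divisible by 4."""
    n = len(patch)
    q = n // 4
    return [row[q:n - q] for row in patch[q:n - q]]


def scale_consistency(coarse, fine):
    """Mean squared difference between the center of the coarse patch and
    the downsampled fine patch (assumption: fine covers half the width)."""
    a = center_crop(coarse)
    b = avg_pool2(fine)
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Toy scene (hypothetical): a linear ramp, for which point sampling and
# box averaging agree, so the two views are consistent up to rounding.
scene = lambda x, y: 0.3 * x - 0.7 * y + 0.1
coarse = sample_patch(scene, 0.2, -0.4, 1.0, 8)   # zoom factor z
fine = sample_patch(scene, 0.2, -0.4, 0.5, 8)     # zoom factor 2z
consistent_loss = scale_consistency(coarse, fine)

# A fine patch that disagrees with the coarse view is penalized.
shifted = [[v + 0.5 for v in row] for row in fine]
inconsistent_loss = scale_consistency(coarse, shifted)
```

In a training setting the two patches would come from the generator at neighboring scales rather than from an analytic scene, and the penalty would be one term alongside the adversarial objective.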
Abstract
The real world exhibits rich structure and detail across many scales of observation. It is difficult, however, to capture and represent a broad spectrum of scales using ordinary images. We devise a novel paradigm for learning a representation that captures scales spanning orders of magnitude from an unstructured collection of ordinary images. We treat this collection as a distribution of scale-space slices to be learned using adversarial training, and additionally enforce coherency across slices. Our approach relies on a multiscale generator with carefully injected procedural frequency content, which allows interactive exploration of the emerging continuous scale space. Training across vastly different scales poses stability challenges, which we tackle using a supervision scheme that involves careful sampling of scales. We show that our generator can be used both as a multiscale generative model and for reconstructing scale spaces from unstructured patches. Significantly outperforming the state of the art, we demonstrate zoom-in factors of up to 256× at high quality and scale consistency.
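As a rough illustration of what sampling across a 256× zoom range involves, the sketch below draws zoom factors log-uniformly, so that each octave (factor of two) is visited equally often, and widens the admissible range as training proceeds. A 256× zoom corresponds to log2(256) = 8 octaves. The geometric schedule and the names `max_zoom_at` and `sample_zoom` are assumptions made for this example and are not taken from the paper, whose actual sampling scheme may differ.

```python
import math
import random


def max_zoom_at(step, total_steps, final_zoom=256.0):
    """Hypothetical schedule: grow the largest allowed zoom geometrically
    from 1x at step 0 to final_zoom at total_steps."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return final_zoom ** t


def sample_zoom(step, total_steps, rng, final_zoom=256.0):
    """Draw a zoom factor log-uniformly from [1, max_zoom_at(step)]."""
    hi_octaves = math.log2(max_zoom_at(step, total_steps, final_zoom))
    return 2.0 ** rng.uniform(0.0, hi_octaves)


rng = random.Random(0)            # fixed seed for reproducibility
octaves = math.log2(256)          # number of octaves spanned by 256x
early = [sample_zoom(0, 1000, rng) for _ in range(5)]
late = [sample_zoom(1000, 1000, rng) for _ in range(1000)]
```

Sampling uniformly in log-space rather than in the zoom factor itself avoids concentrating nearly all training patches at the largest magnifications.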
