Tiled Diffusion

Or Madar, Ohad Fried

TL;DR

Tiled Diffusion addresses a limitation of existing tiling approaches by enabling cohesive, cross-image tile generation across diverse domains. It introduces two latent-space constraints, tiling and similarity, applied at every diffusion step to ensure global structure and seam coherence within a unified framework that supports self-, one-to-one, and many-to-many tiling. The method demonstrates superior tileability and image quality across seamless tiling, texture generation, and 360° synthesis, backed by a dedicated tiling score and quantitative metrics. This approach enables scalable, automated tiling of multiple images, with broad implications for texture creation, game assets, and digital art. The authors also note cross-axis limitations that motivate future refinements.

Abstract

Image tiling -- the seamless connection of disparate images to create a coherent visual field -- is crucial for applications such as texture creation, video game asset development, and digital art. Traditionally, tiles have been constructed manually, a method that poses significant limitations in scalability and flexibility. Recent research has attempted to automate this process using generative models. However, current approaches primarily focus on tiling textures and manipulating models for single-image generation, without inherently supporting the creation of multiple interconnected tiles across diverse domains. This paper presents Tiled Diffusion, a novel approach that extends the capabilities of diffusion models to accommodate the generation of cohesive tiling patterns across various domains of image synthesis that require tiling. Our method supports a wide range of tiling scenarios, from self-tiling to complex many-to-many connections, enabling seamless integration of multiple images. Tiled Diffusion automates the tiling process, eliminating the need for manual intervention and enhancing creative possibilities in various applications, such as seamless tiling of existing images, tiled texture creation, and 360$^\circ$ synthesis.

Paper Structure

This paper contains 27 sections, 4 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Tiled Diffusion. Our method generates seamlessly tileable images using diffusion models. Left: input constraints and their results. Matching color patterns on edges indicate that those edges should tile seamlessly. (We use two colors to convey the direction.) The color strip above each result shows which constraint was used for each area. Right: various applications of our method: tiled texture synthesis, 360° synthesis, and seamless tiling of existing images. Each texture includes 4 tiles (2x2); the 360° examples wrap on the horizontal axis; the seamless tiling example shows original images (left) and their tileable versions (right), both in 1x2 arrangements.
  • Figure 2: Tiling scenarios. Left: Self-tiling, where the image connects only to itself vertically and horizontally. Middle: One-to-one tiling on the X-axis, with each image connecting only to the other image. Right: Many-to-many tiling on the X-axis, where right sides of both images can connect to left sides of both images. Lower: constraints ($C_j$'s) for each tiling scenario.
  • Figure 3: Method overview and results. Our method uses two key constraints in latent space: tiling constraints for global consistency and similarity constraints for seamless connections in complex scenarios. Left: Input constraints for many-to-many tiling on the X-axis between two images. Second column: Tiling constraint applied to the left side of the first image, using round-robin context selection. Third column: Similarity constraint propagated from the second image to the first. Fourth column: Output images after decoding and cropping. Right: Two example arrangements demonstrating many-to-many tiling scenarios.
  • Figure 4: Illustration of the impact of different context window sizes ($w$) on tiling results. The figure displays panoramic views created by horizontally stacking the results for large (top), medium (middle), and small (bottom) $w$. We observe that with a large $w$, the transitions between tiles are smoother and more gradual, resulting in a more coherent overall image with fewer transition variations. As $w$ decreases, we see sharper transition variations, potentially at the cost of global coherence.
  • Figure 5: Quantitative analysis of our metrics as we increase the number of sides in $A_j$ and $B_j$ for each constraint $C_j$ in many-to-many tiling scenarios ($n=|A_j|=|B_j|$). The graphs show relatively constant values across different side counts, demonstrating our method's ability to scale effectively. For $n = 1$ (one-to-one) we also include the STI method (red dot).
  • ...and 3 more figures
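
The captions for Figures 2, 3, and 5 describe each constraint $C_j$ as a pair of side sets $A_j$ and $B_j$, with context chosen round-robin when several sides may connect. The sketch below is one possible reading of that description, not the authors' implementation. The names `Constraint`, `a_sides`, `b_sides`, and `left_context` are hypothetical, latents are plain nested lists, and only the X-axis case (right sides feeding left sides) is covered. How the paper actually pads, blends, or crops latents is not specified in this summary.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A side is (image_index, edge); edge is "left", "right", "top" or "bottom".
Side = Tuple[int, str]


@dataclass(frozen=True)
class Constraint:
    """Hypothetical encoding of C_j: any side in A_j may connect to any side in B_j."""
    a_sides: Tuple[Side, ...]
    b_sides: Tuple[Side, ...]


def self_tiling(img: int = 0) -> List[Constraint]:
    # The image connects only to itself, horizontally and vertically.
    return [
        Constraint(((img, "right"),), ((img, "left"),)),
        Constraint(((img, "bottom"),), ((img, "top"),)),
    ]


def one_to_one_x(i: int, k: int) -> List[Constraint]:
    # Each image connects only to the other image on the X-axis.
    return [
        Constraint(((i, "right"),), ((k, "left"),)),
        Constraint(((k, "right"),), ((i, "left"),)),
    ]


def many_to_many_x(images: List[int]) -> List[Constraint]:
    # Right sides of all images can connect to left sides of all images.
    rights = tuple((i, "right") for i in images)
    lefts = tuple((i, "left") for i in images)
    return [Constraint(rights, lefts)]


def left_context(
    latents: Dict[int, List[List[float]]],
    constraint: Constraint,
    step: int,
    w: int,
) -> Dict[int, List[List[float]]]:
    """For each left side in B_j, take the last w columns of a right side in A_j.

    The source side rotates with the diffusion step (round-robin), which is an
    assumption about how "round-robin context selection" could work.
    """
    context = {}
    n = len(constraint.a_sides)
    for offset, (img, _edge) in enumerate(constraint.b_sides):
        src_img, _ = constraint.a_sides[(step + offset) % n]
        context[img] = [row[-w:] for row in latents[src_img]]
    return context
```

With two images and a context window of `w = 2`, each left side would receive context from a different right side on successive steps, so over many steps every allowed pairing is visited. A larger `w` copies a wider strip of the neighbor, which matches the Figure 4 observation that larger windows give smoother transitions.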