Generating Non-Stationary Textures using Self-Rectification

Yang Zhou, Rongjun Xiao, Dani Lischinski, Daniel Cohen-Or, Hui Huang

TL;DR

The paper tackles non-stationary texture synthesis by enabling users to lazily edit a reference texture into a rough target and then applying a two-pass self-rectification using a pre-trained diffusion model with cross-image KV-injection to enforce global structure while preserving local reference details. The method comprises structure-preserving inversion and fine texture sampling, leveraging KV-injection during both inversion and sampling to transfer layout and texture from the reference and coarse edits from the target. It demonstrates strong qualitative performance against TexExp and GCD Loss, with data augmentation for directional textures and a practical runtime on 512×512 images, highlighting broad applicability to texture editing and even natural-image editing. This approach advances controllable, high-fidelity non-stationary texture synthesis by combining diffusion priors, attention-based feature transfer, and a practical lazy-editing workflow, offering a flexible tool for design and graphics applications.
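
The cross-image KV-injection mentioned above can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention head, plain scaled dot-product attention, and random stand-in features, and the names `kv_injected_attention`, `q_cur`, `k_ref`, and `v_ref` are hypothetical. The point it shows is that queries come from the image being processed while keys and values come from a reference, so each output token is a weighted average of reference values.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of equal-length vectors."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def kv_injected_attention(q_current, k_reference, v_reference):
    """Queries come from the image being processed; keys and values are
    taken from the reference instead of the image's own features."""
    return attention(q_current, k_reference, v_reference)


def random_tokens(rng, count, dim):
    """Stand-in features; a real model would produce these from a U-Net layer."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
q_cur = random_tokens(rng, 4, 8)  # hypothetical tokens of the image being synthesized
k_ref = random_tokens(rng, 6, 8)  # hypothetical reference keys
v_ref = random_tokens(rng, 6, 8)  # hypothetical reference values
injected = kv_injected_attention(q_cur, k_ref, v_ref)
```

Because the attention weights are non-negative and sum to one, every output vector lies inside the per-channel range of the reference values, which is one intuition for why injected features tend to reproduce reference appearance.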

Abstract

This paper addresses the challenge of example-based non-stationary texture synthesis. We introduce a novel two-step approach wherein users first modify a reference texture using standard image editing tools, yielding an initial rough target for the synthesis. Subsequently, our proposed method, termed "self-rectification", automatically refines this target into a coherent, seamless texture, while faithfully preserving the distinct visual characteristics of the reference exemplar. Our method leverages a pre-trained diffusion network, and uses self-attention mechanisms, to gradually align the synthesized texture with the reference, ensuring the retention of the structures in the provided target. Through experimental validation, our approach exhibits exceptional proficiency in handling non-stationary textures, demonstrating significant advancements in texture synthesis when compared to existing state-of-the-art techniques. Code is available at https://github.com/xiaorongjun000/Self-Rectification

Paper Structure

This paper contains 22 sections, 7 equations, 12 figures, and 3 algorithms.

Figures (12)

  • Figure 1: Our method takes as input a reference texture (left), and a crude target texture provided by the user (middle column), which may lack coherence and completeness. Self-rectification is used to transform the target into a visually coherent texture (right) that complies with the structure of the crude target, while exhibiting the visual characteristics of the reference texture.
  • Figure 2: Framework overview. Given a reference texture $I^\textit{R}$, we allow the user to quickly build a target image $I^\textit{tar}$ in a lazy-editing manner. A coarse-to-fine synthesis is performed by running self-rectification twice. The coarse stage synthesizes a coarse yet complete overall structure, and the fine stage refines its output $I_\textit{coarse}^*$ with finer and more accurate details, producing the final result $I^*$.
  • Figure 3: Our self-rectification synthesizes an output texture $I^*$ via structure-preserving inversion from a rough target image $I^\textit{tar}$ and fine texture sampling using the reference $I^\textit{R}$. Both processes require the injection of self-attention features ($KV$) from the DDIM inversion of a corresponding reference. More specifically, for structure-preserving inversion, the reference is the target image itself, denoted as $I^\textit{IR}$. For fine texture sampling, the input exemplar $I^\textit{R}$ is used to inject features that help to synthesize a plausible output with fine texture details.
  • Figure 4: Visualization of the intermediate latent codes in the inversion. For the standard DDIM inversion (top), the U-Net predicts noise to diffuse the distinctive structures so as to transform the input into Gaussian noise. In contrast, our structure-preserving inversion (bottom) preserves the distinctive patterns from user edits along the inversion process. See the text in \ref{subsec:refinement} for more details.
  • Figure 5: Visualization of the intermediate latent codes in the fine texture sampling. Here, for the first 20 steps (from $t=50$ to $t=30$), we perform the standard DDIM sampling to reconstruct the target layout. Next, we perform $KV$-injection in the remaining sampling steps (from $t=30$ to $t=0$), to synthesize fine textures for the output image. The rightmost image shows the result produced by simply performing standard DDIM sampling for all steps, i.e., $\textup{S}=50$; in that case, no additional structure is synthesized to complete the user edits.
  • ...and 7 more figures
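
The step split described in the Figure 5 caption can be written out as a small schedule. The sketch below is only an illustration under stated assumptions: the function name `sampling_schedule`, the mode labels, and the convention that a step starting at $t$ moves the latent to $t-1$ are all hypothetical choices, not taken from the paper's code. The parameter `standard_steps` plays the role of $\textup{S}$ in the caption.

```python
def sampling_schedule(total_steps=50, standard_steps=20):
    """Return (t, mode) pairs for a sampling run of `total_steps` steps.

    Assumed convention: the step listed at timestep t moves the latent
    from t to t - 1, so the run starts at t = total_steps and ends at 0.
    The first `standard_steps` steps use plain DDIM sampling; the rest
    use KV-injection from the reference.
    """
    if not 0 <= standard_steps <= total_steps:
        raise ValueError("standard_steps must lie in [0, total_steps]")
    schedule = []
    for i in range(total_steps):
        t = total_steps - i
        mode = "ddim" if i < standard_steps else "kv_injection"
        schedule.append((t, mode))
    return schedule


# Setting from the caption: 20 standard steps, then KV-injection.
schedule = sampling_schedule(total_steps=50, standard_steps=20)

# The caption's baseline: standard DDIM sampling for all steps (S = 50).
baseline = sampling_schedule(total_steps=50, standard_steps=50)
```

With the caption's setting, the steps starting at $t=50$ down to $t=31$ are standard DDIM steps and the steps starting at $t=30$ down to $t=1$ use injection, which matches the "from 50 to 30" and "from 30 to 0" ranges under the convention assumed here.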