Improvements to SDXL in NovelAI Diffusion V3

Juan Ossa, Eren Doğan, Alex Birch, F. Johnson

Abstract

In this technical report, we document the changes we made to SDXL in the process of training NovelAI Diffusion V3, our state-of-the-art anime image generation model.

Paper Structure (25 sections, 7 equations, 9 figures)

This paper contains 25 sections, 7 equations, 9 figures.

Figures (9)

  • Figure 1: Noise is added to a sample until the final training timestep $\sigma_\text{max}$, where Gaussian noise with standard deviation 14.6 is added. This amount of noise does not sufficiently destroy the signal in the image; the lowest frequencies (in particular its average colour) remain discernible.
  • Figure 2: Prompting the model to generate "completely black". A model trained to predict images from infinite noise (ZTSNR) can comply with the prompt. By contrast, if we begin inference from a timestep with finite noise, the model outputs an image with medium brightness, trying to match the mean colour it sees in the starting noise, and consequently generates a sample that does not match the prompt.
  • Figure 3: Absence of a ZTSNR step introduces spurious high-contrast coarse features, attempting to pull the canvas's mean (latent) colour back to 0, the average value of the Gaussian noise provided at the start of inference. Concretely, this can mean an opposing colour is added to the background, or hair colour and clothing details can disobey the prompt.
  • Figure 4: Intermediate denoising predictions with and without ZTSNR. The ZTSNR regime proposes a relevant average canvas colour at $\sigma\approx\infty$, whereas the non-ZTSNR regime understands that in a signal noised up only to $\sigma=56$, the mean colour should still be discernible, and incorrectly concludes that the average colour should be 0, adding white to the canvas in order to achieve this.
  • Figure 5: Schedule with/without ZTSNR.
  • ...and 4 more figures
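
The claim in the Figure 1 caption, that noise with standard deviation 14.6 leaves the average colour discernible, can be checked with a small numerical sketch. It is not taken from the report: the canvas values (-1 and +1), the 128×128 single-channel resolution, and the helper names `noised_mean` and `mean_noise_std` are all illustrative assumptions. The idea is that averaging N independent noise draws shrinks their standard deviation to $\sigma/\sqrt{N}$, so the mean of a noised canvas stays close to the mean of the clean canvas.

```python
import math
import random

SIGMA_MAX = 14.6      # noise standard deviation quoted in the Figure 1 caption
N_PIXELS = 128 * 128  # hypothetical single-channel canvas size (assumption)


def noised_mean(mean_colour: float, sigma: float, n_pixels: int, seed: int = 0) -> float:
    """Average value of a flat canvas after adding N(0, sigma^2) noise to every pixel."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pixels):
        total += mean_colour + rng.gauss(0.0, sigma)
    return total / n_pixels


def mean_noise_std(sigma: float, n_pixels: int) -> float:
    """Standard deviation of the average of n_pixels independent N(0, sigma^2) draws."""
    return sigma / math.sqrt(n_pixels)


# Two hypothetical canvases: one dark (mean -1) and one bright (mean +1).
dark = noised_mean(-1.0, SIGMA_MAX, N_PIXELS, seed=1)
bright = noised_mean(+1.0, SIGMA_MAX, N_PIXELS, seed=2)

print("std of the noise mean:", mean_noise_std(SIGMA_MAX, N_PIXELS))  # about 0.114
print("noised dark mean:", dark)
print("noised bright mean:", bright)
```

Under these assumptions the per-pixel noise is far larger than the signal, yet the two noised means remain clearly separated, because the noise contributes only about 0.114 to the canvas average. This is consistent with the captions' point that a model starting from finite noise can still read a mean colour out of its input.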