Boomerang: Local sampling on image manifolds using diffusion models

Lorenzo Luzi, Paul M Mayer, Josue Casco-Rodriguez, Ali Siahkoohi, Richard G. Baraniuk

TL;DR

Boomerang performs local sampling on image manifolds with a single locality control: it forward-diffuses an input image for $t_\text{Boom}$ steps and then runs the reverse diffusion from step $t_\text{Boom}$ back to step 0, landing on a nearby manifold point $\mathbf{x}_0'$. The approach works with any pretrained diffusion backbone and requires no retraining or architecture changes. $t_\text{Boom}$ is the sole control for local versus global sampling: small values keep the output close to the input, and $t_\text{Boom}=T$ yields global samples. The authors demonstrate three applications: privacy-preserving anonymization, data augmentation that generalizes better than state-of-the-art synthetic augmentation, and perceptual resolution enhancement (PRE), which emphasizes perceptual quality. In practice, this enables targeted manipulation of image content and quality using readily available diffusion models.

Abstract

The inference stage of diffusion models can be seen as running a reverse-time diffusion stochastic differential equation, where samples from a Gaussian latent distribution are transformed into samples from a target distribution that usually reside on a low-dimensional manifold, e.g., an image manifold. The intermediate values between the initial latent space and the image manifold can be interpreted as noisy images, with the amount of noise determined by the forward diffusion process noise schedule. We utilize this interpretation to present Boomerang, an approach for local sampling of image manifolds. As implied by its name, Boomerang local sampling involves adding noise to an input image, moving it closer to the latent space, and then mapping it back to the image manifold through a partial reverse diffusion process. Thus, Boomerang generates images on the manifold that are "similar," but nonidentical, to the original input image. We can control the proximity of the generated images to the original by adjusting the amount of noise added. Furthermore, due to the stochastic nature of the reverse diffusion process in Boomerang, the generated images display a certain degree of stochasticity, allowing us to obtain local samples from the manifold without encountering any duplicates. Boomerang offers the flexibility to work seamlessly with any pretrained diffusion model, such as Stable Diffusion, without necessitating any adjustments to the reverse diffusion process. We present three applications for Boomerang. First, we provide a framework for constructing privacy-preserving datasets having controllable degrees of anonymity. Second, we show that using Boomerang for data augmentation increases generalization performance and outperforms state-of-the-art synthetic data augmentation. Lastly, we introduce a perceptual image enhancement framework, which enables resolution enhancement.
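
The procedure described in the abstract (add noise up to an intermediate step, then run the reverse process from that step back to the image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a standard DDPM-style linear noise schedule and ancestral sampler in pixel space, and the names `make_schedule`, `forward_diffuse`, `boomerang`, and `make_toy_eps_model` are hypothetical. The toy noise predictor stands in for a pretrained network and is only exact when the data distribution is a standard Gaussian.

```python
import numpy as np


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear DDPM noise schedule; returns (betas, cumulative alpha products)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars


def forward_diffuse(x0, t_boom, alpha_bars, rng):
    """Jump directly to step t_boom of the forward process:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)."""
    if t_boom == 0:
        return x0.copy()
    abar = alpha_bars[t_boom - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps


def boomerang(x0, t_boom, betas, alpha_bars, eps_model, rng):
    """Noise x0 up to step t_boom, then run t_boom ancestral reverse steps.

    eps_model(x, t) is a placeholder for a pretrained noise-prediction network.
    """
    x = forward_diffuse(x0, t_boom, alpha_bars, rng)
    for t in range(t_boom, 0, -1):
        beta = betas[t - 1]
        abar = alpha_bars[t - 1]
        eps_hat = eps_model(x, t)
        # DDPM posterior mean computed from the predicted noise.
        mean = (x - beta / np.sqrt(1.0 - abar) * eps_hat) / np.sqrt(1.0 - beta)
        # No fresh noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 1 else np.zeros_like(x)
        x = mean + np.sqrt(beta) * noise
    return x


def make_toy_eps_model(alpha_bars):
    """Toy stand-in for a trained network: the optimal noise predictor
    when the data distribution is N(0, I), i.e. E[eps | x_t] = sqrt(1 - abar_t) * x_t."""
    def eps_model(x, t):
        return np.sqrt(1.0 - alpha_bars[t - 1]) * x
    return eps_model
```

Under these assumptions, a small `t_boom` leaves the output close to `x0`, while `t_boom = T` discards almost all information about the input, which mirrors the local-versus-global trade-off described above. With a latent diffusion model such as Stable Diffusion, the same loop would operate on encoder latents rather than pixels.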

Paper Structure (17 sections, 7 equations, 16 figures, 3 tables, 1 algorithm)

This paper contains 17 sections, 7 equations, 16 figures, 3 tables, 1 algorithm.

Figures (16)

  • Figure 1: An example of Boomerang applied via Stable Diffusion [Rombach_2022_CVPR]. Starting from an original image ${\bm{x}}_0 \sim p({\bm{x}}_0)$, we add varying levels of noise to the latent variables according to the noise schedule of the forward diffusion process. Boomerang maps the noisy latent variables back to the image manifold by running the reverse diffusion process, starting from the reverse step (out of $T=1000$) that corresponds to the amount of added noise. The resulting images are local samples from the image manifold, where the closeness is determined by the amount of added noise. Note how, as $t$ approaches $T$, the content of Boomerang-generated images strays further from the starting image. While Boomerang here is applied to the Stable Diffusion model, it is applicable to other types of diffusion models, e.g., denoising diffusion models [NEURIPS2020_4c5bcfec]. Additional images are provided in \ref{appendix-1}. A Boomerang Colab demo is available at https://colab.research.google.com/drive/1PV5Z6b14HYZNx1lHCaEVhId-Y4baKXwt.
  • Figure 2: Using Boomerang on CIFAR-10 to change the visual features of images. These images were created with FastDPM [kong2021fast] using $t_\text{Boom} / T = 40/100 = 40\%$.
  • Figure 3: Using Boomerang on ImageNet-200 to change the visual features of images. These images were created with Patched Diffusion [luhman2022improving] using $t_\text{Boom} / T = 75/250 = 30\%$. The FID values for these images are plotted in \ref{fig:FIDfig4} in the Appendix.
  • Figure 4: Nine randomly selected LFWPeople samples anonymized via Boomerang Stable Diffusion with the same random seed and with the same prompt: "picture of a person".
  • Figure 5: Facial recognition embedding distances between Boomeranged images and the original images as a function of $t_\text{Boom}$. We use the VGG-Face and Facenet models [deep_face_recognition_survey] to calculate the embeddings; both models have a default minimum distance threshold of 0.4 for declaring that two images show different people. The largest standard error is 0.0017 for embedding distance and 0.21% for the number of images over the threshold. \ref{fig:lfw_distribution} contains the full distribution of VGG-Face embedding distances.
  • ...and 11 more figures