LATENTPATCH: A Non-Parametric Approach for Face Generation and Editing

Benjamin Samuth, Julien Rabin, David Tschumperlé, Frédéric Jurie

TL;DR

LatentPatch addresses few-shot face generation without fine-tuning a large generative model. Instead, it uses non-parametric, patch-based sampling in a latent space derived from a pre-trained auto-encoder. The method constructs a universal latent representation, compresses it with PCA, and synthesizes new latent codes by sequentially sampling latent patches from the source distribution across multiple scales, without any model training. It supports conditional generation and attribute-guided editing, and produces near photo-realistic results with a tiny parameter budget. Experiments on face datasets show competitive quality and flexibility, which suggests the method is practical when data and compute are scarce.

Abstract

This paper presents LatentPatch, a new method for generating realistic images from a small dataset of only a few images. We use a lightweight model with only a few thousand parameters. Unlike traditional few-shot generation methods that fine-tune pre-trained large-scale generative models, our approach is computed directly on the latent distribution by sequential feature matching, and is explainable by design. Avoiding large models based on transformers, recursive networks, or self-attention, which are not suitable for small datasets, our method is inspired by non-parametric texture synthesis and style transfer models, and ensures that generated image features are sampled from the source distribution. We extend previous single-image models to work with a few images and demonstrate that our method can generate realistic images, as well as enable conditional sampling and image editing. We conduct experiments on face datasets and show that our simplistic model is effective and versatile.
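
The PCA compression step mentioned in the TL;DR can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fit_pca`, `compress`, `decompress`), the random stand-in for auto-encoder latents, and the sizes (16 images, a 10x10 latent map, 64 channels, 8 retained components) are all assumptions chosen for illustration.

```python
import numpy as np

def fit_pca(features, n_components):
    """Fit PCA on an (N, C) array of latent vectors.

    Returns the mean (C,) and an orthonormal basis (n_components, C).
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def compress(features, mean, basis):
    """Project latent vectors onto the retained principal directions."""
    return (features - mean) @ basis.T

def decompress(codes, mean, basis):
    """Map compressed codes back to the original latent space."""
    return codes @ basis + mean

# Hypothetical stand-in for auto-encoder latents: (B, H, W, C).
rng = np.random.default_rng(0)
latents = rng.normal(size=(16, 10, 10, 64))

# Treat every spatial location of every image as one C-dimensional sample.
flat = latents.reshape(-1, 64)
mean, basis = fit_pca(flat, n_components=8)
codes = compress(flat, mean, basis).reshape(16, 10, 10, 8)
```

Under these assumptions, patch matching would operate on `codes` rather than on the full latent maps, and `decompress` would be applied before decoding a generated map back to an image.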

Paper Structure (4 sections, 4 figures, 1 table)

This paper contains 4 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: The proposed patch-based approach, coined "LatentPatch", can generate images like (c) using only the 16 source images shown in the first row, without any learning. It also enables easy implementation of variants such as (d) reference-based generation, (e) editing, and (f) attribute-constrained generation (on alternative data not shown here). Images (a) and (b) were generated using random patches from the source images, with the origin of each patch indicated by its color. Additional results can be found on the project page (samuth2023webpage).
  • Figure 2: Illustration of the proposed face generation framework based on a pre-trained auto-encoder. See the text for more details.
  • Figure 3: Generating a $10\times10$ image using patches of size $4\times4$ and a stride of $2\times2$, following the same process at each scale. The query patches are masked to exclude the not-yet-generated pixels.
  • Figure 4: Comparison of generated images with different patch sizes ($\omega$), strides ($w$), and dataset sizes ($B$). Coherence of the generated images improves with increasing patch size and stride, but larger example regions also result in reduced diversity. The first column shows the patch index, encoded in normalized hue values, for $B=16$.
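
The sequential, masked patch matching described in the Figure 3 caption can be sketched at a single scale as follows. This is a simplified illustration, not the paper's algorithm. It assumes a raster scan order, candidate patches taken from the same spatial location in each of the $B$ source maps (plausible for aligned faces, but an assumption), random choice among the `k` nearest candidates, and copying only the not-yet-generated pixels. The name `generate` and the parameter `k` are hypothetical, and the multi-scale part of the method is omitted.

```python
import numpy as np

def generate(sources, patch=4, stride=2, k=3, rng=None):
    """Single-scale sketch of sequential masked patch sampling.

    sources: (B, H, W, C) array of source latent maps.
    Returns the generated map (H, W, C) and, per pixel, the index of the
    source map it was copied from (H, W).
    """
    rng = np.random.default_rng() if rng is None else rng
    B, H, W, C = sources.shape
    canvas = np.zeros((H, W, C))
    filled = np.zeros((H, W), dtype=bool)
    origin = np.full((H, W), -1)

    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            win = (slice(y, y + patch), slice(x, x + patch))
            mask = filled[win]  # True where pixels are already generated
            if not mask.any():
                # Nothing to match against yet: pick a random source.
                choice = rng.integers(B)
            else:
                # Squared distance restricted to already-generated pixels.
                diff = sources[:, y:y + patch, x:x + patch] - canvas[win]
                dist = (diff ** 2).sum(axis=-1)[:, mask].sum(axis=1)
                nearest = np.argsort(dist)[:k]
                choice = rng.choice(nearest)
            new = ~mask
            # Copy only the pixels that have not been generated so far.
            canvas[win][new] = sources[choice][win][new]
            origin[win][new] = choice
            filled[win] = True
    return canvas, origin

# Sizes follow the figure captions: 16 sources, 10x10 map, 4x4 patches, stride 2.
rng = np.random.default_rng(0)
sources = rng.normal(size=(16, 10, 10, 8))  # random stand-in for latent codes
canvas, origin = generate(sources, patch=4, stride=2, k=3, rng=rng)
```

In this sketch every generated feature vector is copied from one of the sources, and `origin` plays the role of the patch index map that Figure 4 displays as hue values. Setting `k=1` makes each matching step deterministic, which reduces diversity.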