Discrete Distribution Networks

Lei Yang

TL;DR

Discrete Distribution Networks (DDN) model complex data distributions by generating $K$ discrete samples per layer and stacking $L$ layers, which yields a latent space of size $K^L$. A Split-and-Prune optimization mitigates dead nodes and density shift, guiding the hierarchical discrete outputs toward the ground truth. This design enables zero-shot conditional generation across both pixel and non-pixel domains using black-box discriminators without gradient information, and the latent occupies only $L \times \log_2 K$ bits, which makes it usable for data compression. DDN supports conditioning via Guided Samplers (e.g., CLIP, classifiers) and can perform image-to-image tasks. It offers two training paradigms (Single Shot and Recurrence) and techniques such as Chain Dropout and Learning Residual to improve performance. Empirical results on CIFAR-10, FFHQ, and CelebA-HQ demonstrate competitive generation quality and zero-shot conditioning capabilities, suggesting a new direction for discrete, hierarchical generative modeling with compact, semantically meaningful latents.
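
As a quick check on the two sizes quoted above, the arithmetic can be worked through with illustrative values $K = 64$ and $L = 10$. These values are assumed here only to make the numbers concrete; they are not taken from the paper's experiments.

$$
K^L = 64^{10} = (2^6)^{10} = 2^{60} \approx 1.15 \times 10^{18},
\qquad
L \times \log_2 K = 10 \times 6 = 60 \text{ bits}.
$$

Each layer contributes one index in $\{0, \dots, K-1\}$, which costs $\log_2 K$ bits, so the bit count grows linearly in $L$ while the number of addressable outputs grows exponentially.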

Abstract

We introduce a novel generative model, the Discrete Distribution Networks (DDN), that approximates data distribution using hierarchical discrete distributions. We posit that since the features within a network inherently capture distributional information, enabling the network to generate multiple samples simultaneously, rather than a single output, may offer an effective way to represent distributions. Therefore, DDN fits the target distribution, including continuous ones, by generating multiple discrete sample points. To capture finer details of the target data, DDN selects the output that is closest to the Ground Truth (GT) from the coarse results generated in the first layer. This selected output is then fed back into the network as a condition for the second layer, thereby generating new outputs more similar to the GT. As the number of DDN layers increases, the representational space of the outputs expands exponentially, and the generated samples become increasingly similar to the GT. This hierarchical output pattern of discrete distributions endows DDN with unique properties: more general zero-shot conditional generation and 1D latent representation. We demonstrate the efficacy of DDN and its intriguing properties through experiments on CIFAR-10 and FFHQ. The code is available at https://discrete-distribution-networks.github.io/
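
The select-and-feed-back loop described in the abstract can be sketched on a one-dimensional toy problem. This is a minimal illustration, not the authors' implementation: `toy_layer` is a hypothetical stand-in for a trained DDN layer (it only places evenly spaced candidates around its condition), and the values of `K`, `L`, the starting condition `0.5`, and the target `0.7` are all assumptions chosen for the example.

```python
import random

K = 4  # outputs per layer (illustrative value)
L = 6  # number of layers (illustrative value)


def toy_layer(condition, depth):
    """Hypothetical stand-in for one DDN layer.

    A real layer is a trained network; this one just places K evenly spaced
    candidates inside the interval of width K**-depth centred on `condition`.
    """
    width = 1.0 / K ** depth
    step = width / K
    left = condition - width / 2
    return [left + (i + 0.5) * step for i in range(K)]


def guided_sampler(candidates, target):
    """Reconstruction mode: pick the index of the candidate closest to the target."""
    return min(range(len(candidates)), key=lambda i: abs(candidates[i] - target))


def random_sampler(candidates, _target=None):
    """Generation mode: ignore the target and pick an index at random."""
    return random.randrange(len(candidates))


def run(sampler, target=None):
    """Walk the L layers; return the chosen indices (the latent) and the final output."""
    condition, latent = 0.5, []
    for depth in range(L):
        candidates = toy_layer(condition, depth)
        idx = sampler(candidates, target)
        latent.append(idx)
        condition = candidates[idx]  # selected output conditions the next layer
    return latent, condition


def decode(latent):
    """Rebuild the output from the latent alone by replaying the stored indices."""
    condition = 0.5
    for depth, idx in enumerate(latent):
        condition = toy_layer(condition, depth)[idx]
    return condition


target = 0.7
latent, recon = run(guided_sampler, target)    # reconstruction with a guided sampler
sample_latent, sample = run(random_sampler)    # unconditional generation
```

In this toy setting the latent is a list of `L` indices, each in `range(K)`, and `decode(latent)` reproduces the reconstruction without access to the target. Swapping `guided_sampler` for `random_sampler` is the only change needed to switch from reconstruction to generation.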

Paper Structure

This paper contains 16 sections, 5 equations, 21 figures, 3 tables, and 1 algorithm.

Figures (21)

  • Figure 1: (a) Illustrates the process of image reconstruction and latent acquisition in DDN. Each layer of DDN outputs $K$ distinct images to approximate the distribution $P(X)$. The sampler then selects the image most similar to the target from these and feeds it into the next DDN layer. As the number of layers increases, the generated images become increasingly similar to the target. For generation tasks, the sampler is simply replaced with a random choice operation. (b) Depicts the tree-structured representation space of DDN's latent variables. Each sample can be mapped to a leaf node on this tree.
  • Figure 2: DDN enables more general zero-shot conditional generation. DDN supports zero-shot conditional generation across non-pixel domains and, notably, does so without relying on gradients, for example text-to-image generation using a black-box CLIP model (Radford et al., 2021). Images enclosed in yellow borders serve as the ground truth. The abbreviations in the table header correspond to their respective tasks as follows: 'SR' stands for Super-Resolution, with the following digit indicating the resolution of the condition. 'ST' denotes Style Transfer, which computes Perceptual Losses with the condition according to Johnson et al. (2016).
  • Figure 3: Schematic of Discrete Distribution Networks (DDN). (a) The data flow during the training phase of DDN is shown at the top. As the network depth increases, the generated images become increasingly similar to the training images. Within each Discrete Distribution Layer (DDL), $K$ samples are generated, and the one closest to the training sample is selected as the generated image for loss computation. These $K$ output nodes are optimized using Adam with the Split-and-Prune method. The two figures on the right show the two model paradigms supported by DDN. (b) Single Shot Generator Paradigm: Each neural network layer and DDL has independent weights. (c) Recurrence Iteration Paradigm: All neural network layers and DDLs share weights. For inference, replacing the Guided Sampler in the DDL with a random choice enables the generation of new images.
  • Figure 4: Illustration of the principle behind the Split-and-Prune operation. For example, in (a), the light blue bell-shaped curve represents a one-dimensional target distribution. The 5 "↑" under the x-axis are the initial values of 5 output nodes, drawn from a uniform distribution, which divide the entire space into 5 parts using the midpoints between adjacent nodes as boundaries (i.e., the vertical gray dashed lines). Each part corresponds to the range represented by that output node on the continuous space $x$. Below each node are three values: $P$ stands for the relative frequency of the ground truth falling within this node's range during training; $Q$ refers to the probability mass of this sample (node) in the discrete distribution output by the model during the generation phase, which is generally equal for each sample, i.e., $1/K$; the bottom-most value denotes the difference between $P$ and $Q$. Colored horizontal line segments represent the average probability density of $P$ and $Q$ within the corresponding intervals. In (b), the Split operation selects the node with the highest $P$ (circled in red). In (c), the Prune operation selects the node with the smallest $P$ (circled in red). In (d), through the combined effects of the loss and the Split-and-Prune operations, the distribution of output nodes moves towards the final optimum. From the observed results, the KL divergence ($KL(P||Q)$) consistently decreases as the operation progresses, and the yellow line increasingly approximates the light blue target distribution.
  • Figure 5: Random samples from DDN. Figures (d) and (e) showcase images that are conditionally generated by conditional DDN, with each row of images representing a distinct category.
  • ...and 16 more figures
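
The one-dimensional picture in the Figure 4 caption can be sketched in a few lines. This is a rough illustration under stated assumptions, not the paper's algorithm: the thresholds `2/K` and `0.5/K`, the pairing of every split with a prune so that $K$ stays fixed, the plain gradient step (the caption mentions Adam), the evenly spaced starting nodes, and all function names are choices made here for the example.

```python
import math
import random


def nearest(nodes, x):
    """Index of the output node closest to x (the node whose range contains x)."""
    return min(range(len(nodes)), key=lambda i: abs(nodes[i] - x))


def match_freq(nodes, samples):
    """P: relative frequency with which each node is the closest one."""
    counts = [0] * len(nodes)
    for x in samples:
        counts[nearest(nodes, x)] += 1
    return [c / len(samples) for c in counts]


def kl_to_uniform(p):
    """KL(P || Q) with Q uniform (1/K per node); terms with P = 0 contribute 0."""
    q = 1.0 / len(p)
    return sum(pi * math.log(pi / q) for pi in p if pi > 0)


def train(nodes, rng, steps=20000, period=2000, lr=0.05):
    nodes = list(nodes)
    k = len(nodes)
    counts = [0] * k
    for step in range(1, steps + 1):
        x = rng.gauss(0.0, 1.0)          # target distribution: standard normal
        i = nearest(nodes, x)
        counts[i] += 1
        nodes[i] += lr * (x - nodes[i])  # gradient step on 0.5 * (x - node)**2
        if step % period == 0:
            p = [c / period for c in counts]
            hi = max(range(k), key=p.__getitem__)  # Split candidate: highest P
            lo = min(range(k), key=p.__getitem__)  # Prune candidate: smallest P
            # Assumed thresholds: act when a node is over- or under-used.
            if p[hi] > 2.0 / k or p[lo] < 0.5 / k:
                nodes[lo] = nodes[hi] + 1e-3  # pruned node becomes a clone...
                nodes[hi] -= 1e-3             # ...nudged apart from the original
            counts = [0] * k
    return nodes


rng = random.Random(0)
init_nodes = [-6.0, -3.0, 0.0, 3.0, 6.0]  # outer nodes start out almost never matched
eval_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

kl_before = kl_to_uniform(match_freq(init_nodes, eval_samples))
trained_nodes = train(init_nodes, rng)
kl_after = kl_to_uniform(match_freq(trained_nodes, eval_samples))
```

The starting nodes at ±6 are effectively dead, since a standard normal almost never lands closer to them than to their neighbours, and gradient updates alone would leave them untouched. Reassigning the least-used node to sit next to the most-used one brings them back into play, and in this toy setup `kl_after` ends up below `kl_before`.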