Fast & Efficient Normalizing Flows and Applications of Image Generative Models

Sandeep Nagar

TL;DR

This work tackles two intertwined goals: making normalizing flows more efficient and leveraging image-generative models for real-world computer vision tasks. It introduces CInC Flow and Inverse-Flow to enable fast, parallelized inversion of convolutions and scalable training, plus Affine-StableSR to combine diffusion priors with lightweight NF-inspired encoding for super-resolution. The applications span automated seed-quality assessment, privacy-preserving anonymization in driving datasets, unsupervised geological mapping with stacked autoencoders, diffusion-based art restoration, and robust traffic-sign detection under missing-sign scenarios. Collectively, the contributions deliver both theoretical advances in flow-based models and practical systems for efficiency, privacy, and real-world CV tasks. The work demonstrates substantial gains in sampling speed, parameter efficiency, and applicability to diverse domains, signaling a meaningful step toward deploying principled generative models at scale.

Abstract

This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-world computer vision challenges. The first part introduces significant improvements to normalizing flow architectures through six key innovations: 1) invertible $3\times3$ convolution layers with mathematically proven necessary and sufficient conditions for invertibility; 2) a more efficient Quad-coupling layer; 3) a fast and efficient parallel inversion algorithm for $k\times k$ convolutional layers; 4) a fast and efficient backpropagation algorithm for the inverse of a convolution; 5) Inverse-Flow, which uses the inverse of a convolution in the forward pass and is trained with the proposed backpropagation algorithm; and 6) Affine-StableSR, a compact and efficient super-resolution model that leverages pre-trained weights and normalizing flow layers to reduce parameter count while maintaining performance. The second part presents five applications: 1) an automated quality assessment system for agricultural produce that uses Conditional GANs to address class imbalance, data scarcity, and annotation challenges, achieving good accuracy in seed purity testing; 2) an unsupervised geological mapping framework that uses stacked autoencoders for dimensionality reduction, showing improved feature extraction compared to conventional methods; 3) a privacy-preserving method for autonomous driving datasets based on face detection and image inpainting; 4) Stable Diffusion based image inpainting that replaces detected faces and license plates, advancing privacy-preserving techniques and ethical considerations in the field; and 5) an adapted diffusion model for art restoration that effectively handles multiple types of degradation through unified fine-tuning.

Paper Structure

This paper contains 148 sections, 4 theorems, 24 equations, 46 figures, 27 tables, 1 algorithm.

Key Result

Lemma 1

Let $y = M\hat{x}$, where $M$ is a lower triangular matrix with all diagonal entries equal to $K_{3,3}$
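
The premise of the lemma can be checked numerically. The following is a minimal sketch, not the thesis implementation: it assumes a single channel, a $3\times3$ kernel, an input padded only on the top and left, and pixels flattened row by row. The helper names `conv_matrix` and `forward_substitution` are hypothetical. Under these assumptions the matrix $M$ comes out lower triangular with $K_{3,3}$ on the diagonal, so $\hat{x}$ can be recovered from $y$ by forward substitution whenever $K_{3,3} \neq 0$.

```python
import numpy as np

def conv_matrix(K, H, W):
    """Matrix M of a single-channel k x k convolution on an H x W input
    padded only on the top and left (pixels flattened row by row)."""
    k = K.shape[0]
    M = np.zeros((H * W, H * W))
    for i in range(H):
        for j in range(W):
            for a in range(k):
                for b in range(k):
                    # Output pixel (i, j) reads input pixel (p, q).
                    p, q = i - (k - 1) + a, j - (k - 1) + b
                    if p >= 0 and q >= 0:  # negative indices fall in the zero padding
                        M[i * W + j, p * W + q] = K[a, b]
    return M

def forward_substitution(M, y):
    """Solve M x = y for lower triangular M with a nonzero diagonal."""
    x = np.zeros_like(y)
    for r in range(len(y)):
        x[r] = (y[r] - M[r, :r] @ x[:r]) / M[r, r]
    return x

rng = np.random.default_rng(0)
K = rng.normal(size=(3, 3))
K[2, 2] = 1.5  # K[2, 2] is K_{3,3} in 1-indexed notation; it must be nonzero

M = conv_matrix(K, 4, 4)
x = rng.normal(size=16)   # a flattened 4 x 4 input
y = M @ x                 # the convolution written as a matrix product
x_rec = forward_substitution(M, y)

is_lower = np.allclose(M, np.tril(M))
diag_ok = np.allclose(np.diag(M), K[2, 2])
_, logabsdet = np.linalg.slogdet(M)
```

Because $M$ is triangular, its determinant is the product of its diagonal entries, so in this toy setting `logabsdet` equals `16 * log(abs(K[2, 2]))` without any matrix factorization.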

Figures (46)

  • Figure 1: Image interpolation inspired by the work of this thesis: (a) regenerated image when using CInC Flow to change hair color, (b) remove glasses, (c) change face shape. (d) Result of gradually modifying the age parameter; the original image is in the middle.
  • Figure 2: (A) a learned Deconvolution, which provides an approximate reconstruction, and (B) a mathematical Inverse of a Convolution, which provides an exact reconstruction.
  • Figure 3: (a) Top: the first four are the kernel matrices, and the fifth is the input matrix with the standard padding that gives the bottom convolution matrix. Bottom: the convolution matrix corresponding to a convolution with a kernel of size three applied to an input of size $4\times4$, padded on both sides, and with two channels. Zero coefficients are drawn in white; other coefficients are drawn using the same color if applied to the same spatial location, albeit on different channels. (b) Top: an alternative padding scheme that results in a block triangular matrix $M$. Bottom: the matrix corresponding to a convolution with a kernel of size three applied to an input of size $4\times4$ padded only on one side and with two channels. (c) Top: a masked alternative padding scheme that results in a triangular matrix $M$. Bottom: the matrix corresponding to a convolution with a kernel of size three applied to an input of size $4\times4$ padded only on one side and with two channels. One of the weights of the kernel is masked. Note that the equivalent matrix $M$ is triangular.
  • Figure 4: Comparison of CInC Flow [nagar2021cinc] with Autoregressive convolutions and Emerging convolutions in terms of speed and parameter utilization.
  • Figure 5: The Quad-coupling layer. Each input block $X_i$ has the same spatial dimension as the input $X$ but only one-quarter of the channels. Each function $f_1$, $f_2$, and $f_3$ is a 3-layer convolutional network. $\bigoplus$ symbolizes a component-wise addition. The multiplicative actions are not represented here.
  • ...and 41 more figures
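
The additive part of the Quad-coupling layer described in Figure 5 can be sketched as follows. This is an illustration under stated assumptions, not the thesis architecture: the caption does not say which blocks feed each function, so the wiring below (each $f_i$ sees all earlier blocks) is a guess; the 3-layer convolutional networks are replaced by a hypothetical stand-in `make_f` that mixes channels with a fixed random matrix; and the multiplicative actions are omitted, as in the figure. The point is only that a four-way additive coupling of this shape is exactly invertible whatever the $f_i$ are.

```python
import numpy as np

def make_f(seed, in_ch, out_ch):
    """Hypothetical stand-in for a 3-layer conv net: fixed random channel mixing."""
    weights = np.random.default_rng(seed).normal(size=(out_ch, in_ch))
    return lambda h: np.tanh(np.einsum("oc,chw->ohw", weights, h))

def quad_coupling_forward(x, f1, f2, f3):
    # Split the channel axis (axis 0) into four blocks X_1..X_4.
    x1, x2, x3, x4 = np.split(x, 4, axis=0)
    y1 = x1                                           # first block passes through
    y2 = x2 + f1(x1)                                  # assumed wiring: f1 sees X_1
    y3 = x3 + f2(np.concatenate([x1, x2], axis=0))    # assumed: f2 sees X_1, X_2
    y4 = x4 + f3(np.concatenate([x1, x2, x3], axis=0))
    return np.concatenate([y1, y2, y3, y4], axis=0)

def quad_coupling_inverse(y, f1, f2, f3):
    # Undo the additions in the same order, reusing the blocks already recovered.
    y1, y2, y3, y4 = np.split(y, 4, axis=0)
    x1 = y1
    x2 = y2 - f1(x1)
    x3 = y3 - f2(np.concatenate([x1, x2], axis=0))
    x4 = y4 - f3(np.concatenate([x1, x2, x3], axis=0))
    return np.concatenate([x1, x2, x3, x4], axis=0)

channels, height, width = 8, 4, 4
quarter = channels // 4
f1 = make_f(1, quarter, quarter)
f2 = make_f(2, 2 * quarter, quarter)
f3 = make_f(3, 3 * quarter, quarter)

x = np.random.default_rng(0).normal(size=(channels, height, width))
y = quad_coupling_forward(x, f1, f2, f3)
x_rec = quad_coupling_inverse(y, f1, f2, f3)
```

Since every block is shifted by a quantity that depends only on earlier blocks, the Jacobian of this additive map is triangular with a unit diagonal, so its log-determinant is zero.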

Theorems & Definitions (9)

  • Definition 1: Convolution
  • Definition 2: Padding
  • Definition 3: Matrix of a convolution.
  • Lemma 1
  • proof
  • Theorem 1: Characterization for $N=1$
  • proof
  • Theorem 2
  • Theorem 3