Latent Regularization in Generative Test Input Generation

Giorgi Merabishvili, Oliver Weißl, Andrea Stocco

TL;DR

This paper studies how latent-space truncation in StyleGAN-based test-input generators affects the quality of inputs for deep learning classifiers, focusing on validity, diversity, and fault detection. It introduces adaptive truncation to salvage seeds near decision boundaries and compares two probing strategies, truncation-only first-flip and truncation-assisted style mixing, on MNIST, Fashion-MNIST, and CIFAR-10, using pretrained generators and Torchvision classifiers as systems under test (SUTs). Results show that mild-to-moderate regularization ($\psi \approx 0.6$) maximizes the yield of human-validated, fault-revealing frontier inputs, while adaptive salvage significantly reduces annotation and computation costs; first-flip offers fast fault discovery, whereas style mixing affords semantic control over the boundary. The work provides practical guidance for configuring generative testing pipelines and discusses extending the approach to other generator families, such as diffusion or transformer-based models, to balance fidelity and diversity in broader domains.

Abstract

This study investigates how regularizing latent spaces through truncation affects the quality of generated test inputs for deep learning classifiers. We evaluate this effect using style-based GANs, a state-of-the-art generative approach, and assess quality along three dimensions: validity, diversity, and fault detection. We apply our approach to the boundary testing of deep learning image classifiers on three datasets: MNIST, Fashion-MNIST, and CIFAR-10. We compare two truncation strategies: latent code mixing with binary search optimization and random latent truncation for generative exploration. Our experiments show that the latent code-mixing approach yields a higher fault detection rate than random truncation, while also improving both diversity and validity.
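To make the truncation idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the standard StyleGAN truncation trick, which interpolates a latent code toward the mean latent with strength $\psi$, and pairs it with a simple scan that lowers $\psi$ until the prediction changes. The names `truncate`, `first_flip`, and `toy_classify` are hypothetical, and `toy_classify` is a toy stand-in for the generator plus the classifier under test; the paper's exact search procedure is not given in this excerpt.

```python
def truncate(w, w_avg, psi):
    """Truncation trick: move latent w toward the mean latent w_avg.

    psi = 1.0 leaves w unchanged; psi = 0.0 collapses it onto w_avg.
    """
    return [a + psi * (x - a) for x, a in zip(w, w_avg)]


def first_flip(w, w_avg, classify, psis):
    """Return the first psi (in the given order) whose prediction differs
    from the untruncated (psi = 1.0) prediction, or None if none does."""
    base = classify(truncate(w, w_avg, 1.0))
    for psi in psis:
        if classify(truncate(w, w_avg, psi)) != base:
            return psi
    return None


def toy_classify(w):
    # Hypothetical stand-in for "generate an image, then classify it":
    # the label is decided by the sign of the first latent coordinate.
    return int(w[0] > 0.0)


w_avg = [-1.0, 0.0]   # assumed mean latent
w = [0.5, 2.0]        # assumed seed latent
psis = [0.9, 0.8, 0.7, 0.6, 0.5]  # progressively stronger truncation

flip_psi = first_flip(w, w_avg, toy_classify, psis)
print(flip_psi)
```

In this toy setup the first coordinate crosses zero between $\psi = 0.7$ and $\psi = 0.6$, so the scan stops at the first grid value past that crossing. A real pipeline would replace `toy_classify` with a generator forward pass followed by the classifier under test, and would still need human validation of the resulting image.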

Paper Structure (20 sections, 2 equations, 8 figures, 1 table, 2 algorithms)

Figures (8)

  • Figure 1: Truncation on coarse layers. A fixed latent seed rendered at $\psi_T=1.0$ and at progressively lower $\psi_T$ values; lowering $\psi_T$ increases fidelity and reduces diversity.
  • Figure 2: Test input generation with truncation workflow.
  • Figure 3: Truncation-only search. Truncation refines a human-invalid baseline into a human-valid image and flips the SUT prediction.
  • Figure 4: Minimal truncation $\psi^\star$ in MNIST ($n=25$).
  • Figure 5: Target revelation via truncation. Lowering $\psi$ causes a human-valid seed to flip under truncation alone.
  • ...and 3 more figures