Latent Space Class Dispersion: Effective Test Data Quality Assessment for DNNs

Vivek Vekariya, Mojdeh Golagha, Andrea Stocco, Alexander Pretschner

TL;DR

This work introduces Latent Space Class Dispersion (LSCD), a latent-space-based metric for quantifying DNN test data quality as a scalable alternative to costly mutation testing. LSCD computes class-wise dispersion around training centroids in the latent space, providing a robust and efficient indicator of a dataset's ability to probe generalization and fault detection. Across MNIST, SVHN, and GTSRB with diverse architectures, LSCD shows a strong positive correlation with Mutation Score (MS) while far outperforming Distance-based Surprise Coverage (DSC) in both statistical strength (MS-LSCD ≈ 0.87 vs MS-DSC ≈ 0.25) and computational efficiency (LSCD ≈ 36.6x faster than DSC). The study also demonstrates that corner-case data generated via Coverage-Guided Fuzzing have high validity and further improve LSCD and MS, underscoring LSCD's practical utility as a cost-effective proxy for MS in test data quality assessment and DNN reliability evaluation.

Abstract

High-quality test datasets are crucial for assessing the reliability of Deep Neural Networks (DNNs). Mutation testing evaluates the quality of a test dataset by its ability to uncover faults injected into DNNs, as measured by the mutation score (MS). However, its high computational cost motivates researchers to seek alternative test adequacy criteria. We propose Latent Space Class Dispersion (LSCD), a novel metric to quantify the quality of test datasets for DNNs. It measures the degree of dispersion within a test dataset as observed in the latent space of a DNN. Our empirical study shows that LSCD reveals and quantifies deficiencies in the test datasets of three popular image classification benchmarks. Corner cases generated using automated fuzzing were found to enhance fault detection and to improve the overall quality of the original test sets, as measured by both MS and LSCD. Our experiments revealed a high positive correlation (0.87) between LSCD and MS, significantly higher than that achieved by the well-studied Distance-based Surprise Coverage (0.25). These results were obtained from 129 mutants generated through pre-training mutation operators, with statistical significance and a high validity of corner cases. These observations suggest that LSCD can serve as a cost-effective alternative to expensive mutation testing, eliminating the need to generate mutant models while offering comparably valuable insights into test dataset quality for DNNs.
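
The idea of class-wise dispersion around training centroids can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function names are hypothetical, and the use of mean squared Euclidean distance and an unweighted average over classes are assumptions made for illustration only. The latent vectors are assumed to have already been extracted from some layer of the DNN.

```python
import numpy as np

def class_centroids(train_latents, train_labels):
    """Mean latent vector per class, computed from the training set."""
    return {int(c): train_latents[train_labels == c].mean(axis=0)
            for c in np.unique(train_labels)}

def class_dispersion(test_latents, test_labels, centroids):
    """Per-class mean squared Euclidean distance between test latent
    vectors and the training centroid of their class (assumed measure)."""
    per_class = {}
    for c, center in centroids.items():
        members = test_latents[test_labels == c]
        if len(members) == 0:
            continue  # class not represented in the test set
        sq_dist = np.sum((members - center) ** 2, axis=1)
        per_class[c] = float(np.mean(sq_dist))
    return per_class

def dispersion_score(per_class):
    """Unweighted average of the per-class dispersions (assumed aggregation)."""
    return float(np.mean(list(per_class.values())))

# Toy example with 2-D latent vectors and two classes.
train_latents = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [0.0, 6.0]])
train_labels = np.array([0, 0, 1, 1])
test_latents = np.array([[1.0, 0.0], [1.0, 2.0], [0.0, 5.0]])
test_labels = np.array([0, 0, 1])

centroids = class_centroids(train_latents, train_labels)
per_class = class_dispersion(test_latents, test_labels, centroids)
score = dispersion_score(per_class)
```

In this toy setup, a test set whose points all sit on the training centroids would score zero, while a test set that spreads further into each class region would score higher. No mutant models are needed, only one forward pass per input to obtain the latent vectors.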

Paper Structure

This paper contains 32 sections, 9 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Illustration of DSC and LSCD calculation. Black dots represent latent space vectors from the training set for classes 1 and 2 with their centers C$_1$ and C$_2$. New test inputs X$_1$ (correctly classified) and X$_2$ (falsely classified), both belonging to class 1, are shown to demonstrate the metric calculations.
  • Figure 2: Our framework to evaluate test dataset quality based on Distance-based Surprise Coverage and Latent Space Class Dispersion (pipeline highlighted in green blocks). We compare our approach with the Mutation Score (highlighted in the pipeline with yellow colored blocks).
  • Figure 3: Coverage-Guided Fuzzing
  • Figure 4: LSCD (a-b) and DSC (b-c) vs. MS values across all mutant models.