Perturbation-Assisted Sample Synthesis: A Novel Approach for Uncertainty Quantification

Yifei Liu; Rex Shen; Xiaotong Shen

TL;DR

This paper introduces a novel Perturbation-Assisted Inference (PAI) framework that uses synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. It demonstrates the effectiveness of PAI for uncertainty quantification in complex, data-driven tasks through applications to image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.

Abstract

This paper introduces a novel Perturbation-Assisted Inference (PAI) framework that uses synthetic data generated by the Perturbation-Assisted Sample Synthesis (PASS) method. The framework focuses on uncertainty quantification in complex data scenarios, particularly those involving unstructured data and deep learning models. On the one hand, PASS employs a generative model to create synthetic data that closely mirrors the raw data while preserving its rank properties through data perturbation, thereby enhancing data diversity and bolstering privacy. By incorporating knowledge transfer from large pre-trained generative models, PASS improves estimation accuracy, yielding refined distributional estimates of various statistics via Monte Carlo experiments. On the other hand, PAI offers statistically guaranteed validity. In pivotal inference, it enables precise conclusions even without prior knowledge of the distribution of the pivotal quantity. In non-pivotal situations, we enhance the reliability of synthetic data generation by training the generative model on an independent holdout sample. We demonstrate the effectiveness of PAI in advancing uncertainty quantification in complex, data-driven tasks by applying it to diverse areas such as image synthesis, sentiment word analysis, multimodal inference, and the construction of prediction intervals.
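
The abstract describes PASS as producing synthetic data that preserves the ranks of the raw data through perturbation, with a generator trained independently of the data being synthesized. A minimal univariate sketch of that pipeline follows, under assumptions that are not taken from the paper: the empirical map $(r_i + 0.5)/n$ stands in for the rank map, a Gaussian-noise perturbation in probit space stands in for the distribution-preserving perturbation, and a normal distribution fitted to a separate holdout sample stands in for the trained generator $\tilde{G}$. The names `empirical_ranks`, `perturb_ranks`, and `pass_sample` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

STD_NORMAL = NormalDist()  # standard normal N(0, 1)
EPS = 1e-12  # keeps probabilities strictly inside (0, 1) for inv_cdf


def empirical_ranks(z):
    """Assumed rank map: send z_i to (rank_i + 0.5) / n in (0, 1)."""
    n = len(z)
    u = [0.0] * n
    for r, i in enumerate(sorted(range(n), key=lambda k: z[k])):
        u[i] = (r + 0.5) / n
    return u


def perturb_ranks(u, sigma, rng):
    """Assumed perturbation: Gaussian noise added in probit space.

    If U ~ Uniform(0, 1), then V = Phi^{-1}(U) ~ N(0, 1), so for independent
    e ~ N(0, 1) the value (V + sigma * e) / sqrt(1 + sigma^2) is again N(0, 1),
    and mapping it back through Phi gives Uniform(0, 1) again.
    """
    scale = math.sqrt(1.0 + sigma ** 2)
    out = []
    for ui in u:
        v = STD_NORMAL.inv_cdf(ui)
        p = STD_NORMAL.cdf((v + sigma * rng.gauss(0.0, 1.0)) / scale)
        out.append(min(max(p, EPS), 1.0 - EPS))
    return out


def pass_sample(z, holdout, sigma=0.5, seed=0):
    """Hypothetical PASS-style synthesis: rank map -> perturbation -> generator."""
    rng = random.Random(seed)
    # Stand-in generator G~: a normal fitted to a holdout sample independent of z.
    generator = NormalDist(fmean(holdout), stdev(holdout))
    u_pert = perturb_ranks(empirical_ranks(z), sigma, rng)
    # The inverse CDF transports the uniform base distribution to the target.
    return [generator.inv_cdf(p) for p in u_pert]


# Usage with simulated data (illustrative only).
data_rng = random.Random(42)
z = [data_rng.gauss(2.0, 1.5) for _ in range(200)]
holdout = [data_rng.gauss(2.0, 1.5) for _ in range(200)]
synthetic = pass_sample(z, holdout, sigma=0.3)
```

With `sigma=0` the synthetic sample has exactly the ranks of `z`; increasing `sigma` trades rank fidelity for a stronger perturbation.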

Paper Structure (30 sections, 11 theorems, 42 equations, 15 figures, 5 tables, 4 algorithms)

This paper contains 30 sections, 11 theorems, 42 equations, 15 figures, 5 tables, and 4 algorithms.

Introduction
Perturbation-Assisted Sample Synthesis
Sample Synthesis
Data-Generating Distribution
Sampling Properties of PASS
Perturbation-Assisted Inference
Statistical guarantee and justification
General Inference with Holdout
Pivotal Inference without Holdout
Applications
Image Synthesis
Sentiment Word Inference
Text-to-Image Generation
Numerical Results
Image synthesis
...and 15 more sections

Key Result

Lemma 1

(Sampling properties of PASS) Given $\bm Z^{\prime}=(\bm Z^{\prime}_i)_{i=1}^n$ generated from DPG using $\tilde{G}$, assume that $\tilde{F}_{\bm Z}$ is independent of $\bm Z = (\bm Z_i)_{i = 1}^n$. Then,

Figures (15)

Figure 1: Flowchart illustrating the PASS approach with rank matching and distribution-preserving perturbation. PASS generates a synthetic sample that closely retains the multivariate ranks of the original sample, ensuring privacy protection. The transport $G$ is applied to align the base distribution with the target distribution (for example, the original distribution).
Figure 2: Estimating the distribution of the test statistic under the null hypothesis ($H_0$) through Perturbation-Assisted Inference (PAI) using the PASS generator: A Monte Carlo (MC) approach.
Figure 3: Illustration of assessing generative models using PAI. $d(\cdot, \cdot)$ represents a distributional distance. A test statistic in the tails (red) provides statistical evidence against the candidate model generating high-fidelity samples, whereas a test statistic near the mode (blue) is consistent with the model generating high-fidelity samples. For further details, see the corresponding algorithm in the supplementary materials.
Figure 4: Illustration of the black-box test statistic [dai2024significance] used to assess feature significance in sentiment classification. If the tested words are important for the classification, the risk of the classifier with those words masked is expected to increase.
Figure 5: Depiction of sentiment word inference using PAI. Words under test and their contextual surroundings are masked according to attention thresholds to compute the test statistic; a detailed explanation is given in the Sentiment Word Inference section. PAI operates within the embedding space produced by DistilBERT; see the corresponding algorithm in the supplementary materials for the complete steps.
...and 10 more figures
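
Figure 2 describes estimating the null distribution of a test statistic by Monte Carlo, using synthetic samples from the PASS generator. A minimal sketch of that generic Monte Carlo step follows, assuming a hypothetical null generator `generate_null` (here a plain N(0, 1) sampler standing in for a trained PASS generator) and the standard right-tailed estimate $(1 + \#\{T_b \ge T_{\text{obs}}\})/(B + 1)$; the paper's algorithms may compute or calibrate this differently. The example statistic $|t|$ is pivotal for Gaussian data, which corresponds to the pivotal case in the abstract.

```python
import random
from statistics import fmean, stdev


def mc_p_value(t_obs, statistic, generate_null, n, num_mc=999, seed=0):
    """Monte Carlo p-value for a right-tailed test.

    generate_null(n, rng) is a hypothetical stand-in for a generator that draws
    a synthetic sample of size n under H0 (e.g., a PASS generator).
    """
    rng = random.Random(seed)
    null_stats = [statistic(generate_null(n, rng)) for _ in range(num_mc)]
    exceed = sum(t >= t_obs for t in null_stats)
    # Counting the observed statistic as one extra draw keeps p in (0, 1].
    return (1 + exceed) / (num_mc + 1)


def abs_t_stat(x):
    """|t| statistic for H0: mean = 0 (pivotal when the data are Gaussian)."""
    return abs(fmean(x)) * len(x) ** 0.5 / stdev(x)


def normal_null(n, rng):
    """Hypothetical null generator: i.i.d. N(0, 1) draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Usage: data whose true mean is 1, so H0: mean = 0 is false.
obs_rng = random.Random(1)
x = [obs_rng.gauss(1.0, 1.0) for _ in range(50)]
p = mc_p_value(abs_t_stat(x), abs_t_stat, normal_null, n=len(x))
```

Because only the generator is swapped, the same function applies to any statistic whose null distribution is unknown in closed form, provided synthetic samples under $H_0$ can be drawn.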

Theorems & Definitions (19)

Lemma 1
Theorem 1
Remark 1
Remark 2
Remark 3
Theorem 2
Definition 1: Population Rank Map
Remark 4
Definition 2: Empirical Rank Map
Remark 5
...and 9 more