LDPKiT: Superimposing Remote Queries for Privacy-Preserving Local Model Training

Kexin Li; Aastha Mehta; David Lie

TL;DR

LDPKiT addresses privacy concerns in leveraging proprietary remote models by injecting $\epsilon$-LDP noise into private data and augmenting it with a two-layer data augmentation strategy. It introduces LDPKiT-Rand and LDPKiT-Sup to generate a larger, privacy-protected inference set ${\mathcal{D}_{\rm infer}}$ that enables effective knowledge transfer to a local model while bounding leakage. Empirically, LDPKiT-Sup consistently recovers most of the utility lost to LDP noise across SVHN, Fashion-MNIST, and PathMNIST, with latent-space analyses showing that superimposed samples align better with target distributions than random noise. The work also demonstrates that data reconstruction risks are mitigated under the proposed privacy regime and discusses ethical considerations and practical implications for real-world use, highlighting that the extracted local models are non-competitive and non-stealthy. Overall, LDPKiT offers a practical, privacy-preserving pathway to label and learn from sensitive data using remote models, with utility gains that are most pronounced at stronger noise levels.

Abstract

Users of modern Machine Learning (ML) cloud services face a privacy conundrum: on one hand, they may have concerns about sending private data to the service for inference, but on the other hand, for specialized models, there may be no alternative but to use the proprietary model of the ML service. In this work, we present LDPKiT, a framework for non-adversarial, privacy-preserving model extraction that leverages a user's private in-distribution data while bounding privacy leakage. LDPKiT introduces a novel superimposition technique that generates approximately in-distribution samples, enabling effective knowledge transfer under local differential privacy (LDP). Experiments on Fashion-MNIST, SVHN, and PathMNIST demonstrate that LDPKiT consistently improves utility while maintaining privacy, with benefits that become more pronounced at stronger noise levels. For example, on SVHN, LDPKiT achieves nearly the same inference accuracy at $\epsilon=1.25$ as at $\epsilon=2.0$, yielding stronger privacy guarantees with less than a 2% accuracy reduction. We further conduct sensitivity analyses to examine the effect of dataset size on performance and provide a systematic analysis of latent space representations, offering theoretical insights into the accuracy gains of LDPKiT.
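The abstract does not spell out how samples are superimposed. As a minimal sketch, assume superimposition means a pixel-wise weighted average of every pair of already-noised private samples; the function name `superimpose_pairs` and the `weight` parameter are hypothetical and not taken from the paper. The sketch shows how such a step could enlarge the inference set (n noised samples yield n(n-1)/2 pairs) while touching only noised data, so by the post-processing property of differential privacy it would not weaken the per-sample guarantee.

```python
import itertools

import numpy as np


def superimpose_pairs(noised, weight=0.5):
    """Blend every unordered pair of already-noised samples.

    Assumption: "superimposition" is modeled as a pixel-wise weighted
    average. `noised` has shape (n, ...) with n >= 2.
    """
    noised = np.asarray(noised, dtype=float)
    if len(noised) < 2:
        raise ValueError("need at least two noised samples to form a pair")
    pairs = itertools.combinations(range(len(noised)), 2)
    # Each output depends only on noised inputs, never on raw private data.
    return np.stack(
        [weight * noised[i] + (1.0 - weight) * noised[j] for i, j in pairs]
    )


# Three noised "images" of two pixels each produce three blended samples.
noised_batch = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
blended = superimpose_pairs(noised_batch)
print(blended.shape)
```

With the default `weight=0.5` the blend is symmetric, so unordered pairs suffice; an asymmetric weight would make ordered pairs distinct, which this sketch does not enumerate.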

Paper Structure (38 sections, 3 theorems, 15 equations, 15 figures, 14 tables)

Introduction
Overview
Motivation
Preliminaries and General Setup
Threat Model
Privacy Guarantee
Design
Preliminary Experiments
LDPKiT's Data Augmentation Mechanism
Evaluation
Experimental setup
RQ1: LDPKiT's Utility Recovery on ${\mathcal{D}_{\rm priv}}$
RQ2: Latent Space Analysis
RQ3: Sensitivity Analysis of the Impact of $|{\mathcal{D}_{\rm infer}}|$ and $|{\mathcal{D}_{\rm priv}}|$ on LDPKiT
Evaluation of Data Reconstruction Risks
...and 23 more sections

Key Result

Theorem A.1

Our base noise injection mechanism satisfies $\epsilon$-LDP, where $\epsilon = \frac{\Delta_f}{\lambda}$.
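The relation $\epsilon = \frac{\Delta_f}{\lambda}$ has the same form as the standard Laplace mechanism, where $\lambda$ is the noise scale and $\Delta_f$ the sensitivity. The sketch below assumes the base mechanism is Laplace noise added independently to each pixel; the function names and the default sensitivity of 1.0 (pixels normalized to [0, 1]) are illustrative assumptions, not details confirmed by this summary.

```python
import numpy as np


def noise_scale(epsilon, sensitivity=1.0):
    """Solve epsilon = sensitivity / scale for the Laplace scale (lambda)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_ldp(x, epsilon, sensitivity=1.0, rng=None):
    """Add zero-mean Laplace noise with scale sensitivity / epsilon to x.

    Assumption: noise is drawn independently for every element of x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    scale = noise_scale(epsilon, sensitivity)
    return x + rng.laplace(loc=0.0, scale=scale, size=x.shape)


# A smaller epsilon gives a larger scale, hence stronger noise.
image = np.zeros((28, 28))
noised = laplace_ldp(image, epsilon=1.5, rng=np.random.default_rng(0))
print(noised.shape, noise_scale(1.5))
```

Under this assumption, moving from $\epsilon=2.0$ to $\epsilon=1.25$ raises the noise scale from 0.5 to 0.8 for unit sensitivity.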

Figures (15)

Figure 1: Comparison of accuracies on SVHN and Fashion-MNIST: Extracting a local model (ResNet-18) with OOD public data samples versus querying the remote model (ResNet-152) with $\epsilon$-LDP protected private data.
Figure 2: LDPKiT system overview.
Figure 3: Example of a noised Fashion-MNIST data point with label 5 (sandal) and $\epsilon$ set to 1.5.
Figure 4: Inference accuracy comparisons on ${\mathcal{D}_{\rm priv}}$ with various $\epsilon$: ResNet-152 (${\mathcal{M}_{\rm R}}$)'s SIDP versus ${\mathcal{M}_{\rm L}}$ trained using LDPKiT-Rand and LDPKiT-Sup. The results rank performance as LDPKiT-Sup > LDPKiT-Rand > SIDP.
Figure 5: Inference accuracy comparisons of LDPKiT-Rand on ${\mathcal{D}_{\rm priv}}$ with base $\epsilon=1.5$ and various post-processing $\epsilon$.
...and 10 more figures

Theorems & Definitions (11)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 4.1
Definition 4.2
Theorem A.1
proof
Theorem A.2
proof
Theorem A.3
...and 1 more
