SpinML: Customized Synthetic Data Generation for Private Training of Specialized ML Models

Jiang Zhang, Rohan Xavier Sequeira, Konstantinos Psounis

TL;DR

SpinML addresses the challenge of privately training models for user-specific vision tasks when public labeled data is scarce. It does so through on-device, object-level sanitization of reference images and server-side fine-tuning of diffusion models (DreamBooth/ControlNet) to generate customized synthetic data, which is then used to train a device-resident model. The approach provides a tunable privacy-utility trade-off via object-level L0/L1/L2 sanitization schemes, quantified by the MI and SIM privacy metrics, and is evaluated on three real-world tasks. Results show that selective sharing (especially L2) can boost utility while preserving user-specified privacy preferences, and they offer insights into when and how to mix sanitization schemes for the best practical impact.

Abstract

Specialized machine learning (ML) models tailored to users' needs and requests are increasingly being deployed on smart devices with cameras to provide personalized intelligent services that take advantage of camera data. However, two primary challenges hinder the training of such models: the lack of publicly available labeled data suitable for specialized tasks, and the inaccessibility of labeled private data due to concerns about user privacy. To address these challenges, we propose a novel system, SpinML, in which the server generates customized Synthetic image data to Privately traIN a specialized ML model tailored to the user request, using only a few sanitized reference images from the user. SpinML offers users fine-grained, object-level control over the reference images, which allows users to trade off the privacy and utility of the generated synthetic data according to their privacy preferences. Through experiments on three specialized model training tasks, we demonstrate that our proposed system can enhance the performance of specialized models without compromising users' privacy preferences.
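
To make the idea of object-level control concrete, the following is a minimal sketch of per-object sanitization on the device, not the authors' implementation. It assumes that a segmentation step has already produced one boolean mask per object. The three level names (`REMOVE`, `OUTLINE`, `KEEP`) and the function names are hypothetical stand-ins for a graded scheme such as L0/L1/L2, whose exact definitions are not given in this summary.

```python
import numpy as np

# Hypothetical privacy levels (assumption): the source names schemes L0/L1/L2
# but does not define them here, so these are illustrative only.
REMOVE, OUTLINE, KEEP = "remove", "outline", "keep"


def mask_outline(mask: np.ndarray) -> np.ndarray:
    """Return the boundary pixels of a boolean mask (4-neighbour test)."""
    padded = np.pad(mask, 1, constant_values=False)
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]   # neighbour above
        & padded[2:, 1:-1]    # neighbour below
        & padded[1:-1, :-2]   # neighbour to the left
        & padded[1:-1, 2:]    # neighbour to the right
    )
    return mask & ~interior


def sanitize(image: np.ndarray, masks: dict, prefs: dict, fill: int = 0) -> np.ndarray:
    """Apply a per-object preference to an (H, W, 3) uint8 image.

    masks maps object names to (H, W) boolean masks; prefs maps object
    names to one of the hypothetical levels above.
    """
    out = image.copy()
    for name, mask in masks.items():
        level = prefs.get(name, REMOVE)  # default to the most private choice
        if level == KEEP:
            continue                     # share the object's pixels unchanged
        out[mask] = fill                 # blank out the object's pixels
        if level == OUTLINE:
            out[mask_outline(mask)] = 255  # keep only the object's silhouette edge
    return out
```

For example, `sanitize(img, {"face": face_mask, "dog": dog_mask}, {"face": REMOVE, "dog": KEEP})` would blank the face region and leave the dog untouched. Defaulting unknown objects to `REMOVE` is a design choice made in this sketch so that nothing is shared unless the user opts in.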

Paper Structure

This paper contains 30 sections, 2 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Problem statement. The user sends a request describing the model they need, along with a few reference images. The server automatically trains a model for the user.
  • Figure 2: System details of SpinML. The user side consists of two modules: an object detection and segmentation module and an image sanitizer. The server side consists of three modules: a DM fine-tuning module, a synthetic data generation module, and a training module for a device-side ML model.
  • Figure 3: Privacy-utility trade-off results. Privacy leakage is measured by MI and SIM, and is reported separately for the target object and the background. Model utility is the performance of the specialized model trained on synthetic data. The top-left region of each plot indicates both higher privacy and higher utility. Lines of a given color, when present, connect points to show how the privacy-utility trade-off changes when either the target-object or the background privacy preference is fixed while the other varies.
  • Figure 4: Visualization results of Husky dataset.
  • Figure 5: Visualization results of Human dataset.
  • ...and 1 more figures
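
The Figure 3 caption reports privacy leakage in terms of MI. Assuming MI denotes mutual information (this summary does not spell it out), the sketch below shows one common way to estimate it between two equally sized grayscale images using a joint intensity histogram. The function name and the bin count are illustrative assumptions, and this plug-in estimator is not necessarily the one used in the paper.

```python
import numpy as np


def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """Histogram-based (plug-in) estimate of mutual information, in bits."""
    # Joint histogram of paired pixel intensities.
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()             # joint distribution
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```

Under this reading, a lower value between an original region and what is shared would indicate less leakage. An image compared with itself gives a high value, and a constant (fully blanked) image gives a value of zero.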