Synthesize Privacy-Preserving High-Resolution Images via Private Textual Intermediaries

Haoxiang Wang, Zinan Lin, Da Yu, Huishuai Zhang

TL;DR

Synthesis via Private Textual Intermediaries (SPTI) introduces an inference-only, DP-compliant pipeline that shifts privacy guarantees from high-dimensional images to the text domain by converting private images into captions, privately evolving these captions with a modified Private Evolution algorithm, and then generating high-resolution images from the evolved text using diffusion models. The key innovation is a cross-modal voting mechanism (Image Voting) that guides text evolution by evaluating the images produced from candidate texts against private data, all under $(\epsilon,\delta)$-DP via Gaussian noise and adaptive composition. Empirically, SPTI achieves substantially better Fréchet Inception Distance (FID) scores than DP-finetuning and prior Private Evolution baselines on LSUN Bedroom and MM-CelebA-HQ at $\epsilon=1.0$, while remaining compatible with proprietary API backends and avoiding model training. The framework demonstrates that text can serve as a universal, privacy-preserving interface for multimodal generation, enabling high-fidelity DP synthetic images with practical resource efficiency and broad applicability, though at the cost of higher compute overhead and potential domain-generalization limits. Overall, SPTI offers a scalable, API-friendly path to private visual data sharing and downstream analysis by privatizing the narrative (text) rather than the pixel domain.
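
The voting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that private images and the images rendered from candidate texts have already been embedded as NumPy arrays, that each private image casts exactly one vote for its nearest candidate (so adding or removing one private image changes the histogram by at most 1 in L2 norm), and that the noise scale `sigma` has been calibrated elsewhere to the target $(\epsilon,\delta)$ budget. The function names `dp_image_voting` and `select_candidates` are hypothetical.

```python
import numpy as np


def dp_image_voting(private_emb, candidate_emb, sigma, rng):
    """Return (raw_votes, noisy_votes) over candidates.

    private_emb:   (n_private, d) embeddings of private images (assumed given).
    candidate_emb: (n_candidates, d) embeddings of images generated from
                   candidate texts (assumed given).
    sigma:         Gaussian noise scale; calibrating it to (epsilon, delta)
                   is outside the scope of this sketch.
    """
    # Pairwise squared Euclidean distances, shape (n_private, n_candidates).
    diffs = private_emb[:, None, :] - candidate_emb[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)

    # Each private image votes for its single nearest candidate.
    nearest = np.argmin(dists, axis=1)
    votes = np.bincount(nearest, minlength=candidate_emb.shape[0]).astype(float)

    # Gaussian mechanism: perturb every histogram entry independently.
    noisy = votes + rng.normal(0.0, sigma, size=votes.shape)
    return votes, noisy


def select_candidates(noisy_votes, num_keep, rng):
    """Resample candidate indices in proportion to clipped noisy votes."""
    probs = np.clip(noisy_votes, 0.0, None)
    if probs.sum() == 0.0:
        # Fall back to uniform sampling if noise wiped out every vote.
        probs = np.ones_like(probs)
    probs = probs / probs.sum()
    return rng.choice(len(probs), size=num_keep, replace=True, p=probs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    private = rng.normal(size=(100, 8))    # stand-in private embeddings
    candidates = rng.normal(size=(10, 8))  # stand-in candidate embeddings
    _, noisy = dp_image_voting(private, candidates, sigma=1.0, rng=rng)
    print(select_candidates(noisy, num_keep=10, rng=rng))
```

In a full pipeline, the selected indices would pick the candidate texts to keep and vary in the next evolution round; only the noisy histogram, never the raw votes, would be released.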

Abstract

Generating high-fidelity, differentially private (DP) synthetic images offers a promising route to sharing and analyzing sensitive visual data without compromising individual privacy. However, existing DP image synthesis methods struggle to produce high-resolution outputs that faithfully capture the structure of the original data. In this paper, we introduce a novel method, Synthesis via Private Textual Intermediaries (SPTI), that generates high-resolution DP images and is easy to adopt. The key idea is to shift the challenge of DP image synthesis from the image domain to the text domain by leveraging state-of-the-art DP text generation methods. SPTI first summarizes each private image into a concise textual description using image-to-text models, then applies a modified Private Evolution algorithm to generate DP text, and finally reconstructs images using text-to-image models. Notably, SPTI requires no model training, only inference with off-the-shelf models. Given a private dataset, SPTI produces synthetic images of substantially higher quality than prior DP approaches. On the LSUN Bedroom dataset, SPTI attains an FID of 26.71 at $\epsilon=1.0$, improving over the Private Evolution FID of 40.36. Similarly, on MM-CelebA-HQ, SPTI achieves an FID of 33.27 at $\epsilon=1.0$, compared to 57.01 for DP fine-tuning baselines. Overall, our results demonstrate that Synthesis via Private Textual Intermediaries provides a resource-efficient framework, compatible with proprietary models, for generating high-resolution DP synthetic images, greatly expanding access to private visual datasets.

Paper Structure

This paper contains 44 sections, 3 equations, 41 figures, 9 tables, 2 algorithms.

Figures (41)

  • Figure 1: Overview of the Synthesis via Private Textual Intermediaries (SPTI) framework for differentially private (DP) synthetic data generation. Top left: DP fine-tuning framework. A pretrained model is fine-tuned on private data under DP constraints, and the resulting model is used to generate DP synthetic data. Top right: Private Evolution (PE) on images. This method begins by randomly generating candidate samples, which are then compared to private data. Samples are selected based on a voting mechanism and further perturbed to produce a new generation of samples. Bottom: Synthesis via Private Textual Intermediaries (SPTI) framework. Private image data serves as the reference. A modified Augmented Private Evolution (Aug-PE) method (Xie et al., 2024) is then applied to generate DP synthetic text data, which is subsequently transformed into DP synthetic images using a diffusion model API.
  • Figure 2: DP synthetic images from the European Art dataset ($\epsilon=1.0$).
  • Figure 3: DP synthetic images from the Wave-ui-25k dataset ($\epsilon=1.0$).
  • Figure 4: DP synthetic images from the LSUN Bedroom dataset ($\epsilon=1.0$).
  • Figure 5: Experimental results on the LSUN Bedroom dataset.
  • ...and 36 more figures