When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing
Siyuan Xu, Yibing Liu, Peilin Chen, Yung-Hui Li, Shiqi Wang, Sam Kwong
TL;DR
This work tackles the overlooked problem of recovering surrogate-driven edits in privacy-preserving MLLM workflows. It introduces SPPE, a dataset for evaluating edit fidelity under surrogate-based privacy protection, and SOER, a DiT-based multimodal framework that reconstructs MLLM-edited outputs on the original content while preserving privacy. The approach integrates semantic, visual, and spatial guidance with region-weighted losses, achieving superior edit fidelity and privacy preservation on SPPE and InstructPix2Pix. The results demonstrate robust generalization across diverse content and editing tasks, offering a practical path for privacy-aware MLLM applications.
Abstract
Privacy leakage in Multimodal Large Language Models (MLLMs) has long been an intractable problem. Existing studies, though effective at obscuring private information in MLLMs, often overlook evaluating the authenticity and recovery quality of the user's private content. To this end, this work focuses on the critical challenge of how to restore surrogate-driven protected data in diverse MLLM scenarios. We first bridge this research gap by contributing the SPPE (Surrogate Privacy Protected Editable) dataset, which includes a wide range of privacy categories and user instructions to simulate real MLLM applications. This dataset offers protected surrogates alongside their various MLLM-edited versions, thus enabling the direct assessment of privacy recovery quality. By formulating privacy recovery as a guided generation task conditioned on complementary multimodal signals, we further introduce a unified approach that reliably reconstructs private content while preserving the fidelity of MLLM-generated edits. Experiments on both SPPE and InstructPix2Pix show that our approach generalizes well across diverse visual content and editing tasks, achieving a strong balance between privacy protection and MLLM usability.
