Reconstructing training data from document understanding models

Jérémie Dentan, Arnaud Paran, Aymen Shabou

TL;DR

This paper addresses privacy risks in layout-aware document understanding models by introducing CDMI, a white-box reconstruction attack that combines an autoregressive proxy with a token-level combinatorial optimization to reconstruct scrubbed fields of training documents. The attack is extended end to end by pairing CDMI with membership inference (MI), and two new evaluation metrics are introduced to jointly assess reconstruction quality and MI performance. Empirical results on FUNSD and SROIE show that CDMI can perfectly reconstruct up to 4.1% of fields, rising to 22.5% when coupled with MI. They also show that memorization emerges early in training and that both the layout and visual modalities contribute to it. The authors discuss defenses, emphasize the need for privacy-preserving designs in document understanding, and outline future directions for robust, privacy-aware multimodal models.

Abstract

Document understanding models are increasingly employed by companies to supplant humans in processing sensitive documents, such as invoices, tax notices, or even ID cards. However, the robustness of such models to privacy attacks remains largely unexplored. This paper presents CDMI, the first reconstruction attack designed to extract sensitive fields from the training data of these models. We attack LayoutLM and BROS architectures, demonstrating that an adversary can perfectly reconstruct up to 4.1% of the fields of the documents used for fine-tuning, including some names, dates, and invoice amounts of up to six digits. When our reconstruction attack is combined with a membership inference attack, the attack accuracy rises to 22.5%. In addition, we introduce two new end-to-end metrics and evaluate our approach under various conditions: unimodal or bimodal data, LayoutLM or BROS backbones, four fine-tuning tasks, and two public datasets (FUNSD and SROIE). We also investigate the interplay between overfitting, predictive performance, and susceptibility to our attack. We conclude with a discussion of possible defenses against our attack and potential future research directions to build robust document understanding models.

Paper Structure (57 sections, 7 equations, 8 figures, 5 tables, 2 algorithms)

Figures (8)

  • Figure 1: A document (licensed CC BY 4.0 DEED by huang_icdar2019_2019) where two fields are perfectly reconstructed by CDMI. A model with LayoutLM architecture xu_layoutlm_2019 is trained on the SROIE dataset huang_icdar2019_2019. Then, when either the date or the company name is scrubbed, the adversary is able to reconstruct it.
  • Figure 2: Computing and maximizing $P_\theta^\text{tok}$ to invert a token. We first use a masked model trained on public data to select $N_c = 128$ candidates (A). Then, we compute the loss of the target model with each candidate (B), and aggregate these losses to obtain a probability distribution over the candidates (C, D, E). Finally, we sample the reconstructed token from this distribution (F), and repeat this process for the next tokens.
  • Figure 3: $AccAUC$ and $HamAAC$ computation examples. Acc($p$) denotes the mean accuracy of the top-$p$ fields on which the adversary is most confident. The greater it is, the more accurate the reconstructions are. A curve that starts high and then decreases means that the membership inference metric accurately sorts the reconstruction attempts. This is why we seek to maximize its Area Under the Curve ($AccAUC$). Likewise, we seek to maximize the Area Above the Curve of the Hamming distance ($HamAAC$).
  • Figure 4: Factors influencing the performance of the one-shot attack on a LayoutLM model trained with the MLM task. We attack 8 different models (2 modalities, 2 criteria, 2 datasets) with an average improvement factor of $\text{IpF} = 1.187$. Among them, the 4 attacks involving a bimodal model are more accurate than those against a unimodal model (IpF of $1.296$ vs. $1.078$). Similarly, the attacks are more accurate with the Precision criterion and with the FUNSD dataset.
  • Figure 5: Performance comparison based on the backbone or task. The top graph shows the average performance of the attack with LayoutLM backbone, for the four possible tasks. The bottom graphs compare the average performance of the attack on the MLM models with LayoutLM backbone (in blue) or BROS backbone (in orange), for both the one-shot variant (left) and the multi-shot variant (right).
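The per-token loop described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` stands in for the public masked model (step A) and `target_loss` for the target model's loss (step B), and both are hypothetical callables. The caption does not spell out how losses are aggregated into a distribution (steps C to E), so the sketch assumes a softmax over negative losses with a hypothetical `temperature` parameter.

```python
from __future__ import annotations

import math
import random
from typing import Callable, Sequence


def candidate_distribution(losses: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Map per-candidate target-model losses to probabilities.

    Assumption (not from the paper): lower loss -> higher probability,
    via a softmax over -loss / temperature.
    """
    scores = [-loss / temperature for loss in losses]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def invert_field(
    n_tokens: int,
    propose: Callable[[list[str]], list[str]],
    target_loss: Callable[[list[str]], float],
    temperature: float = 1.0,
    rng: random.Random | None = None,
) -> list[str]:
    """Reconstruct a scrubbed field token by token, left to right."""
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    field: list[str] = []
    for _ in range(n_tokens):
        candidates = propose(field)  # (A) candidates from a public masked model
        losses = [target_loss(field + [c]) for c in candidates]  # (B) target loss per candidate
        probs = candidate_distribution(losses, temperature)  # (C-E) assumed softmax aggregation
        field.append(rng.choices(candidates, weights=probs, k=1)[0])  # (F) sample next token
    return field


# Toy usage with hypothetical stand-ins for the two models.
SECRET = ["25", "/", "12", "/", "2018"]  # hypothetical scrubbed date
VOCAB = ["25", "12", "/", "2018", "2019", "-"]


def toy_propose(prefix: list[str]) -> list[str]:
    # A real proposer would condition on the prefix and return N_c = 128 tokens.
    return VOCAB


def toy_loss(tokens: list[str]) -> float:
    # Number of positions where the candidate sequence disagrees with the secret.
    return float(sum(a != b for a, b in zip(tokens, SECRET)))


print(invert_field(len(SECRET), toy_propose, toy_loss, temperature=0.05))
```

With a low temperature the sampling step is nearly greedy; a higher temperature spreads probability over more candidates, which matters when the target model assigns similar losses to several tokens.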
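The two end-to-end metrics from Figure 3 can be sketched as below. The sketch assumes that $p$ ranges over the fractions $k/n$ of the $n$ attacked fields, that the area is approximated by a right Riemann sum with step $1/n$, and that Hamming distances are normalized to $[0, 1]$ so that the area above the curve equals $1$ minus the area under it. The function names are hypothetical, and the paper's exact discretization may differ.

```python
from __future__ import annotations

from typing import Sequence


def top_p_curve(confidences: Sequence[float], values: Sequence[float]) -> list[float]:
    """Mean of `values` over the k most confident fields, for k = 1..n.

    Entry k-1 corresponds to p = k / n (assumed discretization of p).
    """
    order = sorted(range(len(values)), key=lambda i: confidences[i], reverse=True)
    curve: list[float] = []
    running = 0.0
    for k, i in enumerate(order, start=1):
        running += values[i]
        curve.append(running / k)
    return curve


def acc_auc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Area under Acc(p), approximated by a right Riemann sum with step 1/n."""
    curve = top_p_curve(confidences, [float(c) for c in correct])
    return sum(curve) / len(curve)


def ham_aac(confidences: Sequence[float], hamming: Sequence[float]) -> float:
    """Area above the Hamming-distance curve, assuming distances lie in [0, 1]."""
    curve = top_p_curve(confidences, hamming)
    return sum(1.0 - h for h in curve) / len(curve)


# Hypothetical example: 4 reconstructed fields, the first two are correct.
correct = [True, True, False, False]
good_mi = [0.9, 0.8, 0.2, 0.1]  # MI score ranks the correct fields first
bad_mi = [0.1, 0.2, 0.8, 0.9]   # MI score ranks them last
print(acc_auc(good_mi, correct), acc_auc(bad_mi, correct))
```

Acc($p$) at $p = 1$ is the plain reconstruction accuracy over all fields, whatever the ranking, so $AccAUC$ only increases when the membership inference score pushes the correct reconstructions toward the top.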
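The IpF values quoted in the Figure 4 caption are mutually consistent if IpF is averaged arithmetically over the attacked models, which is an assumption here: the bimodal and unimodal groups each contain 4 of the 8 attacks, so the overall mean is the mean of the two group means. A geometric mean of the two group values would instead give about 1.182.

$$
\text{IpF} = \frac{4 \times 1.296 + 4 \times 1.078}{8} = \frac{1.296 + 1.078}{2} = \frac{2.374}{2} = 1.187
$$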