InvBERT: Reconstructing Text from Contextualized Word Embeddings by inverting the BERT pipeline

Kai Kugler, Simon Münker, Johannes Höhmann, Achim Rettinger

TL;DR

The paper analyzes whether contextualized word embeddings (CTEs) derived from transformer models can safely replace copyrighted texts as Derived Text Formats (DTFs). It formalizes white-box, gray-box, and black-box attack scenarios and develops two inversion methods, InvBERT Classify and InvBERT Seq2Seq, to reconstruct original text from CTEs. Empirical results on AO3 and Gutenberg datasets show that, under realistic data access, reconstruction can be highly accurate (up to ~97%), with notable degradation only at very small training sets, implying that publishing CTEs as DTFs poses copyright risks. The study concludes that none of the examined scenarios fully safeguards against text reconstruction and highlights the need for copyright-aware strategies and further research with larger models and different defense mechanisms.

Abstract

Digital Humanities and Computational Literary Studies apply text mining methods to investigate literature. Such automated approaches enable quantitative studies on large corpora which would not be feasible by manual inspection alone. However, due to copyright restrictions, the availability of relevant digitized literary works is limited. Derived Text Formats (DTFs) have been proposed as a solution. Here, textual materials are transformed in such a way that copyright-critical features are removed while certain analytical methods remain applicable. Contextualized word embeddings produced by transformer encoders (like BERT) are promising candidates for DTFs because they allow for state-of-the-art performance on various analytical tasks and, at first sight, do not disclose the original text. However, in this paper we demonstrate that under certain conditions the reconstruction of the original copyrighted text becomes feasible and its publication in the form of contextualized token representations is not safe. Our attempts to invert BERT suggest that publishing the encoder as a black box together with the contextualized embeddings is critical, since it allows an attacker to generate data to train a decoder with a reconstruction accuracy sufficient to violate copyright laws.
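
The black-box risk named in the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' InvBERT implementation: a hidden random lookup table with neighbour mixing stands in for BERT, and a nearest-centroid lookup stands in for a trained decoder. All names (`make_black_box_encoder`, `train_decoder`, `decode`), the vocabulary, and the constants are assumptions chosen for illustration.

```python
import random

VOCAB = ["the", "boy", "who", "lived", "in", "a", "small", "house",
         "under", "old", "stairs", "slept"]
DIM = 32  # toy embedding size (assumption)


def make_black_box_encoder(seed=0):
    """Return an encode() function whose weights stay hidden from the caller."""
    rng = random.Random(seed)
    table = {w: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for w in VOCAB}
    zero = [0.0] * DIM

    def encode(tokens):
        # Toy contextualization: own vector plus 0.2 times each neighbour's.
        out = []
        for i, tok in enumerate(tokens):
            left = table[tokens[i - 1]] if i > 0 else zero
            right = table[tokens[i + 1]] if i + 1 < len(tokens) else zero
            out.append([c + 0.2 * (lv + rv)
                        for c, lv, rv in zip(table[tok], left, right)])
        return out

    return encode


def train_decoder(encode, n_sentences=200, sent_len=6, seed=1):
    """Query the black box with attacker-chosen text; average vectors per token."""
    rng = random.Random(seed)
    sums, counts = {}, {}
    for _ in range(n_sentences):
        sent = [rng.choice(VOCAB) for _ in range(sent_len)]
        for tok, vec in zip(sent, encode(sent)):
            acc = sums.setdefault(tok, [0.0] * DIM)
            for k, x in enumerate(vec):
                acc[k] += x
            counts[tok] = counts.get(tok, 0) + 1
    return {t: [x / counts[t] for x in s] for t, s in sums.items()}


def decode(centroids, embeddings):
    """Map each published vector to the token with the nearest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda t: dist2(centroids[t], v))
            for v in embeddings]


encode = make_black_box_encoder()
protected = "the boy who lived slept under the old stairs".split()
published = encode(protected)      # what the data holder would release
centroids = train_decoder(encode)  # the attacker only ever calls encode()
reconstructed = decode(centroids, published)
accuracy = sum(a == b for a, b in zip(protected, reconstructed)) / len(protected)
print(" ".join(reconstructed), accuracy)
```

The attacker never reads the encoder's weights; query access alone yields labeled (embedding, token) pairs, which is the property the abstract flags as critical. A real attack on BERT would need a learned decoder and far more training data than this toy suggests.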

Paper Structure

This paper contains 19 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Sample reconstruction of a Harry Potter quote (Rowling, 1998) by inverting BERT.
  • Figure 2: Flowchart for each approach. Givens are enclosed in a dotted yellow area, and attack-specific modules to be estimated are filled with orange. Data objects are highlighted in red, while green represents the evaluation/objective function.
  • Figure 3: Both reconstruction approaches compared by their in-domain sentence reconstruction accuracy.