Rep2Text: Decoding Full Text from a Single LLM Token Representation

Haiyan Zhao, Zirui He, Fan Yang, Ali Payani, Mengnan Du

TL;DR

Rep2Text investigates whether a single last-token representation from a decoder-only LLM preserves enough information to reconstruct the original input text. It introduces an adapter-based inverter that maps the target model's last-token state to the embedding space of a decoding LLM, which then autoregressively reconstructs the text. Across multiple model pairs and $16$-token inputs, the approach recovers roughly half of the information with strong structural and semantic coherence, though performance degrades for longer sequences and varies by model. The study also shows partial generalization to out-of-distribution clinical notes, highlighting both representational leakage risks and practical insights into how input information is organized across LLM layers.

Abstract

Large language models (LLMs) have achieved remarkable progress across diverse tasks, yet their internal mechanisms remain largely opaque. In this work, we address a fundamental question: to what extent can the original input text be recovered from a single last-token representation within an LLM? We propose Rep2Text, a novel framework for decoding full text from last-token representations. Rep2Text employs a trainable adapter that projects a target model's internal representations into the embedding space of a decoding language model, which then autoregressively reconstructs the input text. Experiments on various model combinations (Llama-3.1-8B, Gemma-7B, Mistral-7B-v0.1, Llama-3.2-3B) demonstrate that, on average, over half of the information in 16-token sequences can be recovered from this compressed representation while maintaining strong semantic integrity and coherence. Furthermore, our analysis reveals an information bottleneck effect: longer sequences exhibit decreased token-level recovery while preserving strong semantic integrity. Our framework also demonstrates robust generalization to out-of-distribution medical data.
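
The adapter step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-layer MLP, the number of pseudo-token embeddings (`K_SOFT`), all dimensions, and the order in which prompt and projected embeddings are concatenated are assumptions made for illustration. Random arrays stand in for the real last-token state and prompt embeddings, and the adapter weights are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, kept small for illustration only.
D_TARGET = 64    # hidden size of the target model M
D_DECODER = 48   # embedding size of the decoding model M'
K_SOFT = 4       # number of pseudo-token embeddings the adapter emits (assumption)


def init_adapter(d_in, d_out, k, rng):
    """Randomly initialise a two-layer MLP adapter (an assumed architecture)."""
    hidden = 2 * d_in
    return {
        "w1": rng.normal(0.0, d_in ** -0.5, size=(d_in, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, hidden ** -0.5, size=(hidden, k * d_out)),
        "b2": np.zeros(k * d_out),
    }


def adapt(params, h_last, k, d_out):
    """Map one last-token state of shape (d_in,) to k embeddings of shape (k, d_out)."""
    z = np.maximum(h_last @ params["w1"] + params["b1"], 0.0)  # ReLU
    return (z @ params["w2"] + params["b2"]).reshape(k, d_out)


def build_decoder_inputs(system_emb, user_emb, projected):
    """Concatenate prompt embeddings with the projected representation.

    The ordering (system, user, projected) is an assumption.
    """
    return np.concatenate([system_emb, user_emb, projected], axis=0)


# Stand-ins for real model outputs.
h_last = rng.normal(size=D_TARGET)              # last-token state from layer l of M
system_emb = rng.normal(size=(5, D_DECODER))    # embedded system prompt (5 tokens)
user_emb = rng.normal(size=(7, D_DECODER))      # embedded user prompt (7 tokens)

params = init_adapter(D_TARGET, D_DECODER, K_SOFT, rng)
projected = adapt(params, h_last, K_SOFT, D_DECODER)
inputs = build_decoder_inputs(system_emb, user_emb, projected)
print(inputs.shape)  # (16, 48)
```

In a full pipeline, `inputs` would be passed to the decoding model as input embeddings, and the adapter would be trained so that the autoregressive output matches the original text. That training loop is omitted here.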

Paper Structure

This paper contains 30 sections, 6 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overview of Rep2Text. The last-token representation obtained from the $l$-th layer of the target model $\mathcal{M}$ is projected into the embedding space of the decoding model $\mathcal{M}^\prime$ via the adapter. The projected embeddings, together with those of the system and the user prompts, are then fed into the decoding model to reconstruct the corresponding text sequence.
  • Figure 2: Examples of structure and entity similarity. Darker colors indicate higher similarity scores.
  • Figure 3: Performance comparison when inverting sequences of varying length.
  • Figure 4: Performance comparison with layerwise representation inversions.
  • Figure 5: The score distribution on OOD clinical notes. The mean score obtained by Llama-3.1-8B when used as both the target and decoding model (Table \ref{tab:model}) serves as the threshold for assessing the inverter's capability to recover OOD data.
  • ...and 2 more figures