
Discrete Prompt Compression with Reinforcement Learning

Hoyoun Jung, Kyung-Joong Kim

TL;DR

This work introduces Prompt Compression with Reinforcement Learning (PCRL), a discrete prompt compression method that directly edits prompts using a lightweight policy without requiring LM gradients or labeled data. By framing compression as a single-step RL problem with a policy gradient and Self-critical Sequence Training, PCRL balances faithfulness (via ROUGE-L) and brevity (via compression ratio) to produce shorter, information-preserving prompts. The approach yields about 24.6% average token reduction across instruction prompts while maintaining output quality, and the learned policy demonstrates transferability to larger generation LMs, enabling practical applicability in black-box or API-based settings. Extensive experiments on Alpaca+ with GPT2-XL and FLAN-T5-XL, plus cross-LM transfer tests to LLaMa2, Falcon, FLAN-T5-XXL, and GPT-3.5, illustrate the method’s effectiveness and its potential to generalize across architectures, alongside insights into which tokens are most expendable.
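The trade-off between faithfulness and brevity described above can be illustrated with a small sketch. This section does not give the paper's exact reward, so the thresholded form below, the `threshold` and `penalty` values, and all function names are assumptions chosen for illustration. ROUGE-L is computed from the longest common subsequence of whitespace-split tokens, and the self-critical baseline is taken to be the reward of a greedy edit.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: List[str], candidate: List[str]) -> float:
    """ROUGE-L F1 between two token lists (0.0 if there is no overlap)."""
    if not reference or not candidate:
        return 0.0
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def compression_reward(orig_len: int, kept_len: int, rouge: float,
                       threshold: float = 0.9, penalty: float = -1.0) -> float:
    """Hypothetical reward: the fraction of prompt tokens removed when the
    response to the edited prompt stays close to the original response
    (ROUGE-L >= threshold), otherwise a fixed penalty."""
    if rouge < threshold:
        return penalty
    return 1.0 - kept_len / orig_len


def scst_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """Self-critical baseline: the sampled edit is reinforced only when it
    scores higher than the greedy edit of the same prompt."""
    return sampled_reward - greedy_reward
```

In a policy-gradient update, this advantage would scale the log-probability of the sampled include/exclude decisions, so no gradient from the generation LM is needed; only its text outputs enter the reward.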

Abstract

Compressed prompts aid instruction-tuned language models (LMs) in overcoming context window limitations and reducing computational costs. Existing methods, which are primarily based on training embeddings, face various challenges associated with interpretability, the fixed number of embedding tokens, reusability across different LMs, and inapplicability when interacting with black-box APIs. This study proposes prompt compression with reinforcement learning (PCRL), a discrete prompt compression method that addresses these issues. PCRL uses a computationally efficient policy network that edits prompts directly. Its training approach can be applied flexibly to various types of LMs, including both decoder-only and encoder-decoder architectures, and it requires neither gradient access to the LMs nor labeled data. PCRL achieves an average reduction of 24.6% in token count across various instruction prompts while maintaining sufficient performance. In addition, we demonstrate that the learned policy can be transferred to larger LMs, and through a comprehensive analysis, we explore token importance within the prompts. Our code is accessible at https://github.com/nenomigami/PromptCompressor.

Paper Structure (25 sections, 12 equations, 3 figures, 9 tables)

Figures (3)

  • Figure 1: Overall training procedure of PCRL. A prompt is sampled from the prompt pool, edited by the compression policy, and evaluated by comparing the generation LM's response to the original and edited prompt. The resulting reward is used for policy updates.
  • Figure 2: The policy network of PCRL. Given a tokenized prompt as input, the network outputs an include/exclude probability for each token. If a token is part of a statement, the exclude action is masked out.
  • Figure 3: Experimental results for the instruction prompts (Section 4.1). We performed five experiments on each validation set using various random seeds and computed the evaluation metrics. Error bars on orange bars indicate 95% confidence intervals.
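
The include/exclude editing with action masking described in the Figure 2 caption can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `edit_prompt`, the `protected` flags (marking tokens whose exclude action is masked out), and the 0.5 greedy cutoff are assumptions, and the per-token probabilities are supplied by hand rather than produced by a trained policy network.

```python
import random
from typing import List, Optional


def edit_prompt(tokens: List[str],
                include_probs: List[float],
                protected: List[bool],
                rng: Optional[random.Random] = None) -> List[str]:
    """Apply per-token include/exclude decisions to a tokenized prompt.

    protected[i] = True means the exclude action is masked out for token i,
    so that token is always kept. With rng=None the edit is greedy (keep a
    token when its include probability is at least 0.5); otherwise each
    unprotected token is kept with probability include_probs[i].
    """
    kept = []
    for tok, p, prot in zip(tokens, include_probs, protected):
        if prot:
            keep = True                 # exclude action masked out
        elif rng is None:
            keep = p >= 0.5             # greedy decision
        else:
            keep = rng.random() < p     # sampled decision
        if keep:
            kept.append(tok)
    return kept


# Hand-written example values, for illustration only.
tokens = ["Please", "summarize", "the", "text", ":", "cats", "sleep"]
probs = [0.1, 0.9, 0.2, 0.8, 0.6, 0.05, 0.05]
protected = [False, False, False, False, False, True, True]

greedy = edit_prompt(tokens, probs, protected)
sampled = edit_prompt(tokens, probs, protected, rng=random.Random(0))
```

The greedy edit here keeps `summarize`, `text`, `:`, and the two protected tokens even though their include probabilities are low. A sampled edit and a greedy edit of the same prompt are the two candidates that a self-critical training step would compare.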