Entropy-UID: A Method for Optimizing Information Density

Xinpeng Shou

TL;DR

This work addresses the challenge of balancing information density in autoregressive text generation by integrating entropy and Uniform Information Density (UID) into a unified decoding objective. The Entropy-UID method defines a token score $\text{Score}(s|C) = \alpha H(s|C) + (1 - \alpha) \text{Surprisal}(s|C)$ to jointly optimize global diversity and local coherence. Empirical results on WikiText-2, OpenWebText, and WMT show that Entropy-UID achieves lower entropy variance and stable surprisal compared with entropy-only, UID-only, and standard GPT-2 decoding, indicating more balanced and human-like outputs. This approach demonstrates the potential of information-theoretic constraints to improve decoding in autoregressive models and points toward broader applications in NLP tasks that require controlled information density.
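
To make the token score concrete, here is a minimal sketch of how $\text{Score}(s|C)$ might be computed from a next-token distribution. Two things are assumptions rather than details given in the summary above: the entropy term $H(s|C)$ is read as the candidate's pointwise contribution $-p \log p$, and the token with the lowest score is selected (in line with the abstract's description of jointly minimizing entropy and surprisal). The function names and the toy distribution are hypothetical.

```python
import math


def entropy_uid_scores(probs, alpha=0.5):
    """Score each candidate token from its next-token probability P(s|C).

    Assumption: H(s|C) is the pointwise entropy contribution -p*log(p);
    Surprisal(s|C) is the usual -log(p). Natural logarithms are used.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scores = {}
    for token, p in probs.items():
        if p <= 0.0:
            continue  # zero-probability tokens are not candidates
        surprisal = -math.log(p)
        entropy_term = -p * math.log(p)
        scores[token] = alpha * entropy_term + (1.0 - alpha) * surprisal
    return scores


def select_token(probs, alpha=0.5):
    """Pick the candidate with the lowest score (assumed selection rule)."""
    scores = entropy_uid_scores(probs, alpha)
    return min(scores, key=scores.get)


# Toy next-token distribution (hypothetical values).
probs = {"the": 0.6, "a": 0.3, "zebra": 0.1}
for alpha in (0.0, 0.5, 1.0):
    print(alpha, select_token(probs, alpha))
```

Under these assumptions, $\alpha = 0$ reduces the rule to picking the least surprising token, while larger $\alpha$ shifts weight toward the entropy term, which can favor a different candidate.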

Abstract

Balanced and efficient information flow is essential for optimizing language generation models. In this work, we propose Entropy-UID, a new token selection method that balances entropy and Uniform Information Density (UID) principles to make text generation more efficient. Our approach adaptively adjusts token selection by jointly minimizing entropy and surprisal, promoting a more even distribution of information across generated sequences. Theoretical validation demonstrates that Entropy-UID optimally reduces information spikes while maintaining fluency and coherence. The method has been evaluated using information-theoretic metrics on multiple benchmark datasets, including WikiText-2, OpenWebText, and WMT. Experimental results show that Entropy-UID achieves lower surprisal and entropy variance than standard GPT-2 and alternative heuristics, leading to more balanced and human-like text generation. Our findings point towards the potential of leveraging information-theoretic constraints to refine token selection strategies in autoregressive language models.
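
The information-theoretic metrics mentioned above can be illustrated with a short sketch. It assumes each generation step exposes the full next-token distribution and the index of the chosen token; the function names, the use of natural logarithms, and the choice of population variance are assumptions, not details taken from the paper. The final entry mirrors the quantity named in the Figure 1 caption.

```python
import math
from statistics import mean, pvariance


def step_entropy(dist):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def step_surprisal(dist, chosen_index):
    """Surprisal (in nats) of the token actually chosen at this step."""
    return -math.log(dist[chosen_index])


def density_metrics(dists, chosen):
    """Summarize information density over a generated sequence.

    dists:  one probability list per generation step
    chosen: index of the selected token at each step
    """
    entropies = [step_entropy(d) for d in dists]
    surprisals = [step_surprisal(d, i) for d, i in zip(dists, chosen)]
    avg_entropy = mean(entropies)
    avg_surprisal = mean(surprisals)
    return {
        "avg_entropy": avg_entropy,
        "entropy_variance": pvariance(entropies),
        "avg_surprisal": avg_surprisal,
        "surprisal_variance": pvariance(surprisals),
        # Absolute gap between the two averages, as in the Figure 1 caption.
        "entropy_surprisal_gap": abs(avg_entropy - avg_surprisal),
    }


# Hypothetical three-step sequence over a three-token vocabulary.
dists = [[0.7, 0.2, 0.1], [0.4, 0.4, 0.2], [0.9, 0.05, 0.05]]
chosen = [0, 1, 0]
print(density_metrics(dists, chosen))
```

Lower variance in the per-step values corresponds to the more even information distribution that the method aims for.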

Paper Structure

This paper contains 16 sections, 3 equations, 1 figure, 1 table, 1 algorithm.

Figures (1)

  • Figure 1: Absolute Difference Between Avg Entropy and Avg Surprisal Across Models