A Frustratingly Simple Decoding Method for Neural Text Generation
Haoran Yang, Deng Cai, Huayang Li, Wei Bi, Wai Lam, Shuming Shi
TL;DR
Frustratingly Simple Decoding (FSD) introduces an on-the-fly anti-LM that penalizes repetitive content during neural text generation. By combining the standard LM score $p_{\theta}(v|x_{<t})$ with a penalty term $p_{\omega}(v|x_{<t})$ via $\text{FSD}(v|x_{<t}) = p_{\theta}(v|x_{<t}) - \alpha \times p_{\omega}(v|x_{<t})$, FSD can be instantiated with a discrete $n$-gram anti-LM or a vectorized variant that uses hidden states, enabling GPU acceleration. The method requires no extra model parameters and achieves near-greedy decoding speeds while improving generation quality, as shown by automatic and human evaluations across multiple datasets, languages, and tasks, including instruction following and summarization. Overall, FSD offers a universal, efficient decoding paradigm that mitigates degeneration in open-ended generation and demonstrates robust performance across LM families and domains.
Abstract
We introduce a frustratingly simple, super efficient, and surprisingly effective decoding method, which we call Frustratingly Simple Decoding (FSD), for neural text generation. The idea behind FSD is straightforward: we build an anti-LM based on previously generated text and use this anti-LM to penalize future generation of what has already been generated. The anti-LM can be implemented as simply as an n-gram language model or a vectorized variant. In this way, FSD introduces no extra model parameters and negligible computational overhead (FSD can be as fast as greedy search). Despite its simplicity, FSD is surprisingly effective: experiments show that FSD can outperform the current canonical method (i.e., nucleus sampling) as well as several strong baselines that were proposed recently.
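To make the idea concrete, the following is a minimal sketch of one FSD decoding step with an n-gram anti-LM, scoring each token as $p_{\theta}(v|x_{<t}) - \alpha \times p_{\omega}(v|x_{<t})$. It is an illustration under stated assumptions, not the authors' implementation: the function names `ngram_anti_lm` and `fsd_step` are hypothetical, the anti-LM here is a fixed-order maximum-likelihood estimate with no smoothing or back-off, and the LM probabilities in the usage example are made-up numbers.

```python
from collections import Counter


def ngram_anti_lm(prefix, n, vocab_size):
    """Estimate p_omega(v | last n-1 tokens) from the generated prefix only.

    Assumption of this sketch: a fixed-order n-gram model with plain
    relative-frequency estimates (no smoothing, no back-off).
    """
    probs = [0.0] * vocab_size
    if n < 2 or len(prefix) < n:
        return probs  # not enough history to form an n-gram
    context = tuple(prefix[-(n - 1):])
    # Count which tokens followed this context earlier in the prefix.
    followers = Counter(
        prefix[i + n - 1]
        for i in range(len(prefix) - n + 1)
        if tuple(prefix[i:i + n - 1]) == context
    )
    total = sum(followers.values())
    for token, count in followers.items():
        probs[token] = count / total
    return probs


def fsd_step(lm_probs, prefix, alpha=0.5, n=2):
    """Pick the next token by maximizing p_theta - alpha * p_omega."""
    anti_probs = ngram_anti_lm(prefix, n, len(lm_probs))
    scores = [p - alpha * q for p, q in zip(lm_probs, anti_probs)]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: the prefix keeps alternating 0, 1, 0, 1, 0 and the LM wants to
# continue the loop with token 1. All numbers are illustrative.
prefix = [0, 1, 0, 1, 0]
lm_probs = [0.10, 0.60, 0.25, 0.05]
print(fsd_step(lm_probs, prefix, alpha=0.0))  # greedy choice: 1
print(fsd_step(lm_probs, prefix, alpha=0.5))  # penalized choice: 2
```

With `alpha=0.0` the step reduces to greedy search and repeats the loop. With `alpha=0.5` the anti-LM assigns probability 1 to token 1 after token 0, so its score drops below that of token 2 and the repetition is broken. Because the anti-LM is rebuilt from the prefix alone, no extra model parameters are involved.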
