Correlation and Navigation in the Vocabulary Key Representation Space of Language Models
Letian Peng, Chenyang An, Jingbo Shang
TL;DR
The paper reveals that the next-token predictions (NTP) of neural language models are biased by a fixed, context-agnostic vocabulary key space, which causes spurious correlations that inflate middle-ranked candidates. By combining knowledge probing, clustering of token embeddings, and visualization, the authors show that tokens close in key space to the top predictions can be mistaken for plausible continuations. They introduce In-context Navigation (ICN), an iterative prompting method that pushes the query away from explored keys; it improves knowledge-probing precision and extends to open-ended generation and chain-of-thought tasks, yielding higher diversity and better self-consistency. The work also documents training-time risks of a fixed key space, showing that fine-tuning mostly adjusts the query encoder and can propagate in-cluster biases, which motivates reranking or contextualized vocabularies as potential remedies. Overall, the findings offer a retrieval-inspired lens on NTP and practical strategies for improving decoding diversity and reliability, while highlighting gaps in current fine-tuning regimes.
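The query-key view of NTP can be illustrated with a toy example. The sketch below assumes hand-picked 3-dimensional key vectors and a hypothetical three-token vocabulary (none of these values come from the paper) and computes the NTP distribution as a softmax over query-key dot products. Because the key for "Q" sits close to the key for "P", a query aligned with "P" also scores "Q" highly, so "Q" lands right behind the top-1 token whether or not it is a correct continuation. In a real Transformer LM the keys would be the rows of the output (unembedding) matrix; here they are made up for illustration.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ntp_distribution(query, keys):
    # NTP as a softmax over dot products between the context query and every fixed key.
    return softmax([dot(query, k) for k in keys])

# Hypothetical toy keys: "P" and "Q" are close in key space,
# while "Paris" points in a different direction.
vocab = ["P", "Q", "Paris"]
keys = [
    [1.0, 0.0, 0.0],  # "P"
    [0.9, 0.1, 0.0],  # "Q": near "P" in key space, not necessarily a correct answer
    [0.0, 1.0, 0.0],  # "Paris"
]
query = [2.0, 1.0, 0.0]  # hypothetical query whose top-1 token is "P"

probs = ntp_distribution(query, keys)
ranking = sorted(zip(vocab, probs), key=lambda pair: -pair[1])
for token, p in ranking:
    print(f"{token}: {p:.3f}")
```

With these numbers the dot products are 2.0, 1.9, and 1.0, so "Q" ranks second purely because of where its key sits relative to the key of "P".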
Abstract
Language model (LM) decoding is based on the next-token prediction (NTP) probability distribution. For neural LMs (e.g., Transformer-based), the NTP distribution is essentially a softmax-normalized dot product between an encoded input context (query) and fixed vocabulary representations (keys). In this paper, we study the effect of the key distribution on the NTP distribution, with a focus on whether similarity between keys triggers spurious correlations in NTP. Through knowledge-probing tasks, we show that the few top-ranked tokens in the NTP distribution are typically accurate. However, middle-ranked predictions are highly biased towards tokens that are distributionally (not necessarily semantically) similar to these top ones. For instance, if "P" is predicted as the top-1 token, "A"-"Z" will all be ranked high in NTP, regardless of whether they can lead to correct decoding results. This hurts sampling diversity and makes sampling correct, long-tail results hopeless and noisy. We attempt to alleviate this issue via a novel in-context method, In-context Navigation (ICN), that iteratively pushes the query representation away from explored regions. Specifically, we include the explored decoding results in the context and prompt the LM to generate something else, which encourages the LM to produce a query representation that has small dot products with explored keys. Experiments on knowledge-probing tasks show that our method navigates efficiently away from explored keys to correct new keys. We further extend our method to open-ended and chain-of-thought (for reasoning) generation. Experimental results show that ICN contributes to better generation diversity and improved self-consistency voting performance. Finally, we discuss potential training issues caused by the fixed key space, together with the challenges and possible ways to address them in future research.
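The ICN procedure described in the abstract can be sketched as a plain prompting loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the prompt template in `build_icn_prompt`, the stop-on-repeat rule, and the `toy_generate` stand-in for an LM call are all hypothetical. The code only manipulates the prompt; pushing the query representation away from explored keys is what the abstract argues the LM does internally when given such a prompt.

```python
from typing import Callable, List

def build_icn_prompt(question: str, explored: List[str]) -> str:
    # Hypothetical template: list answers already explored and ask for a different one.
    if not explored:
        return question
    listed = ", ".join(explored)
    return f"{question}\nAlready given answers: {listed}.\nGive a different answer."

def in_context_navigation(question: str,
                          generate: Callable[[str], str],
                          steps: int) -> List[str]:
    # Iteratively re-prompt with the explored results placed in the context.
    explored: List[str] = []
    for _ in range(steps):
        prompt = build_icn_prompt(question, explored)
        answer = generate(prompt).strip()
        if answer in explored:
            # Assumed termination rule: stop once the model repeats an explored answer.
            break
        explored.append(answer)
    return explored

def toy_generate(prompt: str) -> str:
    # Hypothetical stand-in for an LM call: return the first candidate
    # that is not already mentioned in the prompt.
    for candidate in ["Paris", "Lyon", "Marseille"]:
        if candidate not in prompt:
            return candidate
    return "Paris"

result = in_context_navigation("Name a city in France.", toy_generate, steps=5)
print(result)
```

Replacing `toy_generate` with a real model call (for instance, greedy decoding of a short answer) would give a knowledge-probing setup; stopping on a repeat is just one simple termination choice, and a fixed step budget alone would also work.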
