Similarity-Distance-Magnitude Language Models
Allen Schmaltz
TL;DR
This work introduces Similarity-Distance-Magnitude (SDM) language models, which fine-tune decoder-only LMs to maximize the fraction of generations that fall inside a well-calibrated, high-probability region defined by a final-layer SDM activation. The approach combines a document-level binary classifier at training time with a contrastive masking encoding scheme and online generation of hard negatives to shape the next-token loss, thereby reducing abstentions and improving statistical efficiency. Key innovations include (i) a final-layer SDM activation used for both test-time classification and training-time loss weighting, (ii) a contrastive masking encoding scheme, and (iii) online hard-negative generation to expose the model to challenging, instruction-violating completions. Experiments on a word-ordering task with a 3.8B decoder show that SDM fine-tuning significantly increases the proportion of in-distribution generations within the high-probability region, with robust behavior under distributional shifts and only modest changes to marginal accuracy, demonstrating practical gains for uncertainty-aware selective generation in large LMs.
Abstract
We introduce Similarity-Distance-Magnitude (SDM) language models (LMs), which are sequence prediction models fine-tuned to maximize the proportion of generations in the well-calibrated, high-probability region partitioned by a final-layer SDM activation layer used for binary classification of instruction-following. We demonstrate that existing pre-trained decoder-only Transformer LMs can be readily converted into SDM LMs via supervised fine-tuning, using the final-layer SDM activation layer during training to estimate a change-of-base for a supervised next-token loss over a contrastive input encoding scheme, with additional hard negative examples generated online during training. This results in reduced abstentions (i.e., improved statistical efficiency) compared to strong supervised baselines.
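To make the evaluation target concrete, the following minimal sketch shows how the proportion of generations admitted to a high-probability region, and the accuracy among those admitted, might be computed. It is an illustration under stated assumptions, not the method described here: a single probability threshold stands in for the region partitioned by the SDM activation, and the names `Generation`, `selective_summary`, and the threshold `alpha` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Generation:
    # Assumed: the classifier's estimated probability that the output
    # follows the instruction (stand-in for the SDM-based estimate).
    prob_follows: float
    # Ground-truth label, available only at evaluation time.
    is_correct: bool


def selective_summary(
    generations: List[Generation], alpha: float = 0.95
) -> Tuple[float, Optional[float]]:
    """Return (admitted_fraction, accuracy_among_admitted).

    `alpha` is a hypothetical threshold defining the high-probability
    region; generations below it are treated as abstentions.
    """
    if not generations:
        return 0.0, None
    admitted = [g for g in generations if g.prob_follows >= alpha]
    admitted_fraction = len(admitted) / len(generations)
    if not admitted:
        return admitted_fraction, None
    accuracy = sum(g.is_correct for g in admitted) / len(admitted)
    return admitted_fraction, accuracy
```

Under this simplified view, reducing abstentions corresponds to raising the admitted fraction while the accuracy among admitted generations stays at or above `alpha`.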
