Unsupervised Contrast-Consistent Ranking with Language Models
Niklas Stoehr, Pengxiang Cheng, Jing Wang, Daniel Preotiuc-Pietro, Rajarshi Bhowmik
TL;DR
The paper tackles the problem of reliably extracting ranking knowledge from language models without supervision, showing that prompting alone can yield inconsistent rankings. It extends the unsupervised Contrast-Consistent Search (CCS) framework to Contrast-Consistent Ranking (CCR), proposing Pairwise CCR, Pointwise CCR, and Listwise CCR with corresponding loss formulations (e.g., MarginCCR, TripletCCR, OrdRegCCR). Across multiple encoder/decoder models and six ranking datasets, CCR probing often outperforms prompting for smaller models and matches prompting performance for larger models, while offering greater control and interpretability. The work demonstrates that unsupervised probing can yield robust, direction-invariant rankings and provides a foundation for more reliable in-context ranking applications in NLP systems.
Abstract
Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank product reviews by sentiment. We compare pairwise, pointwise and listwise prompting techniques to elicit a language model's ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce. This motivates us to explore an alternative approach that is inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probe guided by a logical constraint: a language model's representation of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks where all items are related via consistent pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking methods such as the Max-Margin Loss, Triplet Loss and an Ordinal Regression objective. Across different models and datasets, our results confirm that CCR probing performs better than or, at least, on a par with prompting.
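The logical constraint behind CCS can be made concrete with a small sketch. The abstract does not state the loss, so the form below is an assumption based on the commonly cited CCS objective: a consistency term that pushes the probe's outputs for a statement and its negation to sum to one, plus a confidence term that discourages the degenerate answer of 0.5 for both. The function name `ccs_loss` and the inputs `p_pos` and `p_neg` (probe outputs in [0, 1] for each statement and its negation) are hypothetical, and the ranking losses used by CCR are not shown.

```python
import numpy as np


def ccs_loss(p_pos, p_neg):
    """Assumed CCS-style loss over probe outputs for statement/negation pairs.

    p_pos[i] is the probe's probability that statement i is true,
    p_neg[i] the probability that its negation is true.
    """
    p_pos = np.asarray(p_pos, dtype=float)
    p_neg = np.asarray(p_neg, dtype=float)

    # Consistency: p(statement) should equal 1 - p(negation).
    consistency = (p_pos - (1.0 - p_neg)) ** 2

    # Confidence: penalize the case where neither side is close to zero,
    # which rules out the trivial solution p_pos = p_neg = 0.5.
    confidence = np.minimum(p_pos, p_neg) ** 2

    return float(np.mean(consistency + confidence))


# A probe that maps each pair to opposite poles incurs a low loss,
# while an undecided probe (0.5 everywhere) is penalized.
confident = ccs_loss([0.9, 0.1], [0.1, 0.9])
undecided = ccs_loss([0.5, 0.5], [0.5, 0.5])
print(confident < undecided)
```

No labels appear anywhere in this loss, which is what makes the probe unsupervised: the training signal comes only from the requirement that the two poles be assigned consistently across statements.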
