Position bias in features

Richard Demsyn-Jones

TL;DR

This work tackles position bias in search ranking by evaluating document-level relevance features derived from historical clicks. It extends inverse propensity scoring to create IPW-CTR and compares it against biased CTR, COEC, SNIPS, and IPW-COEC, highlighting the trade-off between bias and variance. IPW-CTR can approach true relevance with sufficient data but suffers from high variance under strong position bias. IPW-COEC often reduces variance and can outperform biased signals, while empirical weights perform consistently poorly. The paper advocates a pluralistic, bias-aware feature strategy that separates bias estimation from feature construction. It uses synthetic datasets to demonstrate these dynamics and offers practical guidance for building robust learning-to-rank systems.
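To make the compared features concrete, here is a minimal Python sketch of four of them, computed for a single document from its impression log. It assumes a position-based click model with examination propensities already known and keyed by rank. `Impression`, `position_ctr`, and the toy numbers are hypothetical illustrations, not taken from the paper. The paper's exact definitions may differ, and IPW-COEC is omitted because its precise form is not given here.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    position: int  # 1-based rank at which the document was shown
    clicked: bool  # whether the user clicked it


def biased_ctr(log):
    """Raw click-through rate: ignores where the document was shown."""
    return sum(imp.clicked for imp in log) / len(log)


def coec(log, position_ctr):
    """Clicks over expected clicks.

    position_ctr[p] is the average CTR of all documents at position p
    (a hypothetical input), so the denominator is the number of clicks an
    average document would have received at the same positions.
    """
    expected_clicks = sum(position_ctr[imp.position] for imp in log)
    return sum(imp.clicked for imp in log) / expected_clicks


def ipw_ctr(log, propensity):
    """Inverse propensity weighted CTR: each click is divided by the
    examination probability of the position where it occurred."""
    return sum(imp.clicked / propensity[imp.position] for imp in log) / len(log)


def snips(log, propensity):
    """Self-normalized IPS: divide by the sum of weights instead of the
    impression count, trading a little bias for lower variance."""
    weights = [1.0 / propensity[imp.position] for imp in log]
    weighted_clicks = sum(w * imp.clicked for w, imp in zip(weights, log))
    return weighted_clicks / sum(weights)


# Toy log: the document was shown twice at rank 1 and twice at rank 3.
log = [Impression(1, True), Impression(1, False),
       Impression(3, True), Impression(3, False)]
propensity = {1: 1.0, 3: 0.5}    # assumed examination probabilities
position_ctr = {1: 0.4, 3: 0.2}  # assumed average CTR per position

print(biased_ctr(log))                    # 0.5
print(round(coec(log, position_ctr), 3))  # 1.667
print(ipw_ctr(log, propensity))           # 0.75
print(snips(log, propensity))             # 0.5
```

IPW-CTR exceeds the raw CTR in this toy log because the click at rank 3 is up-weighted by 1/0.5. When propensities are small, the same mechanism can push the estimate well above 1, which is the source of its variance.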

Abstract

The purpose of modeling document relevance for search engines is to rank better in subsequent searches. Document-specific historical click-through rates can be important features in a dynamic ranking system that updates as we accumulate more samples. This paper describes the properties of several such features and tests them in controlled experiments. Extending the inverse propensity weighting method to documents creates an unbiased estimate of document relevance. This feature can approximate relevance accurately, leading to near-optimal ranking in ideal circumstances. However, its variance is high and increases with the degree of position bias. Furthermore, inaccurate position bias estimation leads to poor performance. Under several scenarios this feature can perform worse than biased click-through rates. This paper underscores the need for accurate position bias estimation and is unique in suggesting the simultaneous use of both biased and position-bias-corrected features.
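The variance and misspecification claims can be illustrated with a small synthetic simulation. This is a sketch under hypothetical assumptions, not the paper's experimental setup. It assumes a 10-slot result page, shows the document at a uniformly random slot, sets the examination probability to `(1/p) ** eta`, and registers a click whenever the user examines the result and it is relevant (probability `relevance`). `simulate_ipw_ctr` and `propensity` are illustrative helper names.

```python
import random
import statistics

POSITIONS = range(1, 11)  # hypothetical: a 10-slot result page


def propensity(p, eta):
    # Hypothetical examination model: P(examine position p) = (1/p) ** eta.
    # eta = 0 means no position bias; larger eta means stronger bias.
    return (1.0 / p) ** eta


def simulate_ipw_ctr(relevance, true_eta, assumed_eta,
                     n_impressions, n_trials, seed=0):
    """Return the mean and variance of IPW-CTR across repeated synthetic logs.

    Clicks are generated with true_eta; the IPW weights use assumed_eta,
    so setting them unequal simulates a misestimated position bias.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        total = 0.0
        for _ in range(n_impressions):
            p = rng.choice(POSITIONS)  # document shown at a uniformly random slot
            examined = rng.random() < propensity(p, true_eta)
            if examined and rng.random() < relevance:
                total += 1.0 / propensity(p, assumed_eta)  # inverse propensity weight
        estimates.append(total / n_impressions)
    return statistics.mean(estimates), statistics.variance(estimates)


# Correctly specified propensities at increasing degrees of position bias.
for eta in (0.0, 1.0, 2.0):
    mean, var = simulate_ipw_ctr(0.3, eta, eta, n_impressions=200, n_trials=300)
    print(f"eta={eta}: mean={mean:.3f} variance={var:.4f}")

# Understated position bias: weights assume eta=0.5 but clicks used eta=1.0.
mean, var = simulate_ipw_ctr(0.3, 1.0, 0.5, n_impressions=200, n_trials=300)
print(f"misspecified: mean={mean:.3f} variance={var:.4f}")
```

With correct propensities, the mean should stay close to the true relevance of 0.3 while the variance grows with `eta`. Under-correcting by weighting with `assumed_eta=0.5` when clicks were generated with `true_eta=1.0` leaves the estimate well below 0.3.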

Paper Structure

This paper contains 26 sections, 19 equations, 11 figures, 1 table, and 1 algorithm.

Figures (11)

  • Figure 1: IPW-CTR is an unbiased estimator of relevance, and with enough samples it can perform as well as relevance itself.
  • Figure 2: IPW-CTR's high variance hurts ranking performance at low sample sizes.
  • Figure 3: IPW-CTR's performance is variable at high degrees of position bias.
  • Figure 4: Empirical weights perform poorly.
  • Figure 5: CTR and COEC weighting perform similarly.
  • ...and 6 more figures