From Noise to Order: Learning to Rank via Denoising Diffusion

Sajad Ebrahimi, Bhaskar Mitra, Negar Arabzadeh, Ye Yuan, Haolun Wu, Fattane Zarrinkalam, Ebrahim Bagheri

TL;DR

This work introduces DiffusionRank, a diffusion-based generative approach to learning-to-rank that models the joint distribution over ranking features and relevance labels, extending TabDiff to IR datasets. By applying Gaussian diffusion to numerical features and masked diffusion to categorical relevance labels, DiffusionRank trains a denoising model that can serve as a generative counterpart to pointwise and pairwise discriminative objectives, with competitive or superior ranking performance. Empirical results on LETOR 4.0 and MSLR-WEB10K show that DiffusionRank yields robust improvements, especially with larger training sets, and remains effective under feature perturbations, suggesting that modeling the full data distribution enhances generalization. The work highlights a promising direction for leveraging deep generative models in IR, with potential extensions to listwise objectives, larger data regimes, and representation-learning IR scenarios.
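
The forward (noising) side of this setup can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the linear noise scale `sigma_max * t`, the masking probability `t`, the `MASK` sentinel, and the function name `corrupt` are all assumptions made here for clarity. The actual schedules used by DiffusionRank and TabDiff are not given in this summary.

```python
import random

MASK = -1  # hypothetical sentinel standing in for a masked relevance label


def corrupt(features, label, t, sigma_max=1.0, rng=random):
    """Corrupt one (features, label) row at diffusion time t in [0, 1].

    Assumed schedules (not taken from the paper): the Gaussian noise scale
    grows linearly as sigma_max * t, and the label is masked with
    probability t.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    sigma = sigma_max * t
    # Gaussian diffusion on the numerical ranking features.
    noise = [rng.gauss(0.0, 1.0) for _ in features]
    noisy_features = [x + sigma * e for x, e in zip(features, noise)]
    # Masked diffusion on the categorical relevance label.
    noisy_label = MASK if rng.random() < t else label
    return noisy_features, noisy_label, noise


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.0, 0.5, 1.0):
        print(t, corrupt([0.3, 1.2, -0.7], label=2, t=t, rng=rng))
```

A denoising model would then be trained to recover the added noise and the original label from the corrupted row and `t`. At `t = 0` the row is returned unchanged, and at `t = 1` the label is always masked.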

Abstract

In information retrieval (IR), learning-to-rank (LTR) methods have traditionally limited themselves to discriminative machine learning approaches that model the probability of the document being relevant to the query given some feature representation of the query-document pair. In this work, we propose an alternative denoising diffusion-based deep generative approach to LTR that instead models the full joint distribution over feature vectors and relevance labels. While an over-parameterized ranking model in the discriminative setting may find different ways to fit the training data, we hypothesize that candidate solutions that can explain the full data distribution under the generative setting produce more robust ranking models. With this motivation, we propose DiffusionRank, which extends TabDiff, an existing denoising diffusion-based generative model for tabular datasets, to create generative equivalents of classical discriminative pointwise and pairwise LTR objectives. Our empirical results demonstrate significant improvements from DiffusionRank models over their discriminative counterparts. Our work points to a rich space for future research on how we can leverage ongoing advancements in deep generative modeling approaches, such as diffusion, for learning-to-rank in IR.
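
For readers less familiar with the discriminative objectives the abstract refers to, the sketch below shows one common form of each: a binary cross-entropy pointwise loss and a RankNet-style pairwise logistic loss. These are textbook formulations chosen for illustration. The summary does not state which exact baseline losses the paper uses, and the function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_loss(score, label):
    """Binary cross-entropy on one document; label is 0 or 1 (assumed binary)."""
    return softplus(score) - label * score


def pairwise_loss(score_pos, score_neg):
    """RankNet-style logistic loss for a pair in which the first
    document is the more relevant one."""
    return softplus(-(score_pos - score_neg))


if __name__ == "__main__":
    print(pointwise_loss(2.0, 1), pointwise_loss(2.0, 0))
    print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

Both losses depend only on the model's scores for given features, which is what makes them discriminative: nothing in them asks the model to explain how the features themselves are distributed.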

Paper Structure

This paper contains 28 sections, 18 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Overview of DiffusionRank (pointwise). We model learning-to-rank as a denoising diffusion process over tabular feature-label tuples: a forward process progressively corrupts the input (Gaussian noise for numerical features and masking/noising for categorical variables, including relevance labels), and a learned reverse process denoises to recover clean samples.
  • Figure 2: Architectural differences between discriminative and DiffusionRank models. (a) Standard discriminative LTR model, which takes only query-document features as input and predicts a relevance score or label. (b) Pointwise DiffusionRank, where the model additionally conditions on the (possibly masked) relevance label and diffusion time step, and jointly predicts the relevance label and the noise added to features. (c) Pairwise DiffusionRank, which applies the pointwise denoising model independently to a document pair with tied label masking, producing scores for both documents while learning from noisy feature representations.
  • Figure 3: Training dynamics on validation data (NDCG@10) for discriminative vs. DiffusionRank models. We plot validation effectiveness over training steps for (left) MSLR-WEB10K and (right) MQ2007. Across both datasets, DiffusionRank shows smoother, more stable trajectories and less pronounced degradation at later training stages, consistent with improved robustness to overfitting compared to discriminative baselines.
  • Figure 4: Effect of training set size on ranking effectiveness in terms of NDCG@10. Colors indicate datasets; solid lines denote DiffusionRank and dashed lines denote the corresponding discriminative baseline.
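
Figures 3 and 4 report effectiveness as NDCG@10. As a reminder of how that metric is computed for a single query, here is a minimal sketch. It assumes the exponential gain `2^rel - 1` and a score of 0.0 for queries with no relevant documents. Both are common conventions, but the summary does not say which variant the paper's evaluation uses.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels,
    listed in the order the model ranked the documents."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances, k=10):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # assumed convention when no document is relevant
    return dcg_at_k(ranked_relevances, k) / ideal


if __name__ == "__main__":
    # Relevance labels of documents in the order a model ranked them.
    print(ndcg_at_k([2, 0, 1, 0, 0]))
```

Averaging `ndcg_at_k` over all queries in a validation or test set gives a dataset-level score of the kind plotted in the figures.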