Synthesizing Scoring Functions for Rankings Using Symbolic Gradient Descent

Zixuan Chen, Panagiotis Manolios, Mirek Riedewald

TL;DR

RankHow tackles the problem of reconstructing a given ranking with a simple, interpretable linear scoring function, formalizing the task (OPT) as a constrained optimization problem over the weight vector. It delivers an exact MILP-based solution and analyzes a PTIME reduction to LPs, finding the MILP approach empirically faster, which the authors hypothesize is due to solver heuristics; to scale further, it introduces Symbolic Gradient Descent (Sym-GD), which localizes optimization within weight-space cells. The approach is validated on real datasets (NBA MVP, CSRankings) and large synthetic data, where it achieves lower ranking error than competing methods and scales to large inputs, while carefully addressing numerical imprecision. Collectively, RankHow provides a principled, scalable framework for ranking-function synthesis with actionable constraints and improved explainability for ranking decisions.

Abstract

Given a relation and a ranking of its tuples, but no information about the ranking function, we are interested in synthesizing simple scoring functions that reproduce the ranking. Our system RankHow identifies linear scoring functions that minimize position-based error, while supporting flexible constraints on their weights. It is based on a new formulation as a mixed-integer linear program (MILP). While MILP is NP-hard in general, we show that RankHow is orders of magnitude faster than a tree-based algorithm that guarantees polynomial time complexity (PTIME) in the number of input tuples by reducing the MILP problem to many linear programs (LPs). We hypothesize that this is caused by 2 properties: First, the PTIME algorithm is equivalent to a naive evaluation strategy for the MILP program. Second, MILP solvers rely on advanced heuristics to reason holistically about the entire program, while the PTIME algorithm solves many sub-problems in isolation. To further improve RankHow's scalability, we propose a novel approximation technique called symbolic gradient descent (Sym-GD). It exploits problem structure to more quickly find local minima of the error function. Experiments demonstrate that RankHow can solve realistic problems, finding more accurate linear scoring functions than the state of the art.
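
To make the objective concrete, here is a minimal Python sketch of what "position-based error of a linear scoring function" could look like. This summary does not reproduce the paper's exact definition, so the sketch assumes the error is the sum of absolute differences between each tuple's given position and its score-induced position, that higher scores rank first, and that tied tuples share the best position. The function names and the grid search are hypothetical illustrations; the grid search is only a naive baseline and is not RankHow's MILP formulation or Sym-GD.

```python
from typing import List, Sequence, Tuple


def linear_scores(rows: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Score each tuple as the weighted sum of its attribute values."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def induced_positions(score_values: Sequence[float]) -> List[int]:
    """Assumed tie convention: position = 1 + number of strictly higher scores."""
    return [1 + sum(1 for other in score_values if other > s) for s in score_values]


def position_error(rows, given_positions, weights) -> int:
    """Assumed error: sum of |given position - score-induced position|."""
    induced = induced_positions(linear_scores(rows, weights))
    return sum(abs(g - p) for g, p in zip(given_positions, induced))


def grid_search_2d(rows, given_positions, steps: int = 10) -> Tuple[int, Tuple[float, float]]:
    """Naive baseline for two attributes: try weights (w, 1 - w) on a grid."""
    best_err, best_w = None, None
    for i in range(steps + 1):
        w = (i / steps, 1 - i / steps)
        err = position_error(rows, given_positions, w)
        if best_err is None or err < best_err:
            best_err, best_w = err, w
    return best_err, best_w


# Toy relation with two attributes and a given ranking (position 1 is best).
rows = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
given = [1, 2, 3]

err_good = position_error(rows, given, (1.0, 0.0))  # ranks by the first attribute
err_bad = position_error(rows, given, (0.0, 1.0))   # ranks by the second attribute
best_err, best_w = grid_search_2d(rows, given)
print(err_good, err_bad, best_err, best_w)
```

On this toy input, weighting only the first attribute reproduces the given ranking, while weighting only the second reverses it. The error is piecewise constant in the weights, which is why a naive search has to probe many weight vectors.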

Paper Structure

This paper contains 21 sections, 4 theorems, 9 equations, 3 figures, 3 tables, 2 algorithms.

Key Result

Lemma 1

The constraints in Eq. (eq:best_LP_topk_constraints_num) are logically equivalent to the indicator constraints in Eq. (eq:best_LP_topk).

Figures (3)

  • Figure 1: Weights resulting in ties (oblique lines) vs no ties (all others), and an example scoring function (star).
  • Figure 2: Solution space (2D triangle in 3D space) and indicator boundaries (2 lines in the triangle) for the example labeled example:space. The colored numbers show each indicator's value when the weight is selected from the corresponding side of the line.
  • Figure 3: Performance on OPT

Theorems & Definitions (17)

  • Example 1: NBA MVP selection
  • Example 2: Prediction accuracy vs ranking accuracy
  • Example 3: Predicting scores vs predicting ordering
  • Definition 1: given ranking
  • Definition 2: score-based ranking
  • Definition 3: position-based error
  • Definition 4: optimization problem OPT
  • Example 4
  • Lemma 1
  • proof
  • ...and 7 more