Isotonic Mechanism for Exponential Family Estimation in Machine Learning Peer Review

Yuling Yan, Weijie J. Su, Jianqing Fan

TL;DR

This work extends the Isotonic Mechanism to exponential-family score models for peer review, showing that author-provided rankings can be truthfully elicited and used to adjust review scores without requiring knowledge of the underlying distribution. It proves incentive compatibility under convex utility, demonstrates that rankings are essentially the finest truthful information partition (with pairwise comparisons central to truthfulness in Gaussian settings), and establishes near minimax-optimal estimation of paper quality under bounded total variation. The mechanism consistently improves estimation accuracy over raw scores in both real ICML data and synthetic experiments, with substantial gains as the number of submissions grows. Overall, the approach offers a distribution-robust, information-efficient method to enhance conference peer review through truthful elicitation and isotonic estimation.

Abstract

In 2023, the International Conference on Machine Learning (ICML) required authors with multiple submissions to rank their submissions based on perceived quality. In this paper, we aim to employ these author-specified rankings to enhance peer review in machine learning and artificial intelligence conferences by extending the Isotonic Mechanism to exponential family distributions. This mechanism generates adjusted scores that closely align with the original scores while adhering to author-specified rankings. Despite its applicability to a broad spectrum of exponential family distributions, implementing this mechanism does not require knowledge of the specific distribution form. We demonstrate that an author is incentivized to provide accurate rankings when her utility takes the form of a convex additive function of the adjusted review scores. For a certain subclass of exponential family distributions, we prove that the author reports truthfully only if the question involves only pairwise comparisons between her submissions, thus indicating the optimality of ranking in truthful information elicitation. Moreover, we show that the adjusted scores dramatically improve the estimation accuracy compared to the original scores and achieve nearly minimax optimality when the ground-truth scores have bounded total variation. We conclude with a numerical analysis of the ICML 2023 ranking data, showing substantial estimation gains in approximating a proxy ground-truth quality of the papers using the Isotonic Mechanism.
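
The adjustment step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that "closely align with the original scores" means a least-squares fit subject to the author's ranking, computed here with the pool-adjacent-violators algorithm. The function names and the convention that `ranking[k]` is the index of the paper ranked (k+1)-th best are hypothetical choices made for this example. Note that the procedure uses only the raw scores and the ranking, with no distributional input.

```python
from typing import List, Sequence


def isotonic_nonincreasing(values: Sequence[float]) -> List[float]:
    """Least-squares projection of `values` onto non-increasing sequences (PAVA)."""
    blocks: List[List[float]] = []  # each block stores [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge the last two blocks while the earlier mean is below the later mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted: List[float] = []
    for s, c in blocks:
        fitted.extend([s / c] * int(c))
    return fitted


def isotonic_mechanism(scores: Sequence[float], ranking: Sequence[int]) -> List[float]:
    """Adjust `scores` so they respect `ranking` (assumed convention: best paper first)."""
    ordered = [scores[i] for i in ranking]      # scores in the author's claimed order
    fitted = isotonic_nonincreasing(ordered)    # enforce best >= ... >= worst
    adjusted = [0.0] * len(scores)
    for position, paper in enumerate(ranking):  # map back to the original paper indices
        adjusted[paper] = fitted[position]
    return adjusted


# Paper 0 is ranked best but scored below paper 1, so the two scores are pooled.
adjusted = isotonic_mechanism([6.0, 7.0, 5.0, 3.0], ranking=[0, 1, 2, 3])
print(adjusted)
```

When the raw scores already agree with the reported ranking, the sketch returns them unchanged; otherwise it averages the violating neighbours, which keeps the total score fixed.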

Paper Structure (26 sections, 12 theorems, 131 equations, 4 figures, 2 tables)

Key Result

Theorem 1

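Under Assumptions ass:convex and ass:author2, the author maximizes her expected overall utility by truthfully reporting the ground-truth ranking $\pi^\ast$. That is, for any ranking $\pi$, her expected overall utility from reporting $\pi$ is no greater than that from reporting $\pi^\ast$.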
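
A small Monte Carlo check can illustrate the statement. This is a rough sketch under stated assumptions rather than a reproduction of the paper's experiment: it borrows the binomial setting and the utility $U(x) = \max\{x, 0\}^2$ from the Figure 2 caption, takes the adjusted scores to be the least-squares isotonic fit, and uses hypothetical helper names, an arbitrary seed, and an arbitrary number of runs. Papers are indexed from 0, so the truthful ranking is `(0, 1, 2, 3)`.

```python
import itertools
import random


def isotonic_nonincreasing(values):
    """Least-squares projection onto non-increasing sequences (PAVA)."""
    blocks = []  # each block stores [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]


def utility(x):
    """Convex utility taken from the Figure 2 caption."""
    return max(x, 0.0) ** 2


def expected_utilities(mu, runs=4000, trials=10, seed=0):
    """Average overall utility of every possible reported ranking."""
    rng = random.Random(seed)
    rankings = list(itertools.permutations(range(len(mu))))
    totals = {r: 0.0 for r in rankings}
    for _ in range(runs):
        # One draw of review scores: X_i ~ Binom(trials, mu_i / trials).
        scores = [sum(rng.random() < m / trials for _ in range(trials)) for m in mu]
        for r in rankings:  # the same draw is reused for every ranking
            fitted = isotonic_nonincreasing([scores[i] for i in r])
            totals[r] += sum(utility(v) for v in fitted)
    return {r: t / runs for r, t in totals.items()}


avg = expected_utilities([8, 7, 6, 4])
best = max(avg, key=avg.get)
print("ranking with the highest average utility:", best)
```

Reusing each score draw across all 24 rankings keeps the comparison paired, so differences between rankings are not swamped by sampling noise.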

Figures (4)

  • Figure 1: Average review scores versus their standard deviations for all papers submitted to ICLR 2022 (https://github.com/fedebotu/ICLR2022-OpenReviewData). Each paper typically received three to five reviews, enabling the computation of its average score and standard deviation. Each point in this figure corresponds to a possible (average score, standard deviation) pair, where the size of a point reflects the number of papers with this pair of average review score and standard deviation.
  • Figure 2: Expected overall utility for all $n! = 4! = 24$ possible rankings in descending order, averaged over $100$ runs. The left panel uses review scores simulated from binomial distributions, while the right panel uses review scores simulated from Poisson distributions. In both panels, $(\mu_{1}^{\ast},\mu_{2}^{\ast},\mu_{3}^{\ast},\mu_{4}^{\ast})=(8,7,6,4)$, and the utility function is $U(x) = \max\{x, 0\}^2$ (binomial counts are generated from $\mathrm{Binom}(10, \mu_{i}^{\ast}/10)$). The red dot represents the expected utility when the author reports truthfully. The second to fourth highest expected overall utilities are achieved by the rankings $(2,1,3,4)$, $(1,3,2,4)$, and $(1,2,4,3)$ for binomial distributions, and $(1,3,2,4)$, $(2,1,3,4)$, and $(1,2,4,3)$ for Poisson distributions, respectively. These results suggest that the mechanism may be stable against slightly misspecified rankings.
  • Figure 3: Estimation errors $\Vert\widehat{\bm{\mu}}-\bm{\mu}^{\ast}\Vert^{2}/n$ for the Isotonic Mechanism and $\Vert\bm{X}-\bm{\mu}^{\ast}\Vert^{2}/n$ for the original review scores, with varying numbers of submissions. The left panel presents results when the review scores follow binomial distributions, while the right panel exhibits results when the review scores follow Poisson distributions, in the same setting as the numerical experiment on truthfulness. For any $n$, we set $\mu_{i}^{\ast}=9-6(i-1)/(n-1)$ for $i =1, \ldots, n$.
  • Figure 4: The difference in utility between reporting the correct and the incorrect ranking, for two different types of utility functions. A positive (resp. negative) difference means that the author has higher utility when reporting the correct (resp. incorrect) ranking, and is shown in green (resp. red).
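
The comparison in Figure 3 can be approximated with a short simulation. This is a sketch under stated assumptions, not the authors' experiment: it uses the binomial model $\mathrm{Binom}(10, \mu_{i}^{\ast}/10)$ from the Figure 2 caption together with $\mu_{i}^{\ast}=9-6(i-1)/(n-1)$, assumes the author reports the true ranking, and takes the adjusted scores to be the least-squares isotonic fit. The helper names, seed, and number of runs are hypothetical.

```python
import random


def isotonic_nonincreasing(values):
    """Least-squares projection onto non-increasing sequences (PAVA)."""
    blocks = []  # each block stores [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return [s / c for s, c in blocks for _ in range(c)]


def compare_errors(n, runs=200, trials=10, seed=0):
    """Return (raw error, isotonic error), each averaged over `runs` draws."""
    rng = random.Random(seed)
    # True qualities decrease from 9 to 3, so index order is the true ranking.
    mu = [9 - 6 * i / (n - 1) for i in range(n)]
    err_raw = err_iso = 0.0
    for _ in range(runs):
        x = [sum(rng.random() < m / trials for _ in range(trials)) for m in mu]
        fitted = isotonic_nonincreasing(x)  # truthful ranking: no reordering needed
        err_raw += sum((a - m) ** 2 for a, m in zip(x, mu)) / n
        err_iso += sum((a - m) ** 2 for a, m in zip(fitted, mu)) / n
    return err_raw / runs, err_iso / runs


for n in (5, 20, 50):
    raw, iso = compare_errors(n)
    print(f"n={n}: raw={raw:.3f}, isotonic={iso:.3f}")
```

Because the true mean vector is itself non-increasing, projecting onto non-increasing sequences can never move the scores farther from it in squared error, so the isotonic error in this sketch is never above the raw error on any draw.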

Theorems & Definitions (22)

  • Theorem 1
  • Theorem 2
  • Definition 1
  • Remark 1
  • Theorem 3
  • Proposition 1
  • Proposition 2
  • Remark 2
  • Proposition 3
  • Theorem 4
  • ...and 12 more