AlphaRank: An Artificial Intelligence Approach for Ranking and Selection Problems

Ruihan Zhou, L. Jeff Hong, Yijie Peng

TL;DR

AlphaRank recasts fixed-budget ranking and selection (R&S) as a Markov decision process and uses a neural network value model, trained offline with rollout-based learning, to accelerate online sample allocation. It couples a Monte Carlo rollout policy with classic R&S procedures as base policies to learn action values, and provides theoretical guarantees on policy improvement and consistency. To tackle large-scale problems, AlphaRank employs a framework that combines divide and conquer with recursion (DCR): it partitions the problem and applies small, pre-trained neural networks (NNs) to the parts, which enables scalable parallel computation. Empirical results show that AlphaRank outperforms traditional procedures in both high- and low-confidence scenarios and across problem scales, with notable gains when DCR is applied to problems with thousands to millions of alternatives. The work offers a practical, AI-powered route to fast, accurate R&S, with the potential for cloud-hosted pre-trained models covering different priors and problem sizes.

Abstract

We introduce AlphaRank, an artificial intelligence approach to fixed-budget ranking and selection (R&S) problems. We formulate the sequential sampling decision as a Markov decision process and propose a Monte Carlo simulation-based rollout policy that uses classic R&S procedures as base policies to efficiently learn the value function of stochastic dynamic programming. We accelerate online sample allocation by using deep reinforcement learning to pre-train a neural network model offline based on a given prior. We also propose a parallelizable computing framework for large-scale problems that combines "divide and conquer" and "recursion" for enhanced scalability and efficiency. Numerical experiments demonstrate that AlphaRank significantly outperforms the base policies, which could be attributed to its superior ability to balance the trade-off among mean, variance, and induced correlation, a trade-off overlooked by many existing policies.
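The "divide and conquer" plus "recursion" idea for large-scale problems can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `select_best_small` and `dcr_select`, the parameters `group_size` and `budget_per_alt`, and the use of plain equal allocation inside each group are all assumptions made for illustration. In AlphaRank, the small subproblems would instead be handled by a small pre-trained neural network.

```python
import numpy as np


def select_best_small(ids, sample, budget_per_alt):
    """Hypothetical stand-in for a small-problem procedure.

    Samples every alternative the same number of times (equal allocation)
    and returns the id with the largest sample mean.
    """
    means = [np.mean([sample(i) for _ in range(budget_per_alt)]) for i in ids]
    return ids[int(np.argmax(means))]


def dcr_select(ids, sample, group_size=4, budget_per_alt=20):
    """Partition the alternatives, keep one winner per group, then recurse."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so the recursion shrinks")
    ids = list(ids)
    if len(ids) <= group_size:
        # Base case: the problem is small enough to solve directly.
        return select_best_small(ids, sample, budget_per_alt)
    groups = [ids[k:k + group_size] for k in range(0, len(ids), group_size)]
    # The groups are independent of one another, so this step could run in parallel.
    winners = [select_best_small(g, sample, budget_per_alt) for g in groups]
    return dcr_select(winners, sample, group_size, budget_per_alt)


# Usage: 1000 alternatives whose true mean is i / 1000, observed with Gaussian noise.
rng = np.random.default_rng(0)
chosen = dcr_select(range(1000), lambda i: rng.normal(i / 1000.0, 1.0))
```

Because each level only ever solves problems of at most `group_size` alternatives, a model trained for small problems could be reused at every level of the recursion, which is the appeal of this kind of design.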

Paper Structure

This paper contains 38 sections, 10 theorems, 54 equations, 12 figures, 13 tables, 2 algorithms.

Key Result

Proposition 1

The rollout action value of action $a_{t+1}^{(i)}$ in state $s_t$ is the theoretical probability of correct selection (PCS) of the base policy $\pi$ in state $s_{t+1}^{(i)}$, i.e., $Q(s_t,a_{t+1}^{(i)})=\text{PCS}^{\pi}(s_{t+1}^{(i)})$. Furthermore, define $\Pr^{\text{improve}}(s_t)$ as the probability of policy improvement, where $a^*=\arg\max_{i=1,\dots,N}Q(s_t,a_{t+1}^{(i)})$.
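To make the identity $Q(s_t,a_{t+1}^{(i)})=\text{PCS}^{\pi}(s_{t+1}^{(i)})$ concrete, the sketch below estimates each action value by Monte Carlo: take action $i$ first, follow a base policy until the budget runs out, and record how often the final selection is correct. Several details are assumptions not taken from the paper: a conjugate normal model with known sampling variances, a state made up of posterior means, posterior variances and the remaining budget, round-robin equal allocation as the base policy, and the helper names `posterior_update` and `rollout_action_values`.

```python
import numpy as np


def posterior_update(mu, var, i, y, noise_var):
    """Conjugate normal update (in place) of alternative i after observing y."""
    precision = 1.0 / var[i] + 1.0 / noise_var[i]
    mu[i] = (mu[i] / var[i] + y / noise_var[i]) / precision
    var[i] = 1.0 / precision


def rollout_action_values(mu, var, noise_var, budget, n_rollouts=2000, seed=0):
    """Monte Carlo estimate of Q(s_t, a^(i)) for every candidate first action i.

    Assumed state: posterior means `mu`, posterior variances `var`, and the
    number of remaining samples `budget` (must be at least 1).
    Assumed base policy: round-robin equal allocation.
    """
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(mu)
    q = np.zeros(n)
    for first in range(n):
        # The first sample goes to `first`; the base policy allocates the rest.
        actions = [first] + [k % n for k in range(budget - 1)]
        correct = 0
        for _ in range(n_rollouts):
            theta = rng.normal(mu, np.sqrt(var))  # one scenario of the true means
            m, v = mu.copy(), var.copy()
            for a in actions:
                y = rng.normal(theta[a], np.sqrt(noise_var[a]))
                posterior_update(m, v, a, y, noise_var)
            # Correct selection: the largest posterior mean is the true best.
            correct += int(np.argmax(m) == np.argmax(theta))
        q[first] = correct / n_rollouts  # estimated PCS of the base policy
    return q


# Usage: three alternatives, six samples left; the rollout policy picks argmax Q.
q = rollout_action_values([0.0, 0.2, 0.5], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], budget=6)
a_star = int(np.argmax(q))
```

Each estimate is an average of correct-selection indicators, so its Monte Carlo error shrinks at the usual rate in `n_rollouts`. Running this online for every decision is costly, which is the motivation for replacing the estimate with a value network trained offline.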

Figures (12)

  • Figure 1: The decisions in the rollout process.
  • Figure 1: Training procedure of value network in parallel.
  • Figure 2: NN training and evaluating architecture with the number of alternatives=2.
  • Figure 2: PCSs of EA, EI, PTV, and rollout policies in Experiment 2.
  • Figure 3: The training procedure of NN in parallel.
  • ...and 7 more figures

Theorems & Definitions (13)

  • Proposition 1
  • Remark 1
  • Proposition 2
  • Proposition 3
  • Remark 2
  • Proposition 4
  • Corollary 1
  • Remark 3
  • Theorem 1: Proposition 1.
  • Theorem 2: Proposition 2.
  • ...and 3 more