Learning to Select and Rank from Choice-Based Feedback: A Simple Nested Approach

Junwen Yang; Yifan Feng

TL;DR

This work studies ranking and selection from choice-based feedback under dynamic assortments, where an unknown strict ranking governs observed choices. It introduces two simple, scalable algorithms: Nested Elimination (NE) for best-item identification and Nested Partition (NP) for full-ranking identification, both analyzed via a multi-dimensional random-walk framework and linked to information-theoretic lower bounds. NE is shown to be worst-case asymptotically optimal, while NP attains near-optimal results up to a constant factor, with guarantees that hold in a non-asymptotic, instance-specific form. Empirical results on synthetic and real data corroborate the theoretical insights, demonstrating substantial improvements in sample efficiency and computational speed over prior methods and illustrating the practical value of nested, SPRT-inspired learning strategies for online preference learning with choice-based feedback.

Abstract

We study a ranking and selection problem of learning from choice-based feedback with dynamic assortments. In this problem, a company sequentially displays a set of items to a population of customers and collects their choices as feedback. The only information available about the underlying choice model is that the choice probabilities are consistent with some unknown true strict ranking over the items. The objective is to identify, with the fewest samples, the most preferred item or the full ranking over the items at a high confidence level. We present novel and simple algorithms for both learning goals. In the first subproblem regarding best-item identification, we introduce an elimination-based algorithm, Nested Elimination (NE). In the more complex subproblem regarding full-ranking identification, we generalize NE and propose a divide-and-conquer algorithm, Nested Partition (NP). We provide strong characterizations of both algorithms through instance-specific and non-asymptotic bounds on the sample complexity. This is accomplished using an analytical framework that characterizes the system dynamics through analyzing a sequence of multi-dimensional random walks. We also establish a connection between our nested approach and the information-theoretic lower bounds. We thus show that NE is worst-case asymptotically optimal, and NP is optimal up to a constant factor. Finally, numerical experiments from both synthetic and real data corroborate our theoretical findings.

Paper Structure (43 sections, 24 theorems, 229 equations, 13 figures, 1 table, 5 algorithms)

Introduction
Summary of Contributions
Literature Review
Problem Setup and Preliminaries
The Learning-to-Select (Best-Item Identification) Problem
The Nested Elimination Algorithm
Theoretical Analysis of NE
Discussion: Comparisons with Previous Work
Discussion: Key Technical Insights
The Learning-to-Rank (Full-Ranking Identification) Problem
From Best-Item to Full-Ranking Identification
The Nested Partition Algorithm
Theoretical Analysis of NP
Numerical Experiments
The Best-Item Identification Problem
...and 28 more sections

Key Result

Theorem 1

For every confidence level $\delta\in(0,1)$, NE is $\delta$-PAC with the parameter value [equation missing]. Furthermore, for every preference instance $f\in\mathcal{M}_p$, there is a constant $C_f$ independent of $\delta$ such that [equation missing].

Figures (13)

Figure 1: A possible trajectory of $S_{\mathrm{active}}$ under NE. The item with the fewest votes is eliminated according to the stopping rule.
Figure 2: A visualization of the system dynamics under NE. Let $K = 3$. In the first stage, the active set is $[3] = \{1, 2, 3\}$. The system dynamics are visualized by projecting the state variables $\{W(i)\}$ onto the two-dimensional space spanned by $(W(1) - W(3),\, W(2) - W(3))$. This projection results in a random walk that begins at the origin and evolves according to an i.i.d. sequence with possible increments of $(1, 0)$, $(0, 1)$, and $(-1, -1)$, occurring with probabilities $f(1 | [3])$, $f(2 | [3])$, and $f(3 | [3])$, respectively. The first stage finishes when the random walk reaches the boundary of a triangle defined by vertices $(0, M)$, $(M, 0)$, and $(-M, -M)$. Each face of the triangle corresponds to the elimination of one item. In the illustrated path, item 3 is eliminated, and the active set updates to $[2] = \{1, 2\}$, which is an event with high probability under any OA preference instance. In the second stage, the state variables are further projected into the one-dimensional space spanned by $W(1) - W(2)$. That results in a one-dimensional random walk, starting from the endpoint inherited from the first stage. It evolves by increments of $+1$ or $-1$ with probabilities $f(1 | [2])$ and $f(2 | [2])$, respectively. The second stage ends when the random walk reaches the endpoint $M$ or $-M$, which corresponds to selecting item 1 or item 2, respectively.
Figure 3: A conceptual illustration of the theoretical contributions of NE. The horizontal axis represents different preference instances $f$, while the vertical axis represents the asymptotic expected sample complexity.
Figure 4: A possible trajectory of $S_{\mathrm{active}}$ under NP represented by a binary tree. A partition separates the highest-voted items from the lowest-voted ones according to the partition rule.
Figure 5: A visualization of the system dynamics under NP. Let $K = 3$. In the initial stage, the active set is $[3] = \{1, 2, 3\}$. The projected state variables $(W(1) - W(3),\, W(2) - W(3))$ and the random walk dynamics are the same as under NE, illustrated in Figure 2. What differentiates NE and NP is the hitting boundary. Under NP, the first stage finishes when the random walk hits the boundary of the concave polygon defined by vertices $(M, 0)$, $(2M, M)$, $(M,M)$, $(M, 2M)$, $(0,M)$, $(-M, M)$, $(-M, 0)$, $(-2M, -M)$, $(-M, -M)$, $(-M, -2M)$, $(0, -M)$, and $(M, -M)$. The 12 faces are grouped into 6 different partition possibilities. For example, $\{1,2\}|\{3\}$ means $S_{\mathrm{high}} = \{1,2\}$ and $S_{\mathrm{low}} = \{3\}$. In the illustrated path, the first stage finishes with $S_{\mathrm{high}} = \{1\}$ and $S_{\mathrm{low}} = \{2,3\}$, which is an event with high probability under any OA preference instance. Since $S_{\mathrm{high}}$ is a singleton, it suffices to look at $\{2,3\}$ as the active set in the next stage. Here, the state variables are further projected into the one-dimensional space spanned by $W(2) - W(3)$. That results in a one-dimensional random walk, which is the same as that under NE. Depending on which endpoint the random walk hits, the resulting ranking is either $(1,2,3)$ or $(1,3,2)$.
...and 8 more figures
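
The NE dynamics described in the captions of Figures 1 and 2 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The elimination test below is inferred from the triangle in the Figure 2 caption: the lowest-voted item $i$ is removed once $\sum_{j \in S_{\mathrm{active}}} (W(j) - W(i)) \ge M$, and votes carry over between stages. The function name `nested_elimination`, the `choice_prob` callback, and the weight-based choice model in the demo are hypothetical names introduced only for illustration.

```python
import random


def nested_elimination(choice_prob, items, M, rng=None):
    """Simulate an NE-style run; returns (selected_item, number_of_samples).

    choice_prob(S) is a hypothetical callback returning a dict that maps
    each item in the displayed set S to its choice probability.
    M is the elimination threshold (assumed to be a positive integer).
    """
    rng = rng or random.Random()
    active = list(items)
    votes = {i: 0 for i in active}  # W(i): votes are kept across stages
    samples = 0
    while len(active) > 1:
        total = sum(votes[i] for i in active)
        worst = min(active, key=lambda i: votes[i])  # lowest-voted active item
        # Inferred stopping rule: sum over active j of (W(j) - W(worst)) >= M.
        if total - len(active) * votes[worst] >= M:
            active.remove(worst)
            continue  # re-check the rule on the smaller set before sampling
        probs = choice_prob(active)
        chosen = rng.choices(active, weights=[probs[i] for i in active])[0]
        votes[chosen] += 1
        samples += 1
    return active[0], samples


# Demo with an assumed choice model: fixed positive weights, renormalized
# over the displayed set, so item 1 is the most preferred.
WEIGHTS = {1: 0.5, 2: 0.3, 3: 0.2}


def weighted_model(S):
    z = sum(WEIGHTS[i] for i in S)
    return {i: WEIGHTS[i] / z for i in S}


best, n = nested_elimination(weighted_model, [1, 2, 3], M=10,
                             rng=random.Random(0))
print("selected item:", best, "samples used:", n)
```

With two active items the test reduces to $W(1) - W(2) \ge M$ or $W(2) - W(1) \ge M$, which matches the one-dimensional random walk with endpoints $M$ and $-M$ in the second stage of Figure 2. A larger `M` trades more samples for a lower chance of eliminating the best item.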

Theorems & Definitions (36)

Definition 1: $p$-Separable family
Remark 1
Remark 2
Definition 2: $\delta$-PAC policy
Remark 3
Theorem 1: Sample complexity of NE in the fixed-confidence setting
Remark 4
Proposition 1: Minimal value of $I^{\mathrm N}$
Proposition 2
Proposition 3
...and 26 more
