Learning to Select and Rank from Choice-Based Feedback: A Simple Nested Approach
Junwen Yang, Yifan Feng
TL;DR
This work studies ranking and selection from choice-based feedback under dynamic assortments, where the observed choice probabilities are consistent with an unknown strict ranking over the items. It introduces two simple, scalable algorithms: Nested Elimination (NE) for best-item identification and Nested Partition (NP) for full-ranking identification. Both are analyzed through a multi-dimensional random-walk framework and linked to information-theoretic lower bounds, with sample-complexity guarantees that are instance-specific and non-asymptotic. NE is shown to be worst-case asymptotically optimal, while NP is optimal up to a constant factor. Empirical results on synthetic and real data corroborate the theoretical insights, demonstrating substantial improvements in sample efficiency and computational speed over prior methods and illustrating the practical value of nested, SPRT-inspired learning strategies for online preference learning with choice-based feedback.
Abstract
We study a ranking and selection problem of learning from choice-based feedback with dynamic assortments. In this problem, a company sequentially displays a set of items to a population of customers and collects their choices as feedback. The only information available about the underlying choice model is that the choice probabilities are consistent with some unknown true strict ranking over the items. The objective is to identify, with the fewest samples, the most preferred item or the full ranking over the items at a high confidence level. We present novel and simple algorithms for both learning goals. For the first subproblem, best-item identification, we introduce an elimination-based algorithm, Nested Elimination (NE). For the more complex subproblem, full-ranking identification, we generalize NE and propose a divide-and-conquer algorithm, Nested Partition (NP). We provide strong characterizations of both algorithms through instance-specific and non-asymptotic bounds on the sample complexity. This is accomplished using an analytical framework that characterizes the system dynamics by analyzing a sequence of multi-dimensional random walks. We also establish a connection between our nested approach and the information-theoretic lower bounds. We thus show that NE is worst-case asymptotically optimal, and NP is optimal up to a constant factor. Finally, numerical experiments on both synthetic and real data corroborate our theoretical findings.
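To make the feedback setting concrete, the following is a minimal simulation sketch of the interaction loop described above: display an assortment, observe one choice, and shrink the assortment over time. It is not the Nested Elimination algorithm itself, whose stopping and elimination rules are not given here. The MNL-style weights, the vote-gap rule, and the names `sample_choice`, `naive_elimination`, and `gap` are all assumptions made for illustration.

```python
import random


def sample_choice(assortment, weights, rng):
    """Draw one customer choice from the displayed assortment.

    Assumption: an MNL-style model in which an item's choice probability is
    proportional to its weight, so the probabilities are consistent with the
    ranking induced by the weights.
    """
    items = list(assortment)
    return rng.choices(items, weights=[weights[i] for i in items], k=1)[0]


def naive_elimination(weights, gap, rng, max_samples=100_000):
    """Toy elimination loop (hypothetical rule, not the paper's NE).

    Displays all surviving items each round and drops any item that trails
    the current vote leader by at least `gap` votes.
    """
    active = set(weights)
    votes = {i: 0 for i in active}
    samples = 0
    while len(active) > 1 and samples < max_samples:
        choice = sample_choice(sorted(active), weights, rng)
        votes[choice] += 1
        samples += 1
        leader = max(active, key=lambda i: votes[i])
        # The leader always survives because its own gap is zero.
        active = {i for i in active if votes[leader] - votes[i] < gap}
    return max(active, key=lambda i: votes[i]), samples


rng = random.Random(0)
winner, samples = naive_elimination({"a": 8.0, "b": 2.0, "c": 1.0}, gap=20, rng=rng)
print(winner, samples)
```

The assortment shrinks as items are dropped, which is what makes the feedback "dynamic": later customers choose among fewer items, so the remaining comparisons receive more informative samples per display.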
