Experimental Assortments for Choice Estimation and Nest Identification

Xintong Yu, Will Ma, Michael Zhao

Abstract

What assortments (subsets of items) should be offered, to collect data for estimating a choice model over $n$ total items? We propose a structured, non-adaptive experiment design requiring only $O(\log n)$ distinct assortments, each offered repeatedly, that consistently outperforms randomized and other heuristic designs across an extensive numerical benchmark that estimates multiple different choice models under a variety of (possibly mis-specified) ground truths. We then focus on Nested Logit choice models, which cluster items into "nests" of close substitutes. Whereas existing Nested Logit estimation procedures assume the nests to be known and fixed, we present a new algorithm to identify nests based on collected data, which when used in conjunction with our experiment design, guarantees correct identification of nests under any Nested Logit ground truth. Our experiment design was deployed to collect data from over 70 million users at Dream11, an Indian fantasy sports platform that offers different types of betting contests, with rich substitution patterns between them. We identify nests based on the collected data, which lead to better out-of-sample choice prediction than ex-ante clustering from contest features. Our identified nests are ex-post justifiable to Dream11 management.

Paper Structure

This paper contains 77 sections, 13 theorems, 85 equations, 14 figures, 6 tables, 1 algorithm.

Key Result

proposition 1

Suppose ass:identify and ass:genPos hold and take any $S\in\mathcal{S}$. For all $i\in S$: Therefore, we can make the following deductions about nest membership.

Figures (14)

  • Figure 1: Example deductions after Experiment 1 in \ref{table:introDesign}. "Small" boost factors were observed for $\xspace$ and $\xspace$, while "large" and distinct boost factors were observed for $\xspace$ and $\xspace$. By deduction I., we know $\xspace$ and $\xspace$ are not in the same nest as any of the 6 other items, but we do not know whether $\xspace$ and $\xspace$ are in the same nest. By deduction II., we know $\xspace$ and $\xspace$ are also not in the same nest. The table does not record the latent information that both $\xspace$ and $\xspace$ are in the same nest as at least one item outside $S$, even though this information is necessary for our nest identification. The complete identification of nests is found in \ref{sssec:algIllustration}.
  • Figure 2: Evolution of the adjacency matrix $E$ during nest identification. White squares indicate $E[i,j]=1$ (same nest); black squares indicate $E[i,j]=0$ (different nests); while grey squares indicate $E[i,j]=\texttt{null}$ (not yet determined). The state of the adjacency matrix $E$ is displayed after processing each of the 6 experimental assortments $S\in\mathcal{S}$, and after the "One Hop Transitivity" (line \ref{alg1:transitivity}) and "Identify Missing Pairs" (line \ref{alg1:uniqueNest}) operations.
  • Figure 3: Comparing experiment designs in a mis-specified setting
  • Figure 4: Comparing experiment designs in a well-specified setting, where we display average $\mathrm{RMSE}^\mathrm{soft}$ over the 500 Markov Chain ground truths under Markov Chain choice estimation
  • Figure 5: Comparing experiment designs and nest identification algorithms, averaged over the 500 Nested Logit ground truths with an outside option
  • ...and 9 more figures
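
The three-valued adjacency matrix $E$ in the Figure 2 caption (1 for same nest, 0 for different nests, null for undetermined) can be illustrated with a small sketch. The propagation rule below is an assumption about what a "one hop transitivity" step might look like, not the paper's algorithm: if $i$ and $j$ are known to share a nest, then $i$ inherits whatever is known about $j$'s relation to a third item $k$. The function name `one_hop_transitivity` and the 4-item example are hypothetical.

```python
from typing import List, Optional

# E[i][j] is 1 (same nest), 0 (different nests), or None (not yet determined).
Matrix = List[List[Optional[int]]]


def one_hop_transitivity(E: Matrix) -> Matrix:
    """Fill undetermined entries using a single intermediate item j.

    Assumed rule (illustrative only): if i and j share a nest and the
    relation between j and k is known, then i has that same relation to k.
    Reads only from the input E, so each new entry uses exactly one hop.
    """
    n = len(E)
    out = [row[:] for row in E]  # copy so the input is left unchanged
    for i in range(n):
        for k in range(n):
            if i == k or E[i][k] is not None:
                continue  # already determined
            for j in range(n):
                if j in (i, k):
                    continue
                if E[i][j] == 1 and E[j][k] is not None:
                    out[i][k] = out[k][i] = E[j][k]
                    break
                if E[j][k] == 1 and E[i][j] is not None:
                    out[i][k] = out[k][i] = E[i][j]
                    break
    return out


# Hypothetical example: items 0 and 1 share a nest, items 1 and 2 do not.
E = [
    [1, 1, None, None],
    [1, 1, 0, None],
    [None, 0, 1, None],
    [None, None, None, 1],
]
closed = one_hop_transitivity(E)
```

In this example the step infers that items 0 and 2 are in different nests, while every pair involving item 3 stays undetermined because no known relation reaches it.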

Theorems & Definitions (24)

  • definition 1: Nest Identification Problem, noiseless version
  • definition 2: Boost Factors
  • proposition 1: Nest Deductions with Outside Option
  • theorem 1: proven in Section \ref{sec:mainResultPf}
  • theorem 2: proven in \ref{sec:sampleComplexityPf}
  • proposition 2: Nest Deductions without Outside Option
  • theorem 3: proven in \ref{sec:idenWithoutOutsidePf}
  • lemma 1: Injectivity of the encoding
  • proof
  • theorem 4
  • ...and 14 more