Long-Term Value of Exploration: Measurements, Findings and Algorithms

Yi Su, Xiangyu Wang, Elaine Ya Le, Liang Liu, Yuening Li, Haokai Lu, Benjamin Lipshitz, Sriraj Badam, Lukasz Heldt, Shuchao Bi, Ed Chi, Cristos Goodrow, Su-Lin Wu, Lexi Baugher, Minmin Chen

TL;DR

This work tackles the challenge of quantifying the long-term value of exploration in production recommender systems by introducing the Discoverable Corpus metric and a user-corpus-codiverted A/B framework to link corpus growth with long-term user satisfaction. It then implements Neural Linear Bandits as a scalable exploration backbone within a large-scale, multi-stage ranking system and validates the approach through extensive live experiments. Key findings show that exploration enlarges the discoverable corpus, improves usefulness of tail and fresh content, and yields sustained gains in user satisfaction, with uncertainty estimates aligning with content and user characteristics. The study offers practical guidance for deploying exploration in industrial systems and points to future work on multi-task exploration and exploration-driven model learning.

Abstract

Effective exploration is believed to positively influence the long-term user experience on recommendation platforms. Determining its exact benefits, however, has been challenging. Regular A/B tests on exploration often measure neutral or even negative engagement metrics while failing to capture its long-term benefits. Here we introduce new experiment designs to formally quantify the long-term value of exploration by examining its effects on the content corpus and by connecting content corpus growth to the long-term user experience in real-world experiments. Having established the value of exploration, we investigate the Neural Linear Bandit algorithm as a general framework for introducing exploration into any deep-learning-based ranking system. We conduct live experiments on one of the largest short-form video recommendation platforms, which serves billions of users, to validate the new experiment designs, quantify the long-term value of exploration, and verify the effectiveness of the adopted Neural Linear Bandit algorithm for exploration.

Paper Structure

This paper contains 30 sections, 10 equations, 10 figures, 3 tables, and 2 algorithms.

Figures (10)

  • Figure 1: The multi-stage recommender system, with candidate generation in the first stage followed by pointwise ranking and setwise packing.
  • Figure 2: User-Corpus-CoDiverted experiment diagram
  • Figure 3: Discoverable Corpus @100, 7-day period (left) and Discoverable Corpus @1000, 7-day period (right) for both control and treatment arms.
  • Figure 4: The histogram of the Discoverable Corpus @$X$, 3-month period (the number of content items receiving at least $X$ instances of positive feedback) for various values of $X$, in the control and treatment arms. The $x$-axis shows the value of $X$ (log scale), and the $y$-axis shows the Discoverable Corpus @$X$, 3-month period.
  • Figure 5: Left: the percentage change in the number of satisfied daily active users over time, for different ablation sizes $x\%$. Right: linear interpolation of the change in the number of satisfied daily active users with respect to the change in discoverable corpus size.
  • ...and 5 more figures
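
The Figure 4 caption describes the Discoverable Corpus @$X$ as the number of content items that receive at least $X$ instances of positive feedback within a given period. A minimal sketch of that count is given below, assuming the feedback log has already been restricted to the period of interest and is available as (item id, is-positive) pairs. The function name `discoverable_corpus_at_x` and the event format are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def discoverable_corpus_at_x(
    feedback_events: Iterable[Tuple[Hashable, bool]], x: int
) -> int:
    """Count items with at least `x` positive feedback events.

    Assumption: `feedback_events` holds only events from the measurement
    window (e.g. 7 days or 3 months), as (item_id, is_positive) pairs.
    """
    # Tally positive feedback per item; non-positive events are ignored.
    positives = Counter(
        item_id for item_id, is_positive in feedback_events if is_positive
    )
    # An item is "discoverable at x" once its positive count reaches x.
    return sum(1 for count in positives.values() if count >= x)


# Toy log: item "a" has 3 positives, "b" has 2, "c" has none.
events = [
    ("a", True), ("a", True), ("b", True),
    ("c", False), ("a", True), ("b", True),
]
at_2 = discoverable_corpus_at_x(events, 2)
at_3 = discoverable_corpus_at_x(events, 3)
print(at_2, at_3)
```

Comparing this count between the control and treatment arms for several thresholds $X$ gives the kind of comparison the Figure 3 and Figure 4 captions describe.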