Policy Design for Two-sided Platforms with Participation Dynamics

Haruka Kiyohara, Fan Yao, Sarah Dean

TL;DR

This paper studies how viewer and provider populations co-evolve on two-sided platforms under population effects and shows that standard myopic recommendation policies can degrade long-term welfare. It introduces a dynamic, game-theoretic model of viewer-provider interactions, proves that the evolving system stabilizes at a Nash equilibrium under mild conditions, and decomposes welfare regret into population and policy components. To optimize long-term social welfare, the authors propose a look-ahead policy that forecasts future populations and is interpolated with a myopic policy to balance long-term and short-term goals, along with a practical estimation procedure based on Explore-then-Commit. In synthetic and real-data (KuaiRec) experiments, the approach improves welfare and balances exposure across provider subgroups, underscoring the importance of exposure fairness and population-aware planning for platform health in the presence of growth dynamics.
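The interpolation between the myopic and look-ahead policies can be illustrated with a minimal sketch. It assumes the interpolation is a convex combination weighted by $\beta$, with $\beta=0$ recovering the myopic policy and $\beta=1$ the long-term policy, as the figure captions suggest. The function name `interpolate_policy` and the example matrices are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def interpolate_policy(pi_myopic, pi_look_ahead, beta):
    """Mix two exposure policies (assumed form: convex combination).

    Each policy is a matrix whose rows are viewer groups and whose
    columns are provider groups; every row is a probability distribution.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    pi_myopic = np.asarray(pi_myopic, dtype=float)
    pi_look_ahead = np.asarray(pi_look_ahead, dtype=float)
    # beta = 0 returns the myopic policy, beta = 1 the look-ahead policy.
    return (1.0 - beta) * pi_myopic + beta * pi_look_ahead

# Hypothetical example: 2 viewer groups (rows), 3 provider groups (columns).
# The myopic policy concentrates exposure; the look-ahead policy spreads it.
pi_myopic = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
pi_look_ahead = np.array([[0.4, 0.3, 0.3],
                          [0.3, 0.4, 0.3]])

pi_mixed = interpolate_policy(pi_myopic, pi_look_ahead, beta=0.5)
print(pi_mixed)
```

Because both inputs are row-stochastic, any $\beta \in [0, 1]$ yields a valid policy, so $\beta$ can be tuned as a single hyperparameter without renormalization.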

Abstract

In two-sided platforms (e.g., video streaming or e-commerce), viewers and providers engage in interactive dynamics: viewers benefit from increases in provider populations, while providers benefit from increases in viewer population. Despite the importance of such "population effects" on long-term platform health, recommendation policies do not generally take the participation dynamics into account. This paper thus studies the dynamics and recommender policy design on two-sided platforms under the population effects for the first time. Our control- and game-theoretic findings warn against the use of the standard "myopic-greedy" policy and shed light on the importance of provider-side considerations (i.e., effectively distributing exposure among provider groups) to improve social welfare via population growth. We also present a simple algorithm to optimize long-term social welfare by taking the population effects into account, and demonstrate its effectiveness in synthetic and real-data experiments. Our experiment code is available at https://github.com/sdean-group/dynamics-two-sided-market.

Paper Structure

This paper contains 22 sections, 6 theorems, 56 equations, 8 figures.

Key Result

Theorem 1

For any continuous functions $f, \bar{\lambda}$ with bounded first-order derivatives, consider the environment defined by the game instance $\mathcal{G}(\bm{\pi}, B, f, \bar{\lambda})$. We have:

Figures (8)

  • Figure 1: Comparing the myopic-greedy policy, the uniform random policy, and the long-term policy in a synthetic simulation. As shown, the myopic-greedy policy loses provider population because it concentrates exposure, which harms viewer welfare in the long run. The "long-term" policy is based on the algorithm proposed in Section \ref{sec:proposal} (Eq. \ref{eq:look_ahead_policy}), and the experiment setting follows Section \ref{sec:synthetic_experiment} (with a small initial population).
  • Figure 2: Comparing the total welfare and the viewer and provider populations for varying values of the interpolation hyperparameter $\beta$. (Top) small initial population and (Bottom) large initial population. "uniform" represents the uniform random policy.
  • Figure 3: Comparing the utility matrix of the myopic ($\beta=0.0$), long-term ($\beta=1.0$), and uniform random policies at the final timestep and the initial utility matrix. For the initial utility matrix, we use the one with a small initial population.
  • Figure 4: Visualization of the (true) population effects in the real-world experiment. The population effects are based on the spline function \cite{reinsch1967smoothing} fitted on the empirical population effect (dotted points) observed in the KuaiRec \cite{gao2022kuairec} dataset. Figures \ref{fig:estimation_population_effect} and \ref{fig:estimation_population_dynamics} in the Appendix also report the population effects and dynamics learned by the long-term policy, following Section \ref{sec:dyanmics_estimation}.
  • Figure 5: Comparing the total welfare, viewer and provider populations, and regrets in the real-data experiment. Cumulative regret is the sum of the total regret up to timestep $t$, and the total regret is decomposed into the population and policy regrets. Note that the true optimal policies that minimize the total regret and the population regret are not accessible. Thus, we report empirical regrets by treating one of the compared policies as the optimal baseline.
  • ...and 3 more figures

Theorems & Definitions (16)

  • Example 1: Video recommendation
  • Example 2: Job matching
  • Theorem 1
  • Proposition 1: Sufficient condition for stability
  • Theorem 2: Regret decomposition
  • Theorem 3: Optimality of the myopic-greedy
  • Proposition 2
  • Definition 1: Fixed point
  • Definition 2: Stability
  • Definition 3: Nash equilibrium
  • ...and 6 more