Initializing Services in Interactive ML Systems for Diverse Users

Avinandan Bose, Mihaela Curmei, Daniel L. Jiang, Jamie Morgenstern, Sarah Dean, Lillian J. Ratliff, Maryam Fazel

TL;DR

This work tackles initializing multiple specialized ML services for a user population with diverse preferences under bandit feedback. It generalizes the classic k-means++ initialization to a broad loss framework, proving that a data-efficient randomized procedure can achieve a near-optimal total loss after initialization, with a logarithmic dependence on the number of services. The authors extend the theory to fair objectives across demographic groups and provide a linear-predictor generalization with finite-sample guarantees, accompanied by empirical validation on Census and MovieLens datasets. The results demonstrate the practical impact of robust initialization on subsequent learning dynamics, showing faster convergence and improved fairness compared to baselines. Overall, the paper provides a principled, scalable approach to initialize multi-service ML systems in settings with heterogeneous users and limited feedback.

Abstract

This paper investigates ML systems serving a group of users, with multiple models/services, each aimed at specializing to a sub-group of users. We consider settings where, upon deploying a set of services, users choose the one minimizing their personal losses and the learner iteratively learns by interacting with diverse users. Prior research shows that the outcomes of learning dynamics, which comprise both the services' adjustments and users' service selections, hinge significantly on the initialization. However, finding good initializations faces two main challenges: (i) Bandit feedback: Typically, data on user preferences are not available before deploying services and observing user behavior; (ii) Suboptimal local solutions: The total loss landscape (i.e., the sum of loss functions across all users and services) is not convex and gradient-based algorithms can get stuck in poor local minima. We address these challenges with a randomized algorithm to adaptively select a minimal set of users for data collection in order to initialize a set of services. Under mild assumptions on the loss functions, we prove that our initialization leads to a total loss within a factor of the globally optimal total loss with complete user preference data, and this factor scales logarithmically in the number of services. This result is a generalization of the well-known $k$-means++ guarantee to a broad problem class, which is also of independent interest. The theory is complemented by experiments on real as well as semi-synthetic datasets.
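
As a rough illustration of the adaptive seeding idea the abstract refers to, the sketch below implements classic $k$-means++-style sampling, not the authors' algorithm. It assumes a squared Euclidean loss between a user's preference vector and a service, and it simulates bandit feedback by computing losses from a `prefs` array that a real learner would not see in full; the function name `seed_services` and all parameter names are hypothetical.

```python
import numpy as np


def seed_services(prefs, k, rng):
    """k-means++-style seeding (illustrative sketch, not the paper's method).

    prefs : (n, d) array of user preferences. In a bandit setting the learner
            would only see a row after querying that user; here the array is
            used to simulate the losses users report for deployed services.
    k     : number of services to initialize.
    rng   : numpy random Generator.
    Returns the (k, d) array of services and the resulting total loss.
    """
    n = prefs.shape[0]
    # First service: query one user chosen uniformly at random.
    chosen = [int(rng.integers(n))]
    # Loss each user experiences at their best currently deployed service
    # (squared Euclidean distance is an assumption made for this sketch).
    best_loss = np.sum((prefs - prefs[chosen[0]]) ** 2, axis=1)
    for _ in range(1, k):
        total = best_loss.sum()
        if total == 0.0:
            # Every user is already served perfectly; reuse an existing service.
            chosen.append(chosen[0])
            continue
        # Query the next user with probability proportional to current loss.
        idx = int(rng.choice(n, p=best_loss / total))
        chosen.append(idx)
        new_loss = np.sum((prefs - prefs[idx]) ** 2, axis=1)
        # Each user switches to the new service only if it lowers their loss.
        best_loss = np.minimum(best_loss, new_loss)
    return prefs[chosen], float(best_loss.sum())
```

Only `k` users are queried for their preferences in this sketch, which is the sense in which such seeding is data-efficient; poorly served users are more likely to be queried next.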

Paper Structure

This paper contains 17 sections, 11 theorems, 69 equations, 6 figures, 2 algorithms.

Key Result

Theorem 3.1

Consider $n$ users with unknown preferences $\{\phi_1, \ldots, \phi_n\} \subset \mathbb{R}^d$, and associated loss functions $\mathcal{L}_i(\cdot,\cdot)$ satisfying Assumptions ass:unique and ass:triangle, with bandit access. Let $\Theta_{\rm OPT} \subset \mathbb{R}^d$ be the set of $k$ services min where the expectation is taken over the randomization of the algorithm and $K_{\rm OPT}$ is equal t
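
For orientation, the classic $k$-means++ guarantee that the abstract says this result generalizes can be written as follows. This is the known statement for squared Euclidean loss only; the symbols $\mathrm{cost}$ and $\Theta_{\rm seed}$ are notation assumed here for illustration, and the constants for the general-loss setting of Theorem 3.1 are not reproduced.

```latex
% Classic k-means++ seeding guarantee (squared Euclidean loss).
% cost(.) and \Theta_{seed} are illustrative notation, not taken from the paper.
\[
  \mathrm{cost}(\Theta) \;=\; \sum_{i=1}^{n} \min_{\theta \in \Theta} \lVert \phi_i - \theta \rVert_2^2,
  \qquad
  \mathbb{E}\bigl[\mathrm{cost}(\Theta_{\rm seed})\bigr]
  \;\le\; 8\,(\ln k + 2)\,\mathrm{cost}(\Theta_{\rm OPT}),
\]
% where \Theta_{seed} is the set of k centers returned by k-means++ seeding
% and the expectation is over the algorithm's randomness.
```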

Figures (6)

  • Figure 1: Panels fig:fairness_all and fig:fairness_worst show the performance of various user selection strategies on the travel time prediction task on the Census data. Notably, our findings reveal that the greedy and epsilon-greedy baselines exhibit strong performance for $k < 10$. However, as the value of $k$ grows, these strategies prove myopic, with random sampling surpassing their effectiveness. AcQUIre and Fair AcQUIre consistently emerge as the two best baselines for both tasks. Panel fig:fairness_ml10m presents the average excess error for the movie recommendation task. Remarkably, the greedy algorithm demonstrates efficacy when $k$ is small. Epsilon-greedy, employing an explore-vs-exploit approach, successfully overcomes myopic tendencies. Nevertheless, AcQUIre continues to be the best baseline for data collection.
  • Figure 2: Runtimes for AcQUIre and baselines as number of users ($N$) and services ($K$) vary.
  • Figure 3: We study the importance of initialization for both the convergence rate and the quality of the converged solution of optimization algorithms. We find that AcQUIre converges both faster and to a lower total loss across optimization methods ($k$-means and multiplicative weights) as well as datasets.
  • Figure 4: Fair objective improvement for AcQUIre over the baseline across different demographics. We observe that there is at least 15% improvement across sex demographics for a wide range of numbers of services. For racial demographics the improvement is 7-26%.
  • Figure 5: Average losses across different demographic groups for Fair AcQUIre (left, middle). Percentage improvement over baseline (right). We observe that Fair AcQUIre reduces disparity across different groups compared to the baseline.
  • ...and 1 more figure

Theorems & Definitions (22)

  • Definition 2.3
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 4.4
  • Remark 4.5
  • Definition D.1
  • Lemma D.1
  • proof
  • Lemma D.1
  • proof
  • ...and 12 more