MUSE: Multi-Tenant Model Serving With Seamless Model Updates

Cláudio Correia; Alberto E. A. Ferreira; Lucas Martins; Miguel P. Bento; Sofia Guerreiro; Ricardo Ribeiro Pereira; Ana Sofia Gomes; Jacopo Bono; Hugo Ferreira; Pedro Bizarro

TL;DR

MUSE tackles the challenge of threshold drift in multi-tenant score-serving by decoupling client decision boundaries from model outputs through two complementary transformations: Posterior Correction and Quantile Mapping. It introduces a predictor abstraction and a DAG-based, stateless serving architecture that enables shared models, intent-based routing, and shadow deployments, allowing rapid, safe model promotions without client intervention. Empirical results from production fraud-detection deployments show large improvements in distribution stability, recall at fixed false-positive rates, and calibration metrics, while maintaining strict latency and availability under high throughput. The work demonstrates substantial practical impact by reducing model lead times, enabling resilient defenses against shifting threats, and delivering meaningful cost savings across dozens of tenants. Future directions include automated calibration refresh and generalized Posterior Correction to further automate model evolution in production.
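As an illustration of what a posterior correction can look like, the sketch below applies the standard prior-shift adjustment for a classifier trained on data where negatives were undersampled. This page does not state which formula MUSE uses, so the formula choice, the function name `posterior_correction`, and the parameter `beta` (the fraction of negatives kept during training) are assumptions made for illustration only.

```python
def posterior_correction(score: float, beta: float) -> float:
    """Map a score from a model trained on undersampled negatives back to
    a probability under the original class balance.

    Assumption (not from the source): negatives were kept with probability
    `beta` during training, so the biased score s relates to the true
    probability p by s = p / (p + beta * (1 - p)). Solving for p gives the
    expression returned below.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    return beta * score / (beta * score - score + 1.0)


if __name__ == "__main__":
    # With no undersampling (beta = 1) the score is unchanged.
    print(posterior_correction(0.5, 1.0))
    # With only 10% of negatives kept, a raw 0.5 shrinks considerably.
    print(posterior_correction(0.5, 0.1))
```

The correction is monotonic in the score, so it changes the scale of the outputs without changing how events are ranked.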

Abstract

In binary classification systems, decision thresholds translate model scores into actions. Choosing suitable thresholds depends not only on the distribution of the underlying model scores but also on the business decisions of each client using that model. However, retraining models inevitably shifts score distributions, invalidating existing thresholds. In multi-tenant Score-as-a-Service environments, where decision boundaries reside in client-managed infrastructure, this creates a severe bottleneck: recalibration requires coordinating threshold updates across hundreds of clients, consuming excessive human hours and leading to model stagnation. We introduce MUSE, a model serving framework that enables seamless model updates by decoupling model scores from client decision boundaries. Designed for multi-tenancy, MUSE optimizes infrastructure reuse by sharing models via dynamic intent-based routing, combined with a two-level score transformation that maps model outputs to a stable reference distribution. Deployed at scale by Feedzai, MUSE processes over a thousand events per second, and has processed over 55 billion events in the last 12 months, across several dozen tenants, while maintaining high-availability and low-latency guarantees. By reducing model lead time from weeks to minutes, MUSE promotes model resilience against shifting attacks, saving millions of dollars in fraud losses and operational costs.
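To make the idea of mapping model outputs to a stable reference distribution concrete, here is a minimal sketch of a generic empirical quantile mapping. It is not MUSE's implementation: the helper names `_quantiles` and `fit_quantile_map`, the 101-point grid, and the linear interpolation between grid points are all assumptions chosen for brevity.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def _quantiles(values: Sequence[float], n_points: int) -> List[float]:
    """Empirical quantiles at n_points evenly spaced probabilities in [0, 1]."""
    ordered = sorted(values)
    last = len(ordered) - 1
    out = []
    for i in range(n_points):
        pos = last * i / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(ordered[lo] * (1.0 - frac) + ordered[hi] * frac)
    return out


def fit_quantile_map(
    source_scores: Sequence[float],
    reference_scores: Sequence[float],
    n_points: int = 101,
) -> Callable[[float], float]:
    """Return a function sending a new model's score to the reference scale.

    A score at the q-th quantile of the source distribution is mapped to the
    q-th quantile of the reference distribution, with linear interpolation
    between grid points and clamping outside the observed source range.
    """
    src_q = _quantiles(source_scores, n_points)
    ref_q = _quantiles(reference_scores, n_points)

    def transform(score: float) -> float:
        if score <= src_q[0]:
            return ref_q[0]
        if score >= src_q[-1]:
            return ref_q[-1]
        j = bisect_right(src_q, score)  # src_q[j - 1] <= score < src_q[j]
        x0, x1 = src_q[j - 1], src_q[j]
        y0, y1 = ref_q[j - 1], ref_q[j]
        return y0 + (y1 - y0) * (score - x0) / (x1 - x0)

    return transform


if __name__ == "__main__":
    # Hypothetical data: the retrained model's scores are skewed toward 0,
    # while the reference distribution is uniform on [0, 1].
    source = [(i / 1000) ** 2 for i in range(1001)]
    reference = [i / 1000 for i in range(1001)]
    to_reference = fit_quantile_map(source, reference)
    print(to_reference(0.25))  # the source median lands on the reference median
```

Because the mapped score distribution follows the reference, a client threshold expressed on the reference scale flags roughly the same share of events before and after a model update, which is the property the abstract relies on.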

Paper Structure (24 sections, 12 equations, 6 figures, 1 table)

Introduction
MUSE
System Overview
Predictor Abstraction
Efficient Ensemble Serving
Ensemble Configuration
Composable Transformations
Posterior Correction
Ensemble aggregation
Quantile Mapping
Cold-start problem
Routing
Intent-Based Routing
Rolling Deployments and Consistency
Evaluation
...and 9 more sections

Figures (6)

Figure 1: MUSE infrastructure overview with three predictors ($p_1$, $p_2$, and $p_3$) serving scores. Both $p_1$ and $p_2$ are ensembles, composed of the individual models $m_1$, $m_2$ and $m_1$, $m_2$, $m_3$, respectively, while $p_3$ is an individual model. The upper section details the data pipeline for predictor $p_2$ during scoring.
Figure 2: Example of declarative routing configuration.
Figure 3: The model lifecycle: from training to shadow validation, and finally to live promotion.
Figure 4: Quantile Transformation update for a "cold-start" deployment. Comparison of the relative error against the target distribution for predictor $\boldsymbol{v_0}$ (Default Transformation), predictor $\boldsymbol{v_1}$ (Custom Transformation), and predictor $\boldsymbol{raw}$ (No Quantile Transformation).
Figure 5: System performance during the update from $\mathcal{T}^Q_{v_0}$ to $\mathcal{T}^Q_{v_1}$. Despite K8s pod restarts, the warm-up process ensures strict adherence to latency targets throughout the transition.
...and 1 more figure
