Complexity at Scale: A Quantitative Analysis of an Alibaba Microservice Deployment
Authors
Giles Winchester, George Parisis, Guoyao Xu, Luc Berthouze
Abstract
Research on microservice management and testbeds often rests on assumptions about deployments that have rarely been validated at production scale. Recent studies have begun to characterise production microservice deployments, but they are often limited in breadth, do not compare findings across deployments, and do not consider what their findings imply for commonly held assumptions. We analyse a distributed tracing dataset from Alibaba's production microservice deployment to examine its scale, heterogeneity, and dynamicity. By comparing our findings with prior measurements of Meta's microservice architecture (MSA), we illustrate both convergent and divergent properties, clarifying which patterns may generalise. Our study reveals extreme architectural scale, long-tail distributions of workloads and dependencies, highly diverse functionality, substantial call graph variability, and pronounced time-varying behaviour, all of which diverge from assumptions underlying research models and testbeds. We summarise how these observations challenge common assumptions in research on fault management, scaling, and testbed design, and outline recommendations for more realistic future approaches and evaluations.