Cost Minimization in Multi-cloud Systems with Runtime Microservice Re-orchestration
Marco Zambianco, Silvio Cretti, Domenico Siracusa
TL;DR
This work tackles cost-aware re-orchestration of microservices across multi-cloud environments, using rolling updates to avoid service disruption. It formulates a Bin Packing-style integer linear program (ILP) that minimizes the cost of leased virtual nodes while keeping delay-sensitive microservices co-located in the same regional cluster. It complements the ILP with a two-step heuristic that approximates the optimal cost at the price of a close-to-zero probability of service disruption and QoS violation. Results show better cost mitigation, lower service disruption, and lower QoS violation probability than baselines replicating Kubernetes scheduler functionality. The heuristic is intended for large-scale deployments where computing the optimal solution is intractable. Together, the two schemes offer a practical path to cost-efficient, low-disruption microservice management in geographically distributed multi-cloud infrastructures.
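For orientation, a generic cost-minimizing bin-packing ILP with a regional co-location constraint can be written as follows. This is a minimal sketch under assumed notation, not the paper's exact formulation. $\mathcal{N}$ is the set of leasable nodes, each with price $c_n$ and a single scalar capacity $C_n$. $\mathcal{M}$ is the set of microservices, each with demand $r_m$. $\mathcal{N}_k$ is the set of nodes in region $k \in \mathcal{K}$, and $\mathcal{P}$ is the set of delay-sensitive pairs that must share a region. The binary variable $x_{m,n}$ places $m$ on $n$, and $y_n$ marks node $n$ as leased. The sketch also omits the rolling-update rescheduling order that the paper's formulation produces.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{n \in \mathcal{N}} c_n\, y_n \\
\text{s.t.}\quad  & \sum_{n \in \mathcal{N}} x_{m,n} = 1 && \forall m \in \mathcal{M} \\
                  & \sum_{m \in \mathcal{M}} r_m\, x_{m,n} \le C_n\, y_n && \forall n \in \mathcal{N} \\
                  & \sum_{n \in \mathcal{N}_k} x_{m,n} = \sum_{n \in \mathcal{N}_k} x_{m',n} && \forall (m,m') \in \mathcal{P},\ \forall k \in \mathcal{K} \\
                  & x_{m,n},\ y_n \in \{0,1\} && \forall m \in \mathcal{M},\ \forall n \in \mathcal{N}
\end{aligned}
```

Each microservice is placed on exactly one node, so $\sum_{n \in \mathcal{N}_k} x_{m,n}$ equals 1 exactly when $m$ runs in region $k$. Requiring equality for every $k$ therefore forces each delay-sensitive pair into the same regional cluster. The capacity constraint also ensures that only leased nodes ($y_n = 1$) can host microservices and accrue cost.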
Abstract
Multi-cloud systems facilitate a cost-efficient and geographically distributed deployment of microservice-based applications by temporarily leasing virtual nodes with diverse pricing models. To preserve the cost-efficiency of multi-cloud deployments, it is essential to redeploy microservices onto the available nodes according to a dynamic resource configuration. Such redeployment is often performed to better accommodate workload variations. However, this approach leads to frequent service disruption, since applications are continuously shut down and redeployed to apply the new resource assignment. To overcome this issue, we propose a re-orchestration scheme that migrates microservices at runtime based on a rolling update scheduling logic. Specifically, we propose an integer linear optimization problem that minimizes the cost associated with multi-cloud virtual nodes and ensures that delay-sensitive microservices are co-located on the same regional cluster. The resulting rescheduling order guarantees no service disruption: it repacks microservices between the available nodes without turning off the outdated microservice instance before redeploying the updated version. In addition, we propose a two-step heuristic scheme that effectively approximates the optimal solution at the cost of a close-to-zero probability of service disruption and QoS violation. Results show that the proposed schemes achieve better cost mitigation, lower service disruption, and lower QoS violation probability than baseline schemes replicating the functionality of the Kubernetes scheduler.
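The no-disruption property rests on a make-before-break rule. Each updated instance is started on its destination node while the outdated instance still occupies its source node, and only then is the old instance stopped. The migration order therefore matters. The sketch below is a hypothetical greedy illustration of that rule, not the paper's rescheduling algorithm. The function name `rolling_update_order`, the single scalar resource per node, and the dictionary inputs are all assumptions. Because the search is greedy, it may report failure even when some valid order exists. A failure signals that extra temporary capacity would be needed.

```python
def rolling_update_order(capacity, demand, current, target):
    """Hypothetical greedy make-before-break migration ordering.

    capacity: node -> capacity (one scalar resource, an assumption)
    demand:   service -> resource demand
    current:  service -> node the outdated instance runs on
    target:   service -> node the updated instance should run on
    Returns a list of (service, src, dst) moves, or None if the greedy
    search gets stuck.
    """
    # Resource usage of each node under the current placement.
    used = {n: 0 for n in capacity}
    for svc, node in current.items():
        used[node] += demand[svc]

    pending = [s for s in current if current[s] != target[s]]
    order = []
    while pending:
        for svc in pending:
            src, dst = current[svc], target[svc]
            # The updated instance must fit on dst while the outdated
            # instance still holds its resources on src.
            if used[dst] + demand[svc] <= capacity[dst]:
                used[dst] += demand[svc]   # start the updated instance
                used[src] -= demand[svc]   # then stop the outdated one
                order.append((svc, src, dst))
                pending.remove(svc)
                break
        else:
            # No pending move fits (e.g. a full two-node swap).
            return None
    return order


if __name__ == "__main__":
    caps = {"n1": 4, "n2": 4, "n3": 2}
    dem = {"a": 2, "b": 2, "c": 2}
    cur = {"a": "n1", "b": "n1", "c": "n2"}
    tgt = {"a": "n2", "b": "n2", "c": "n3"}
    print(rolling_update_order(caps, dem, cur, tgt))
```

In this example, moving `b` right after `a` would overflow `n2`. The greedy search instead moves `c` off `n2` first, producing `[('a', 'n1', 'n2'), ('c', 'n2', 'n3'), ('b', 'n1', 'n2')]`. A full swap between two saturated nodes has no make-before-break order, so the function returns `None`.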
