Reusable MLOps: Reusable Deployment, Reusable Infrastructure and Hot-Swappable Machine Learning models and services
D Panchal, P Verma, I Baran, D Musgrove, D Lu
TL;DR
The paper addresses the challenge of operationalizing AI/ML workloads in production, where data science efforts are often isolated from deployment and maintenance workflows. It presents Reusable MLOps as a pragmatic framework embedded in the Acumos platform, featuring reusable deployment, reusable infrastructure, and hot-swappable components that serve evolving algorithms without tearing down services. The key contributions include the Acumos Model Runner, a Java client onboarding workflow, and a Protocol Buffers (protobuf) based microservice architecture that supports on-the-fly model replacement, proto updates, and repurposing of existing services. The approach promises reductions in time, cost, and downtime, enabling continuous training and flexible service composition for sustainable, shareable AI infrastructure.
Abstract
Although building Machine Learning models has become increasingly accessible thanks to a plethora of freely available tools, libraries, and algorithms, operationalizing these models easily is still a problem. It requires considerable expertise in data engineering, software development, cloud, and DevOps. It also requires planning, agreement, and a vision of how business applications will use the model once it is in production, how it will be continuously trained on fresh incoming data, and how and when a newer model will replace an existing one. This leads to developers and data scientists working in silos and making suboptimal decisions, which wastes time and effort. We introduce the Acumos AI platform we developed and demonstrate novel capabilities of the Acumos model runner that can help solve these problems. We introduce a new, sustainable concept in the field of AI/ML operations, called Reusable MLOps, in which we reuse the existing deployment and infrastructure to serve new models by hot-swapping them without tearing down the infrastructure or the microservice, thus achieving reusable deployment and operations for AI/ML models while still having continuously trained models in production.
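The hot-swapping idea described above can be illustrated with a minimal Python sketch. It is not the Acumos model runner or its API: the class `HotSwappableService`, its methods `predict` and `swap`, and the toy lambda "models" are all hypothetical names chosen for illustration. The sketch only shows the general pattern of a long-lived service object whose model reference is replaced in place while the service itself keeps running.

```python
import threading
from typing import Any, Callable, Dict


class HotSwappableService:
    """Hypothetical long-lived service wrapper (not the Acumos API).

    The service object stays alive; only the model it points to changes.
    """

    def __init__(self, model: Callable[[Any], Any], version: str) -> None:
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def predict(self, payload: Any) -> Dict[str, Any]:
        # Take a consistent snapshot of (model, version) under the lock,
        # then run inference outside it so a swap is never blocked by a
        # slow prediction.
        with self._lock:
            model, version = self._model, self._version
        return {"version": version, "result": model(payload)}

    def swap(self, new_model: Callable[[Any], Any], new_version: str) -> None:
        # Replace the model reference atomically; the service is not
        # torn down or restarted.
        with self._lock:
            self._model = new_model
            self._version = new_version


# Toy stand-ins for a trained model and its retrained successor.
service = HotSwappableService(lambda x: x * 2, "v1")
before = service.predict(3)

service.swap(lambda x: x * 3, "v2")  # same service object, new model
after = service.predict(3)

print(before, after)
```

Requests already in flight finish against the model they started with, and later requests see the new one, which is the behavior a hot swap needs in order to avoid downtime.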
