NOMAD -- Navigating Optimal Model Application to Datastreams
Ashwin Gerard Colaco, Sharad Mehrotra, Michael J De Lucia, Kevin Hamlen, Murat Kantarcioglu, Latifur Khan, Ananthram Swami, Bhavani Thuraisingham
TL;DR
NOMAD tackles real-time multiclass classification during data ingestion by dynamically constructing model chains that balance cost and accuracy. It combines a utility-based greedy policy, belief updates, and a formal chain-safety framework to guarantee quality that is $\epsilon$-comparable to that of a high-quality role model, even under distribution drift. The approach supports heterogeneous pretrained models, dependent model structures, and batched inference. Across eight datasets it delivers 2–6× speedups with substantial throughput gains and robust performance under drift, at minimal overhead. Adaptive priors, maintained with ARIMA forecasting and Page-Hinkley change detection, keep it responsive in non-stationary streams, making NOMAD practical for CPU-only deployments in edge and enterprise environments.
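To make the drift-detection ingredient concrete, here is a minimal sketch of the textbook Page-Hinkley test for an upward shift in the mean of a stream. It is a generic illustration, not NOMAD's implementation: how NOMAD couples the test with ARIMA and its priors is not described in this summary, and the class name, the `delta` and `threshold` parameters, and their default values are assumptions chosen for the example.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for an increase in the mean of a stream.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level (often written lambda)
        self.n = 0                  # number of observations seen
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation m_t
        self.cum_min = 0.0          # running minimum M_t of m_t

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


# Usage: a stream whose mean jumps from 0 to 1 after 50 observations.
detector = PageHinkley()
stream = [0.0] * 50 + [1.0] * 50
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
```

In a streaming classifier, the monitored quantity could be something like a per-class frequency or an error indicator, with an alarm triggering a refresh of the priors. That wiring is an assumption here, not something stated above.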
Abstract
NOMAD (Navigating Optimal Model Application for Datastreams) is an intelligent framework for data enrichment during ingestion that optimizes real-time multiclass classification by dynamically constructing model chains, i.e., sequences of machine learning models with varying cost-quality tradeoffs, selected via a utility-based criterion. Inspired by predicate ordering techniques from database query processing, NOMAD uses cheaper models as initial filters and proceeds to more expensive models only when necessary, while guaranteeing that classification quality remains comparable to that of a designated role model through a formal chain safety mechanism. It employs a dynamic belief update strategy to adapt model selection based on per-event predictions and shifting data distributions, and extends to scenarios with dependent models such as early-exit DNNs and stacking ensembles. Evaluation across multiple datasets demonstrates that NOMAD achieves significant computational savings compared to static and naive approaches while maintaining classification quality comparable to that achieved by the most accurate (and often the most expensive) model.
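The cheap-first chaining idea can be illustrated with a short sketch. This is a simplified stand-in rather than NOMAD's algorithm: it uses a fixed accuracy-to-cost ratio as the utility, fuses model outputs into a belief by multiplying class probabilities (which assumes the models are conditionally independent), and stops at a confidence threshold. It does not implement the chain safety guarantee. The `Model` fields, `classify_with_chain`, and the threshold value are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Model:
    name: str
    cost: float        # relative inference cost (assumed units)
    accuracy: float    # estimated accuracy, used as a proxy for benefit
    predict_proba: Callable[[Sequence[float]], List[float]]


def classify_with_chain(
    x: Sequence[float],
    models: List[Model],
    n_classes: int,
    threshold: float = 0.9,
) -> Tuple[int, float, List[str]]:
    """Greedily apply models until the belief in one class is high enough."""
    belief = [1.0 / n_classes] * n_classes   # uniform prior over classes
    unused = list(models)
    spent, used = 0.0, []
    while unused and max(belief) < threshold:
        # Greedy step: pick the unused model with the best utility per cost.
        best = max(unused, key=lambda m: m.accuracy / m.cost)
        unused.remove(best)
        probs = best.predict_proba(x)
        # Belief update: elementwise product, then renormalize.
        fused = [b * p for b, p in zip(belief, probs)]
        total = sum(fused)
        if total > 0:
            belief = [f / total for f in fused]
        spent += best.cost
        used.append(best.name)
    label = max(range(n_classes), key=belief.__getitem__)
    return label, spent, used


# Usage with two toy models: the cheap one is confident only on "easy" inputs.
cheap = Model("cheap", 1.0, 0.70,
              lambda x: [0.95, 0.05] if x[0] > 0.8 else [0.6, 0.4])
costly = Model("costly", 10.0, 0.95, lambda x: [0.99, 0.01])

easy = classify_with_chain([0.9], [costly, cheap], n_classes=2)
hard = classify_with_chain([0.5], [costly, cheap], n_classes=2)
```

On the easy input the chain stops after the cheap model alone. On the hard input the cheap model leaves the belief below the threshold, so the expensive model is also invoked and its cost is added.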
