MorphingDB: A Task-Centric AI-Native DBMS for Model Management and Inference
Wu Sai, Xia Ruichen, Yang Dingyu, Wang Rui, Lai Huihang, Guan Jiarui, Bai Jiameng, Zhang Dongxiang, Tang Xiu, Xie Zhongle, Lu Peng, Chen Gang
TL;DR
MorphingDB is a task-centric AI-native DBMS embedded in PostgreSQL that automates model storage, selection, and inference. It combines three components: Mvec for in-database tensor handling, a two-phase transfer-learning framework for model selection, and DAG-based batch inference with pre-embedding and vector sharing to boost throughput. The system achieves favorable trade-offs among accuracy, resource consumption, and time across nine public datasets (series, NLP, image) and shows competitive performance against both AI-native DBMSs and AutoML frameworks. Its architecture supports flexible storage modalities, scalable inference, and device-aware execution, enabling practical deployment of AI-driven analytics inside the database engine.
Abstract
The increasing demand for deep neural inference within database environments has driven the emergence of AI-native DBMSs. However, existing solutions either rely on model-centric designs that require developers to manually select, configure, and maintain models, resulting in high development overhead, or adopt task-centric AutoML approaches with high computational costs and poor DBMS integration. We present MorphingDB, a task-centric AI-native DBMS that automates model storage, selection, and inference within PostgreSQL. To enable flexible, I/O-efficient storage of deep learning models, we first introduce specialized schemas and multi-dimensional tensor data types that support both BLOB-based all-in-one and decoupled model storage. We then design a two-phase transfer learning framework for model selection: an offline phase builds a transferability subspace by embedding historical tasks, and an online phase projects real-time tasks into that subspace through feature-aware mapping. To further optimize inference throughput, we propose pre-embedding with vector sharing to eliminate redundant computations and DAG-based batch pipelines with cost-aware scheduling to minimize inference time. Implemented as a PostgreSQL extension with LibTorch, MorphingDB outperforms AI-native DBMSs (EvaDB, Madlib, GaussML) and AutoML platforms (AutoGluon, AutoKeras, AutoSklearn) across nine public datasets, encompassing series, NLP, and image tasks. Our evaluation demonstrates a robust balance among accuracy, resource consumption, and time cost in model selection, as well as significant gains in throughput and resource efficiency.
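The idea behind pre-embedding with vector sharing can be illustrated with a minimal Python sketch. It is not MorphingDB's implementation: the class `SharedEmbeddingCache`, the encoder `toy_embed`, and the two task heads are hypothetical names introduced here, and the encoder is a trivial stand-in for an expensive model. The sketch only shows the general principle that an embedding computed once per distinct input can be reused by several downstream tasks, so the costly step is not repeated.

```python
from typing import Callable, Dict, List

Vector = List[float]


class SharedEmbeddingCache:
    """Computes each input's embedding once and shares it across task heads.

    Hypothetical helper for illustration; not part of MorphingDB's API.
    """

    def __init__(self, embed: Callable[[str], Vector]) -> None:
        self._embed = embed
        self._cache: Dict[str, Vector] = {}
        self.embed_calls = 0  # how many times the expensive encoder actually ran

    def get(self, item: str) -> Vector:
        if item not in self._cache:
            self._cache[item] = self._embed(item)
            self.embed_calls += 1
        return self._cache[item]


def toy_embed(text: str) -> Vector:
    # Assumed stand-in for an expensive encoder: [length, vowel count].
    vowels = sum(ch in "aeiou" for ch in text.lower())
    return [float(len(text)), float(vowels)]


def run_tasks(
    rows: List[str],
    heads: Dict[str, Callable[[Vector], object]],
    cache: SharedEmbeddingCache,
) -> Dict[str, list]:
    results: Dict[str, list] = {name: [] for name in heads}
    for row in rows:
        vec = cache.get(row)  # one shared vector per distinct row
        for name, head in heads.items():
            results[name].append(head(vec))  # every head reuses the same vector
    return results


rows = ["good product", "bad service", "good product"]
heads = {
    "length_bucket": lambda v: "long" if v[0] > 11 else "short",
    "vowel_ratio": lambda v: round(v[1] / v[0], 2),
}
cache = SharedEmbeddingCache(toy_embed)
outputs = run_tasks(rows, heads, cache)
print(cache.embed_calls, outputs)
```

In this toy run, three rows feed two task heads, yet the encoder runs only twice, because the duplicate row and the second head both reuse cached vectors.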
