InferF: Declarative Factorization of AI/ML Inferences over Joins
Kanchan Chowdhury, Lixi Zhou, Lulu Xie, Xinwei Fu, Jia Zou
TL;DR
InferF formalizes and solves the end-to-end optimization problem of factorizing AI/ML inferences over multi-way joins. It introduces a declarative intermediate representation for arbitrary inference workflows, proves NP-hardness of the push-down planning task, and provides two model-driven optimizers (genetic and greedy) to navigate the exponential search space. Implemented on Velox, InferF achieves up to 11.3x speedups over the best Velox baseline and up to 18.7x over other in-DB ML systems, while offering insights into when factorization yields benefits. The work demonstrates that decoupling join-order optimization from factorization and enabling group push-down with aggregation can substantially reduce both CPU and I/O costs in complex, UDF-heavy inference workloads over relational joins.
Abstract
Real-world AI/ML workflows often apply inference computations to feature vectors joined from multiple datasets. To avoid the redundant AI/ML computations caused by repeated data records in the join's output, factorized ML has been proposed: it decomposes ML computations into sub-computations that are executed on each normalized dataset. However, how factorized ML affects AI/ML inference over multi-way joins has not been sufficiently studied. To address this gap, we propose InferF, a novel declarative system that focuses on factorizing arbitrary inference workflows represented as analyzable expressions over multi-way joins. We formalize the problem of flexibly pushing down partial factorized computations to qualified nodes in the join tree so as to minimize the overall inference computation and join costs, and we propose two algorithms to solve it: (1) a greedy algorithm based on a per-node cost function that estimates the influence on overall latency if a subset of factorized computations is pushed to a node, and (2) a genetic algorithm that iteratively enumerates and evaluates promising factorization plans. We implement InferF on Velox, an open-source database engine from Meta, evaluate it on real-world datasets, observe up to 11.3x speedups, and systematically summarize the factors that determine when factorized ML can benefit AI/ML inference workflows.
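The redundancy that factorization removes can be illustrated with a minimal sketch. It is not InferF's implementation: the tables, weights, and function names below are hypothetical, and the model is assumed to be a simple linear scorer, for which the dot product over a joined feature vector splits into one partial dot product per input table. The customer-side partial is computed once per customer and reused by every order that joins with that customer.

```python
# Hypothetical toy data: an orders table that references a customers table.
customers = {  # customer_id -> customer features
    1: [0.5, 1.0],
    2: [2.0, -1.0],
}
orders = [  # (order_id, customer_id, order features)
    (100, 1, [1.0, 0.0, 3.0]),
    (101, 1, [0.0, 2.0, 1.0]),
    (102, 2, [4.0, 1.0, 0.0]),
    (103, 1, [1.0, 1.0, 1.0]),
]

# Assumed linear model: score = w_order . x_order + w_customer . x_customer + bias
w_order = [0.1, -0.2, 0.3]
w_customer = [0.4, 0.5]
bias = 0.05


def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def infer_after_join():
    """Baseline: materialize the join, then score each concatenated vector."""
    scores = {}
    for order_id, cust_id, x_order in orders:
        joined = x_order + customers[cust_id]  # customer features are repeated
        scores[order_id] = dot(w_order + w_customer, joined) + bias
    return scores


def infer_factorized():
    """Factorized: compute the customer-side partial once per customer,
    then combine it with the order-side partial at join time."""
    partial = {cid: dot(w_customer, x) for cid, x in customers.items()}
    return {
        order_id: dot(w_order, x_order) + partial[cust_id] + bias
        for order_id, cust_id, x_order in orders
    }


if __name__ == "__main__":
    print(infer_after_join())
    print(infer_factorized())
```

In this toy case the baseline evaluates the customer-side terms four times (once per order), while the factorized version evaluates them twice (once per customer). Whether such a push-down pays off in practice depends on factors such as how often records repeat in the join output and the relative sizes of the features and the partial results, which is the kind of trade-off the cost-based optimizers described above are meant to weigh.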
