FeatInsight: An Online ML Feature Management System on 4Paradigm Sage-Studio Platform
Xin Tong, Xuanhe Zhou, Bingsheng He, Guoliang Li, Zirui Tang, Wei Zhou, Fan Wu, Mian Lu, Yuqiang Chen
TL;DR
FeatInsight delivers an end-to-end online ML feature management solution that unifies feature design, storage, computation, verification, and lineage on 4Paradigm's Sage Studio, with OpenMLDB as the execution engine. It addresses high-dimensional, interdependent feature spaces and fast-changing data through a visual SQL design interface, feature views for lineage tracking, unified offline/online computation with consistency checks, and compact time-series data management built on lock-free structures. The system has been deployed in numerous real-world scenarios, supporting feature spaces of up to a trillion dimensions and millisecond-level updates, and it shows substantial gains in deployment speed and query latency (e.g., sub-20 ms online feature computation with high recall). These capabilities shorten the cycle from feature design to deployment and support robust real-time feature services for latency-constrained applications such as online product recommendation and fraud detection.
Abstract
Feature management is essential for many online machine learning applications and often becomes the performance bottleneck (e.g., taking up to 70% of the overall latency in a sales prediction service). Improper feature configurations (e.g., introducing too many irrelevant features) can severely undermine a model's ability to generalize. However, managing online ML features is challenging due to (1) large-scale, complex raw data (e.g., the 2018 PHM dataset contains 17 tables with dozens to hundreds of columns), (2) the need for high-performance, consistent computation of interdependent features with complex patterns, and (3) the requirement for rapid updates and deployments to accommodate real-time data changes. In this demo, we present FeatInsight, a system that supports the entire feature lifecycle, including feature design, storage, visualization, computation, verification, and lineage management. FeatInsight (with OpenMLDB as the execution engine) has been deployed in over 100 real-world scenarios on 4Paradigm's Sage Studio platform, handling feature spaces of up to a trillion dimensions and enabling millisecond-level feature updates. We demonstrate how FeatInsight improves feature design efficiency (e.g., for online product recommendation) and feature computation performance (e.g., for online fraud detection). The code is available at https://github.com/4paradigm/FeatInsight.
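Challenge (2), consistent computation, comes down to one requirement: a time-window feature computed offline for training must equal the value served online at request time. The following is a minimal sketch of such a consistency check. It assumes a single "sum of amounts per key over the last day" feature. All names are hypothetical: `Event`, `WINDOW_MS`, `offline_window_sums`, `OnlineWindowStore`, and `consistency_mismatches`. This is not FeatInsight's or OpenMLDB's implementation. OpenMLDB expresses such features as SQL windows and stores data in lock-free structures, whereas this sketch uses plain sorted Python lists.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical 1-day window, inclusive of both ends: [ts - WINDOW_MS, ts].
WINDOW_MS = 24 * 3600 * 1000


@dataclass(frozen=True)
class Event:
    event_id: str
    key: str      # partition key, e.g. a user or card id
    ts: int       # event time in milliseconds
    amount: int   # integer cents, so both paths can be compared exactly


def offline_window_sums(events, window_ms=WINDOW_MS):
    """Batch path: two-pointer sliding window over each key's time-sorted events."""
    by_key = defaultdict(list)
    for e in sorted(events, key=lambda e: e.ts):  # stable sort: ties keep input order
        by_key[e.key].append(e)
    result = {}
    for evs in by_key.values():
        start, running = 0, 0
        for e in evs:
            running += e.amount
            # Evict events older than the window's lower bound.
            while evs[start].ts < e.ts - window_ms:
                running -= evs[start].amount
                start += 1
            result[e.event_id] = running
    return result


class OnlineWindowStore:
    """Online path: per-key time-ordered lists, queried by binary search."""

    def __init__(self, window_ms=WINDOW_MS):
        self.window_ms = window_ms
        self._ts = defaultdict(list)
        self._amounts = defaultdict(list)

    def put(self, e):
        ts_list = self._ts[e.key]
        i = bisect_right(ts_list, e.ts)  # insert after any equal timestamps
        ts_list.insert(i, e.ts)
        self._amounts[e.key].insert(i, e.amount)

    def window_sum(self, key, ts):
        ts_list = self._ts[key]
        lo = bisect_left(ts_list, ts - self.window_ms)
        hi = bisect_right(ts_list, ts)
        return sum(self._amounts[key][lo:hi])


def consistency_mismatches(events, window_ms=WINDOW_MS):
    """Replay events online in time order; return ids whose values differ from batch."""
    offline = offline_window_sums(events, window_ms)
    store = OnlineWindowStore(window_ms)
    mismatches = []
    for e in sorted(events, key=lambda e: e.ts):
        store.put(e)
        if store.window_sum(e.key, e.ts) != offline[e.event_id]:
            mismatches.append(e.event_id)
    return mismatches


if __name__ == "__main__":
    events = [
        Event("e1", "u1", 0, 100),
        Event("e2", "u1", 60_000, 250),
        Event("e3", "u2", 60_000, 40),
        Event("e4", "u1", WINDOW_MS + 30_000, 10),  # e1 has fallen out of the window
    ]
    print(offline_window_sums(events))     # {'e1': 100, 'e2': 350, 'e4': 260, 'e3': 40}
    print(consistency_mismatches(events))  # []
```

The two paths deliberately use different algorithms: an incremental sliding sum for the batch path and range queries over a time index for the online path. Because of this, an empty mismatch list is meaningful evidence that both implement the same window semantics. Integer amounts avoid spurious floating-point differences between the two summation orders.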
