Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases
Vid Kocijan, Jinu Sunil, Jan Eric Lenssen, Viman Deb, Xinwei Xe, Federico Reyes Gomez, Matthias Fey, Jure Leskovec
TL;DR
Predictive Query Language (PQL) is a SQL-inspired declarative language for defining predictive tasks directly on relational databases and automatically generating the corresponding training tables. It unifies the handling of static and temporal data, enforces leakage-free data construction, and supports automatic task inference for regression, classification, forecasting, and link prediction. The paper presents two scalable implementations: batch Relational Deep Learning (RDL) and low-latency Relational Foundation Model (RFM). It reports substantial speedups (up to 40x) and demonstrates applicability to real-world domains such as recommendations, fraud detection, and healthcare. By enabling concise, reproducible training-data generation while remaining compatible with existing ML workflows, PQL accelerates model development and enables scalable, interactive predictive analytics on large relational datasets.
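The TL;DR mentions automatic task inference. As a rough illustration only, the Python sketch below assumes the task type can be read off two properties of a query's prediction target: the kind of value it produces and how many future time steps it covers. The `Target` class, its fields, and the rules in `infer_task` are hypothetical and are not taken from PQL's specification.

```python
from dataclasses import dataclass


# Hypothetical, simplified view of a prediction target. These field names are
# illustrative assumptions, not PQL's actual data model.
@dataclass(frozen=True)
class Target:
    value_type: str            # "bool", "number", "category", or "entity_list"
    num_future_steps: int = 1  # > 1 means one prediction per future time window


def infer_task(target: Target) -> str:
    """Pick a task type from the target's shape (toy rules for illustration)."""
    if target.value_type == "entity_list":
        return "link_prediction"  # rank related entities, e.g. items to recommend
    if target.value_type == "number" and target.num_future_steps > 1:
        return "forecasting"      # one numeric value per future window
    if target.value_type == "number":
        return "regression"       # a single numeric value
    if target.value_type == "bool":
        return "binary_classification"
    if target.value_type == "category":
        return "multiclass_classification"
    raise ValueError(f"unsupported target type: {target.value_type!r}")


print(infer_task(Target("number", num_future_steps=4)))  # forecasting
```

Rule order matters in this toy: the forecasting check runs before the plain regression check because both apply to numeric targets.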
Abstract
The purpose of predictive modeling on relational data is to predict future or missing values in a relational database, for example, a user's future purchases, a patient's risk of readmission, or the likelihood that a financial transaction is fraudulent. Typically powered by machine learning methods, predictive models are used in recommendations, financial fraud detection, supply chain optimization, and other systems, providing billions of predictions every day. However, training a machine learning model requires manual work to extract the required training examples (prediction entities and target labels) from the database, which is slow, laborious, and error-prone. Here, we present the Predictive Query Language (PQL), a SQL-inspired declarative language for defining predictive tasks on relational databases. PQL allows a predictive task to be specified in a single declarative query, enabling automatic computation of training labels for a wide variety of machine learning tasks, such as regression, classification, time-series forecasting, and recommendation. PQL is already integrated into a predictive AI platform, where its versatility is demonstrated by many ongoing use cases, including financial fraud detection, item recommendations, and workload prediction. We demonstrate its versatile design through two implementations: one for small-scale, low-latency use and one that can handle large-scale databases.
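To make "automatic computation of training labels" concrete, here is a minimal Python sketch of what a generated training table might contain for a task such as predicting each user's number of purchases in the next 30 days. The toy data, the function `build_training_table`, and the anchor-time/horizon scheme are illustrative assumptions, not PQL's implementation. The point it shows is the temporal split: inputs may only use events at or before an anchor time, while the label uses only events after it.

```python
from datetime import datetime, timedelta

# Hypothetical toy data: one (user_id, timestamp) row per purchase.
purchases = [
    ("u1", datetime(2024, 1, 3)),
    ("u1", datetime(2024, 1, 20)),
    ("u2", datetime(2024, 1, 25)),
    ("u1", datetime(2024, 2, 10)),
]
users = ["u1", "u2", "u3"]


def build_training_table(entities, events, anchor_times, horizon):
    """Return (entity, anchor_time, past_count, label) rows.

    past_count uses only events at or before anchor_time (a model input),
    while label counts events in (anchor_time, anchor_time + horizon] (the target).
    Keeping the two windows disjoint prevents label leakage.
    """
    rows = []
    for t in anchor_times:
        for e in entities:
            times = [ts for eid, ts in events if eid == e]
            past_count = sum(1 for ts in times if ts <= t)
            label = sum(1 for ts in times if t < ts <= t + horizon)
            rows.append((e, t, past_count, label))
    return rows


anchors = [datetime(2024, 1, 1), datetime(2024, 2, 1)]
for row in build_training_table(users, purchases, anchors, timedelta(days=30)):
    print(row)
```

Each anchor time yields one training example per user, so the same user can appear several times with different labels. A binary variant such as "will the user purchase anything in the next 30 days?" would simply use `label > 0` as the target.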
