LLM-PQA: LLM-enhanced Prediction Query Answering

Ziyu Li, Wenjie Zhao, Asterios Katsifodimos, Rihan Hai

TL;DR

LLM-PQA tackles the challenge of predicting outcomes from natural-language queries by bridging LLM-based interpretation with ML inference. It uses vector search over a data lake and a model zoo to map queries to appropriate datasets and models, and it can train models on the spot when no suitable model is available. The approach introduces a structured architecture with components for indexing, retrieval, and model/data management, enabling precise model-dataset pairing and fast response times. The demonstration shows end-to-end answering of natural-language prediction queries for regression and classification tasks through a chat interface, highlighting practical applicability across heterogeneous data sources and ML models.
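The retrieval step can be illustrated with a minimal sketch. It assumes that each model in the zoo and each dataset in the lake is indexed by a short text description. The bag-of-words `embed` function, the `search` helper, and the entries in `MODEL_ZOO` and `DATA_LAKE` are all hypothetical. LLM-PQA itself performs vector search over learned embeddings, and this toy cosine-similarity ranking only approximates that idea.

```python
import math
import re
from collections import Counter

# Hypothetical toy embedding: a bag-of-words count vector.
# A real system would use a learned text-embedding model instead.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, index: dict[str, str], k: int = 1) -> list[tuple[str, float]]:
    """Return the k index entries whose descriptions are most similar to the query."""
    q = embed(query)
    scored = [(name, cosine(q, embed(desc))) for name, desc in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical model-zoo and data-lake descriptions (illustrative only).
MODEL_ZOO = {
    "house_price_regressor": "regression model predicting house price from area and bedrooms",
    "churn_classifier": "classification model predicting customer churn from tenure and monthly charges",
}
DATA_LAKE = {
    "housing.csv": "house sales with price area bedrooms",
    "telecom_customers.csv": "customer tenure monthly charges churn",
}

query = "What is the price of a house with area 120 and 3 bedrooms?"
print(search(query, MODEL_ZOO))  # best-matching model
print(search(query, DATA_LAKE))  # best-matching dataset
```

Indexing models and datasets the same way lets a single query be matched against both, which is what makes the model-dataset pairing possible.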

Abstract

The advent of Large Language Models (LLMs) provides an opportunity to change the way queries are processed, moving beyond the constraints of conventional SQL-based database systems. However, using an LLM to answer a prediction query is still challenging, since an external ML model has to be employed and inference has to be performed in order to provide an answer. This paper introduces LLM-PQA, a novel tool that addresses prediction queries formulated in natural language. LLM-PQA is the first to combine the capabilities of LLMs and a retrieval-augmented mechanism for the needs of prediction queries by integrating data lakes and model zoos. This integration provides users with access to a vast spectrum of heterogeneous data and diverse ML models, facilitating dynamic prediction query answering. In addition, LLM-PQA can dynamically train models on demand, based on specific query requirements, ensuring reliable and relevant results even when no pre-trained model in the model zoo is available for the task.
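On-demand training can be sketched as a retrieve-or-train fallback. This sketch makes several assumptions. A similarity threshold decides whether a zoo model is good enough, and Jaccard word overlap stands in for the vector-search score. When no model qualifies, a one-feature least-squares regressor is trained on an already-retrieved dataset. The threshold value, the `ZooEntry` type, the `trained:` naming scheme and the toy data are hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a stand-in for an embedding similarity score."""
    wa = set(re.findall(r"[a-z]+", a.lower()))
    wb = set(re.findall(r"[a-z]+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

@dataclass
class ZooEntry:
    description: str
    predict: Callable[[float], float]

def fit_linear(xs: list[float], ys: list[float]) -> Callable[[float], float]:
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def answer(task: str, x: float, zoo: dict[str, ZooEntry],
           dataset: tuple[list[float], list[float]],
           threshold: float = 0.5) -> tuple[str, float]:
    """Use the best-matching zoo model, or train one on demand if none matches well enough."""
    best = max(zoo, key=lambda name: similarity(task, zoo[name].description), default=None)
    if best is None or similarity(task, zoo[best].description) < threshold:
        best = f"trained:{task}"  # hypothetical naming scheme for newly trained models
        zoo[best] = ZooEntry(task, fit_linear(*dataset))  # register so later queries reuse it
    return best, zoo[best].predict(x)

zoo: dict[str, ZooEntry] = {}  # empty zoo forces on-demand training
area = [50.0, 80.0, 100.0, 120.0]
price = [150.0, 240.0, 300.0, 360.0]  # toy data: price = 3 * area
name, prediction = answer("predict house price from area", 90.0, zoo, (area, price))
print(name, prediction)
```

The newly trained model is registered in the zoo. A repeated query for the same task is then answered by the existing model and is not retrained.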

Paper Structure

This paper contains 6 sections and 6 figures.

Figures (6)

  • Figure 1: Components of LLM-PQA
  • Figure 2: The workflow of answering prediction query
  • Figure 3: Retrieving model and dataset with vector search
  • Figure 4: Identifying feature (values) from the query
  • Figure 5: Interface: query answering with matched model
  • ...and 1 more figure
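Figure 4 concerns identifying feature values in the query before inference. In LLM-PQA this extraction is done by the LLM. The regular-expression matcher below is only a hypothetical stand-in, and `extract_features` is an illustrative name. It assumes that the selected model's feature names are known and that each value appears as a number directly before or after its feature name.

```python
import re

def extract_features(query: str, feature_names: list[str]) -> dict[str, float]:
    """Map each known feature name to a number mentioned next to it in the query."""
    values: dict[str, float] = {}
    text = query.lower()
    number = r"(-?\d+(?:\.\d+)?)"
    for feat in feature_names:
        name = re.escape(feat)
        # Value after the name: "area 120", "area of 120", "area is 120", "area: 120".
        after = re.search(rf"\b{name}\b\s*(?:of|is|=|:)?\s*{number}", text)
        # Value before the name: "3 bedrooms".
        before = re.search(rf"{number}\s+{name}\b", text)
        match = after or before
        if match:
            values[feat] = float(match.group(1))
    return values

query = "What is the price of a house with area 120 and 3 bedrooms?"
print(extract_features(query, ["area", "bedrooms"]))
```

A hand-written pattern like this breaks on paraphrases and spelled-out numbers. That limitation is why the extraction step is delegated to an LLM.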