Efficient Learned Query Execution over Text and Tables [Technical Report]

Matthias Urban; Carsten Binnig


TL;DR

ELEET is a novel execution engine that lets one seamlessly query and process text as a first-class citizen alongside tables, and it can speed up multi-modal queries over tables and text by up to 575x without sacrificing accuracy.

Abstract

In this paper, we present ELEET, a novel execution engine that allows one to seamlessly query and process text as a first-class citizen along with tables. To enable such an integration of text and tables, ELEET leverages learned multi-modal operators (MMOps), such as joins and unions, that combine structured data with unstructured text. While large language models (LLMs) such as GPT-4 are interesting candidates for implementing such learned multi-modal operations, we deliberately do not follow this trend, since it would result in high overhead at query runtime. Instead, ELEET comes with a more efficient small language model (SLM) that is tailored to extracting structured data from text. Thanks to our novel architecture and pre-training procedure, the ELEET model enables high-accuracy extraction with low overhead. In our evaluation, we compare query execution based on ELEET to baselines leveraging LLMs such as GPT-4 and show that ELEET can speed up multi-modal queries over tables and text by up to 575x without sacrificing accuracy.
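To make the idea of a multi-modal join more concrete, the following is a minimal Python sketch written under assumptions that do not come from the paper. Each table row is matched to a text document by a shared key, and the requested attributes are extracted from that text as new columns. The names `toy_extract` and `multimodal_join` are hypothetical. The regex extractor is only a stand-in for the learned small language model; it is neither ELEET's model nor its API.

```python
import re
from typing import Callable, Dict, List, Optional


def toy_extract(text: str, attribute: str) -> Optional[str]:
    """Hypothetical stand-in for a learned extractor (NOT ELEET's SLM):
    finds '<attribute>: <value>' in free text, case-insensitively."""
    match = re.search(rf"{re.escape(attribute)}:\s*([^.;]+)", text, re.IGNORECASE)
    return match.group(1).strip() if match else None


def multimodal_join(
    table: List[Dict[str, str]],
    texts: Dict[str, str],
    key: str,
    attributes: List[str],
    extract: Callable[[str, str], Optional[str]] = toy_extract,
) -> List[Dict[str, Optional[str]]]:
    """Join each table row with the text document sharing its key value and
    materialize the requested attributes from that text as extra columns."""
    result: List[Dict[str, Optional[str]]] = []
    for row in table:
        doc = texts.get(row[key])
        if doc is None:
            continue  # inner-join semantics: drop rows without a matching text
        joined: Dict[str, Optional[str]] = dict(row)
        for attr in attributes:
            joined[attr] = extract(doc, attr)  # structured value pulled from text
        result.append(joined)
    return result


# Usage: a small table of players joined with free-text season reports.
players = [{"name": "Alice", "team": "Reds"}, {"name": "Bob", "team": "Blues"}]
reports = {"Alice": "Season summary. Goals: 12; assists: 7."}
print(multimodal_join(players, reports, "name", ["goals", "assists"]))
# → [{'name': 'Alice', 'team': 'Reds', 'goals': '12', 'assists': '7'}]
```

The extractor is passed in as a parameter, so the regex could in principle be swapped for a learned model. The sketch makes no claim about how ELEET batches, schedules, or optimizes those extraction calls.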
