Can Language Models Enable In-Context Database?

Yu Pan, Hongfeng Yu, Tianjiao Zhao, Jianxin Sun

TL;DR

To enable dynamic database updates, the paper proposes delta encoding of the database and presents a benchmark named InConDB. Extensive experiments measure how well different language models support an in-context database across database encoding methods, prompting methods, operation types, and input data distributions, revealing both their proficiency and their limitations.

Abstract

Large language models (LLMs) are emerging as few-shot learners capable of handling a variety of tasks, including comprehension, planning, reasoning, question answering, arithmetic calculations, and more. At the core of these capabilities is LLMs' proficiency in representing and understanding structured or semi-structured data, such as tables and graphs. Numerous studies have demonstrated that reasoning on tabular data or graphs is not only feasible for LLMs but also opens a promising research direction that treats these data as in-context data. The lightweight and human-readable characteristics of an in-context database can potentially make it an alternative to the traditional database in typical RAG (Retrieval Augmented Generation) settings. However, almost all current work focuses on static in-context data, which does not allow dynamic updates. In this paper, to enable dynamic database updates, we propose delta encoding of the database. We explore how data stored in a traditional RDBMS can be encoded as in-context text and evaluate LLMs' proficiency in CRUD (Create, Read, Update and Delete) operations on in-context databases. We present a benchmark named InConDB and conduct extensive experiments to show the performance of different language models in enabling an in-context database by varying the database encoding method, prompting method, operation type, and input data distribution, revealing both their proficiency and their limitations.

Paper Structure

This paper contains 25 sections, 5 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Overview of our evaluation framework. First, we compose several JSON files, each containing a set of CRUD operations for one database schema. Then a data sampler generates samples of (instruction sequence, query) pairs. The instruction sequence imitates the daily database operations that constantly arrive and change the state of the database; it can also be regarded as a representation of the current state of the database. A query is then used to obtain a result on that current state. We use a real database such as MySQL to execute the instruction sequence followed by the query, which yields the ground-truth query result. In a parallel branch, we send the prompting-decorated encoding of the (instruction sequence, query) pair to the large language model and ask it to imitate a database by executing all the instructions and the query to produce a result. Finally, an accuracy calculator computes the accuracy score, which measures the discrepancy between the ground-truth and LLM-generated query results. In this illustration, the LLM assumes that the 4th command (a delete operation) executes successfully, without noticing that it violates the foreign key constraint.
  • Figure 2: Illustration of the length of the command sequence $l$, the ratio of insert operations $b$, and the overlap $o$ between insert and non-insert operations. In Case 1, the insert operations (in red) and the non-insert operations (in green) do not overlap, so $o=0$. In Case 2, the ranges of the insert and non-insert operations overlap by 2, so $o$ is 2 divided by the size of the union of their ranges, which is 10: $o=\frac{2}{10}=0.2$. Similarly, in Case 3, the overlap is 10, so $o=\frac{10}{10}=1$.
  • Figure 3: An example of model input and output for encoding method: SQL and prompting method: zero-shot
  • Figure 4: An example of model input and output for encoding method: SQL and prompting method: zero-COT
  • Figure 5: An example of model input and output for encoding method: SQL and prompting method: few-shot
  • ...and 5 more figures
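
The overlap $o$ described in the Figure 2 caption reads as an intersection-over-union of two ranges. The sketch below is one possible reading, not the authors' implementation: it assumes each range is an inclusive pair of integer positions, and the function name `overlap_ratio` and the concrete ranges are hypothetical, chosen only so that the three cases reproduce the values quoted in the caption.

```python
def overlap_ratio(insert_range, non_insert_range):
    """Overlap o between two inclusive integer ranges, as intersection / union.

    Assumption: each range is a (low, high) pair of integer positions.
    The caption only states "overlap over the union of their range".
    """
    (a_lo, a_hi), (b_lo, b_hi) = insert_range, non_insert_range
    # Number of positions shared by both ranges (0 if they are disjoint).
    intersection = max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)
    # Number of positions covered by at least one of the two ranges.
    union = (a_hi - a_lo + 1) + (b_hi - b_lo + 1) - intersection
    return intersection / union


# Hypothetical ranges that reproduce the three cases in the caption.
case_1 = overlap_ratio((1, 5), (6, 10))    # disjoint ranges
case_2 = overlap_ratio((1, 6), (5, 10))    # overlap of 2, union of 10
case_3 = overlap_ratio((1, 10), (1, 10))   # identical ranges
print(case_1, case_2, case_3)
```

Under this reading, $o$ always lies between 0 and 1, with 0 meaning that the non-insert operations never touch the inserted range and 1 meaning that both kinds of operation cover exactly the same range.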