BlendSQL: A Scalable Dialect for Unifying Hybrid Question Answering in Relational Algebra

Parker Glenn, Parag Pravin Dakle, Liang Wang, Preethi Raghavan

TL;DR

BlendSQL introduces a scalable SQL-like intermediate representation that unifies reasoning across structured and unstructured data for hybrid question answering. The framework uses a Blender and a Parser to compose LLM-powered ingredients (LLMMap, LLMQA, LLMJoin) into a single query against a SQLite-like database, returning a smoothie object that holds the final results and the intermediate steps. With a small set of few-shot exemplars, BlendSQL achieves competitive results on HybridQA, OTT-QA, and FEVEROUS while reducing prompt tokens by approximately 35%, and because the full reasoning is encoded in the query itself, the intermediate steps remain interpretable. The work provides open-source code and demonstrates practical, explainable hybrid QA that scales to large datasets and diverse data sources.

Abstract

Many existing end-to-end systems for hybrid question answering tasks can often be boiled down to a "prompt-and-pray" paradigm, where the user has limited control and insight into the intermediate reasoning steps used to achieve the final result. Additionally, due to the context size limitation of many transformer-based LLMs, it is often not reasonable to expect that the full structured and unstructured context will fit into a given prompt in a zero-shot setting, let alone a few-shot setting. We introduce BlendSQL, a superset of SQLite to act as a unified dialect for orchestrating reasoning across both unstructured and structured data. For hybrid question answering tasks involving multi-hop reasoning, we encode the full decomposed reasoning roadmap into a single interpretable BlendSQL query. Notably, we show that BlendSQL can scale to massive datasets and improve the performance of end-to-end systems while using 35% fewer tokens. Our code is available and installable as a package at https://github.com/parkervg/blendsql.
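The idea of filtering structured context before any model call can be sketched in plain Python with the standard sqlite3 module. This is a minimal illustration under stated assumptions, not BlendSQL's API: the table, its rows, and the stub_llm_judge function are all hypothetical, and the "LLM" is a keyword check standing in for a real model. It only shows why an SQL filter reduces the amount of text a model would have to read.

```python
import sqlite3

# Toy rows, made up purely for illustration (not taken from any dataset).
ROWS = [
    ("Player A", 2008, "Seattle Mariners", "Attended the University of Georgia."),
    ("Player B", 2008, "Seattle Mariners", "Attended Oregon State University."),
    ("Player C", 2008, "Boston Red Sox", "Attended the University of Georgia."),
    ("Player D", 2007, "Seattle Mariners", "Attended the University of Georgia."),
]


def build_db():
    """Create an in-memory SQLite database holding the toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE draft (player TEXT, year INTEGER, team TEXT, bio TEXT)")
    conn.executemany("INSERT INTO draft VALUES (?, ?, ?, ?)", ROWS)
    return conn


def stub_llm_judge(question, passage):
    """Hypothetical stand-in for an LLM call.

    A real system would send `question` and `passage` to a model; this stub
    ignores the question and performs a keyword check instead.
    """
    return "University of Georgia" in passage


def answer(conn):
    # Step 1: the structured conditions are resolved by ordinary SQL.
    candidates = conn.execute(
        "SELECT player, bio FROM draft WHERE year = ? AND team = ? ORDER BY player",
        (2008, "Seattle Mariners"),
    ).fetchall()
    # Step 2: only the surviving rows are passed to the (stubbed) model.
    question = "Did this player attend the University of Georgia?"
    matches = [player for player, bio in candidates if stub_llm_judge(question, bio)]
    return matches, len(candidates)


conn = build_db()
matches, n_sent = answer(conn)
print(matches, n_sent)
```

In this toy setup the stub is consulted for two of the four rows, because the year and team conditions are handled by SQLite first. A real hybrid query would express both steps in one statement; they are split here only to keep the sketch free of any assumed library calls.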

Paper Structure

This paper contains 47 sections, 4 equations, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: Example BlendSQL representation for an OTT-QA dev example.
  • Figure 2: Visualizing built-in BlendSQL ingredients.
  • Figure 3: BlendSQL for "Which teams has the player drafted by the Seattle Mariners in 2008 out of University of Georgia played for in the MLB?", aligned with Table \ref{hybridqa_database_example}.
  • Figure 4: Average prompt tokens per question on the HybridQA dev set. BlendSQL enables efficient filtering of large context databases to decrease data passed to the LLM by 35%.
  • Figure 5: Error analysis on 50 random samples of the HybridQA dev set. As described in Section \ref{subsec:annotat_cateogies}, the left panel shows that 17 (34%) of the errors are True Negative Errors for BlendSQL (blendsql-error). The right panel shows the causes of those True Negative Errors.
  • ...and 4 more figures