BIS: NL2SQL Service Evaluation Benchmark for Business Intelligence Scenarios

Bora Caglayan, Mingxue Wang, John D. Kelleher, Shen Fei, Gui Tong, Jiandong Ding, Puchao Zhang

TL;DR

We develop a new benchmark focused on typical NL questions in industrial BI scenarios and propose two novel semantic similarity evaluation metrics for assessing NL2SQL capabilities in BI applications and services.

Abstract

NL2SQL (Natural Language to Structured Query Language) transformation has seen wide adoption in Business Intelligence (BI) applications in recent years. However, existing NL2SQL benchmarks are not suitable for production BI scenarios, as they are not designed for common business intelligence questions. To address this gap, we have developed a new benchmark focused on typical NL questions in industrial BI scenarios. We discuss the challenges of constructing a BI-focused benchmark and the shortcomings of existing benchmarks. Additionally, we introduce question categories in our benchmark that reflect common BI inquiries. Lastly, we propose two novel semantic similarity evaluation metrics for assessing NL2SQL capabilities in BI applications and services.

Paper Structure

This paper contains 10 sections, 3 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Calculation of result and semantic similarity for a benchmark instance.
  • Figure 2: Sample changes to a test query. Each transformation type changes the query's semantic similarity with a different weight.
  • Figure 3: Two results for the query "what is the top 3 revenue streams?". Output A shows the ranking of revenue streams implicitly, while output B shows it explicitly. The semantic similarity performance measure does not over-penalize such output differences.
  • Figure : Semantic Similarity Estimation Algorithm