DP-Bench: A Benchmark for Evaluating Data Product Creation Systems

Faisal Chowdhury, Sola Shirai, Sarthak Dash, Nandana Mihindukulasooriya, Horst Samulowitz

TL;DR

DP-Bench introduces the first benchmark for automatic data product creation by coupling BIRD's NL-SQL data with ELT-Bench's ELT pipelines to create gold-standard data products, data product requests (DPRs), provenance, and annotated topics. It defines concrete evaluation metrics across column-level accuracy, derived-column similarity, and Text-to-SQL execution, and provides baseline approaches including hybrid search and various LLM-based data product creation methods. The experiments reveal that while LLM baselines outperform naive No-Search baselines on many metrics, producing accurate predefined columns and provenance remains challenging, especially for derived columns and in the hard subset. DP-Bench establishes a foundation for systematic research on automating data product generation and highlights directions for improvement in cross-DB data products and agentic optimization.

Abstract

A data product is created with the intention of solving a specific problem, addressing a specific business use case, or meeting a particular need, going beyond just serving data as a raw asset. Data products enable end users to gain greater insights about their data. Since the concept was first introduced over a decade ago, there has been considerable work, especially in industry, to create data products manually or semi-automatically. However, there is hardly any benchmark for evaluating automatic data product creation. In this work, we present a benchmark, the first of its kind, for this task. We call it DP-Bench. We describe how this benchmark was created by taking advantage of existing work in ELT (Extract-Load-Transform) and Text-to-SQL benchmarks. We also propose a number of LLM-based approaches that can be considered as baselines for generating data products automatically. We make DP-Bench and supplementary materials available at https://huggingface.co/datasets/ibm-research/dp-bench .

Paper Structure

This paper contains 29 sections, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Example of an input DB and a data product request.
  • Figure 2: Example of a data product created from the input of Figure 1.
  • Figure 3: Example of a natural language question–SQL pair in BIRD, from https://bird-bench.github.io
  • Figure 4: A partial description of the "customers" data model for the "retails" database in ELT-Bench.
  • Figure 5: The number of derived and non-derived columns in the data product for each DB in DP-Bench.
  • ...and 1 more figure