NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries

Wei Zhao, Zhitao Hou, Siyuan Wu, Yan Gao, Haoyu Dong, Yao Wan, Hongyu Zhang, Yulei Sui, Haidong Zhang

TL;DR

This work introduces NL2Formula, a benchmark task for generating executable Excel formulas from natural-language queries grounded in spreadsheet tables. The authors build a dataset of 70,799 NL-query/formula pairs over 21,670 tables by converting Text2SQL data (WikiSQL and Spider) into Excel formulas and augmenting it with data from TAT-QA. They present fCoder, a T5-based encoder-decoder framework that jointly encodes the NL query and the tabular context to produce formulas, and report that it outperforms the baselines and GPT-3.5 on the EM and ERA metrics. The study also analyzes model behavior across hardness levels and under table-position perturbations, highlighting the importance of correct evidence extraction and cell indexing, and it identifies directions for future work, including broader function coverage.
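To make the SQL-to-formula conversion idea concrete, here is a minimal Python sketch that maps a single-aggregate, single-condition query onto an Excel formula. It is not the authors' conversion pipeline: the `SimpleQuery` type, the helper names, and the assumed layout (headers in row 1, data starting in row 2, column A first) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleQuery:
    """Hypothetical stand-in for a WikiSQL-style query: one aggregate, at most one condition."""
    agg: str                        # "SUM", "AVG", "COUNT", "MAX" or "MIN"
    target: str                     # column being aggregated
    cond_col: Optional[str] = None  # column used in the WHERE clause, if any
    cond_op: str = "="              # "=", ">", "<", ">=", "<=" or "<>"
    cond_val: Optional[object] = None


def col_letter(index: int) -> str:
    """Convert a 0-based column index to Excel column letters (0 -> A, 26 -> AA)."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def col_range(headers: List[str], name: str, n_rows: int) -> str:
    """Range of data cells for a column, assuming headers sit in row 1."""
    letter = col_letter(headers.index(name))
    return f"{letter}2:{letter}{n_rows + 1}"


# Assumed mapping from SQL aggregates to Excel functions (illustrative only).
PLAIN = {"SUM": "SUM", "AVG": "AVERAGE", "COUNT": "COUNTA", "MAX": "MAX", "MIN": "MIN"}
CONDITIONAL = {"SUM": "SUMIFS", "AVG": "AVERAGEIFS", "COUNT": "COUNTIFS",
               "MAX": "MAXIFS", "MIN": "MINIFS"}


def to_formula(query: SimpleQuery, headers: List[str], n_rows: int) -> str:
    """Translate the query into a formula string for a table with n_rows data rows."""
    target = col_range(headers, query.target, n_rows)
    if query.cond_col is None:
        return f"={PLAIN[query.agg]}({target})"
    criteria_range = col_range(headers, query.cond_col, n_rows)
    # Excel criteria are strings such as ">10"; plain equality needs no operator.
    op = "" if query.cond_op == "=" else query.cond_op
    criteria = f'"{op}{query.cond_val}"'
    if query.agg == "COUNT":
        # COUNTIFS takes only (criteria_range, criteria) pairs, no value range.
        return f"=COUNTIFS({criteria_range},{criteria})"
    return f"={CONDITIONAL[query.agg]}({target},{criteria_range},{criteria})"


headers = ["Player", "Team", "Points"]
total = to_formula(SimpleQuery("SUM", "Points", "Team", "=", "Lakers"), headers, n_rows=4)
print(total)  # =SUMIFS(C2:C5,B2:B5,"Lakers")
```

The sketch deliberately ignores joins, nested queries, and values containing double quotes. It only shows why correct cell indexing matters: shifting the table on the sheet changes every range in the output formula.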

Abstract

Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a widespread practice among users performing data analysis. However, crafting formulas on spreadsheets remains a tedious and error-prone task for many end-users, particularly when dealing with complex operations. To alleviate the burden associated with writing spreadsheet formulas, this paper introduces a novel benchmark task called NL2Formula, with the aim to generate executable formulas that are grounded on a spreadsheet table, given a Natural Language (NL) query as input. To accomplish this, we construct a comprehensive dataset consisting of 70,799 paired NL queries and corresponding spreadsheet formulas, covering 21,670 tables and 37 types of formula functions. We realize the NL2Formula task by providing a sequence-to-sequence baseline implementation called fCoder. Experimental results validate the effectiveness of fCoder, demonstrating its superior performance compared to the baseline models. Furthermore, we also compare fCoder with an initial GPT-3.5 model (i.e., text-davinci-003). Lastly, through in-depth error analysis, we identify potential challenges in the NL2Formula task and advocate for further investigation.

Paper Structure (19 sections, 6 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Two running examples from our created dataset for NL2Formula.
  • Figure 2: An example of the Excel formula.
  • Figure 3: Two simple examples of conversion rules to translate SQL queries into formulas.
  • Figure 4: Distribution of formulas in the NL2Formula dataset, including Analysis Queries at three hardness levels (Simple, Medium, Complex) and Calculation.
  • Figure 5: An overview of fCoder, a reference framework for NL2Formula.
  • ...and 3 more figures