KU-DMIS at EHRSQL 2024: Generating SQL query via question templatization in EHR

Hajung Kim, Chanhwi Kim, Hoonick Lee, Kyochul Jang, Jiwoo Lee, Kyungjae Lee, Gangwoo Kim, Jaewoo Kang

TL;DR

A novel text-to-SQL framework that focuses on standardizing the structure of questions into a templated format and shows promising results on the EHRSQL-2024 benchmark dataset, part of the ClinicalNLP shared task.

Abstract

Transforming natural language questions into SQL queries is crucial for precise data retrieval from electronic health record (EHR) databases. A significant challenge in this process is detecting and rejecting unanswerable questions that request information beyond the database's scope or exceed the system's capabilities. In this paper, we introduce a novel text-to-SQL framework that robustly handles out-of-domain questions and verifies the generated queries with query execution. Our framework begins by standardizing the structure of questions into a templated format. We use a powerful large language model (LLM), a fine-tuned GPT-3.5, with detailed prompts that include the table schemas of the EHR database system. Our experimental results demonstrate the effectiveness of our framework on the EHRSQL-2024 benchmark, a shared task in the ClinicalNLP workshop. Although straightforward fine-tuning of GPT shows promising results on the development set, it struggles with the out-of-domain questions in the test set. With our framework, we improve our system's adaptability and achieve competitive performance on the official leaderboard of the EHRSQL-2024 challenge.

Paper Structure (31 sections, 1 equation, 2 figures, 11 tables)


Figures (2)

  • Figure 1: In the proposed Text-to-SQL framework, when a query is presented in natural language, the model generates SQL code to retrieve the required information from the database. If the query requires information absent from the database, the Text-to-SQL model returns a 'null' response.
  • Figure 2: Overview of our framework. (a) Question Templatization: free-form questions are converted into a structured, templated format. (b) SQL Generation: task outlines and table information are provided to aid precise query generation. (c) Self-Reflection and Verification: detailed information on the tables identified in the initial SQL generation is provided before the query is finalized.
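
The verification-by-execution idea described in the abstract and in Figure 1 can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `verify_by_execution`, the `reject_empty` flag, and the toy `patients` table are hypothetical, and an in-memory SQLite database stands in for the EHR database. The sketch only shows the general pattern of executing a generated query and falling back to 'null' when execution fails.

```python
import sqlite3

NULL_ANSWER = "null"  # label returned for unanswerable questions (Figure 1)


def verify_by_execution(conn: sqlite3.Connection, sql: str,
                        reject_empty: bool = False) -> str:
    """Return `sql` if it executes on `conn`, otherwise 'null'.

    `reject_empty` is a hypothetical option (not taken from the paper) that
    also rejects queries whose result set is empty.
    """
    if sql.strip().lower() == NULL_ANSWER:
        # The generator already abstained; nothing to execute.
        return NULL_ANSWER
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        # Malformed SQL, or a reference to a table/column not in the schema.
        return NULL_ANSWER
    if reject_empty and not rows:
        return NULL_ANSWER
    return sql


# Toy stand-in for an EHR database (schema is an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (subject_id INTEGER, gender TEXT)")
conn.execute("INSERT INTO patients VALUES (1, 'f')")

answerable = "SELECT COUNT(*) FROM patients WHERE gender = 'f'"
unanswerable = "SELECT name FROM appointments"  # table does not exist

ok = verify_by_execution(conn, answerable)
bad = verify_by_execution(conn, unanswerable)
print(ok)
print(bad)
```

In this sketch a query that references a table missing from the schema raises an SQLite error and is mapped to 'null', while a query that executes is passed through unchanged.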