Questionnaire meets LLM: A Benchmark and Empirical Study of Structural Skills for Understanding Questions and Responses

Duc-Hai Nguyen, Vijayakumar Nanjappan, Barry O'Sullivan, Hoang D. Nguyen

TL;DR

Questionnaires are complex, heterogeneous structured data that existing LLM workflows do not serve well. The authors introduce QASU, a benchmark that systematically varies serialization formats and prompting strategies across six structural tasks to isolate the effect of input design on LLM reasoning. The choice of serialization format and prompt can shift accuracy by up to 8.8 percentage points, and self-augmented prompting adds a further 3–4 percentage points on average for specific tasks, with results spanning multiple model families. QASU thus offers a practical, open benchmark and actionable guidance for integrating LLMs into questionnaire analysis in fields such as health, social science, and software engineering.

Abstract

Millions of people take surveys every day, from market polls and academic studies to medical questionnaires and customer feedback forms. These datasets capture valuable insights, but their scale and structure present a unique challenge for large language models (LLMs), which otherwise excel at few-shot reasoning over open-ended text. Yet their ability to process questionnaire data, or lists of questions crossed with hundreds of respondent rows, remains underexplored. Current retrieval and survey analysis tools (e.g., Qualtrics, SPSS, REDCap) are typically designed for humans in the workflow, which limits the integration of such data with LLMs and AI-empowered automation. This gap leaves scientists, surveyors, and everyday users without evidence-based guidance on how best to represent questionnaires for LLM consumption. We address this by introducing QASU (Questionnaire Analysis and Structural Understanding), a benchmark that probes six structural skills, including answer lookup, respondent count, and multi-hop inference, across six serialization formats and multiple prompt strategies. Experiments on contemporary LLMs show that choosing an effective format and prompt combination can improve accuracy by up to 8.8 percentage points compared to suboptimal formats. For specific tasks, carefully adding a lightweight structural hint through self-augmented prompting can yield further improvements of 3–4 percentage points on average. By systematically isolating format and prompting effects, our open-source benchmark offers a simple yet versatile foundation for advancing both research and real-world practice in LLM-based questionnaire analysis.
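
To make "serialization format" concrete, the sketch below renders one toy questionnaire in three ways before it is placed in a prompt. It is an illustration only: the abstract does not name the six formats QASU evaluates, so JSON, CSV, and a Markdown table are assumed stand-ins, and the questions, responses, and function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical toy data: two questions crossed with three respondent rows.
QUESTIONS = {"Q1": "How satisfied are you (1-5)?", "Q2": "Would you recommend us?"}
RESPONSES = [
    {"respondent": "R1", "Q1": "4", "Q2": "Yes"},
    {"respondent": "R2", "Q1": "2", "Q2": "No"},
    {"respondent": "R3", "Q1": "5", "Q2": "Yes"},
]


def to_json(questions, responses):
    """Nested JSON: question texts and response rows in one object."""
    return json.dumps({"questions": questions, "responses": responses}, indent=2)


def to_csv(questions, responses):
    """Flat CSV: one header row, then one row per respondent."""
    columns = ["respondent", *questions]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(responses)
    return buffer.getvalue()


def to_markdown(questions, responses):
    """Markdown table with the same columns as the CSV view."""
    columns = ["respondent", *questions]
    header = "| " + " | ".join(columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    rows = ["| " + " | ".join(row[c] for c in columns) + " |" for row in responses]
    return "\n".join([header, divider, *rows])


def build_prompt(serialized, task_question):
    """Wrap a serialized questionnaire and a task question into one prompt."""
    return f"Questionnaire data:\n{serialized}\n\nQuestion: {task_question}\nAnswer:"
```

Passing each rendering through `build_prompt` with the same task question (for example, "What did R2 answer to Q2?") gives prompts that differ only in input design, which is the kind of controlled comparison the benchmark is described as making.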

Paper Structure

This paper contains 21 sections, 3 figures, and 8 tables.

Figures (3)

  • Figure 1: Input designs evaluated in the QASU benchmark. Each design is a combination of serialization format, layout choices, and prompt annotations.
  • Figure 2: Task types in the QASU benchmark. Each task is designed to evaluate a specific structural skill in processing questionnaire data.
  • Figure 3: Illustration of self-augmented prompting. This process consists of two phases: 1) using self-augmented prompts to ask the LLM to generate additional knowledge (intermediate output) about the table; 2) incorporating the self-augmented response into the second prompt to request the final answer for a downstream task. As depicted in the figure, the LLM is able to identify important values in the table, which assists in generating a more accurate answer for the downstream task.
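
The two-phase process in the Figure 3 caption can be sketched as follows. This is a minimal sketch, not the authors' implementation: the prompt wording is invented for illustration, and `llm` is a hypothetical callable (prompt string in, completion string out) standing in for whatever model client is used.

```python
from typing import Callable


def self_augmented_answer(llm: Callable[[str], str], table: str, question: str) -> str:
    """Two-phase self-augmented prompting over a serialized table.

    `llm` is an assumed interface: any function mapping a prompt to a completion.
    """
    # Phase 1: ask the model to generate additional knowledge about the table.
    hint_prompt = (
        "Below is serialized questionnaire data.\n\n"
        f"{table}\n\n"
        "Identify the values and structural details that matter most "
        "for answering questions about this data."
    )
    hint = llm(hint_prompt)

    # Phase 2: include the intermediate output in the prompt for the final answer.
    final_prompt = (
        f"{table}\n\n"
        f"Additional knowledge about the data:\n{hint}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm(final_prompt)
```

Because the model is injected as a plain callable, the same function works with any client and can be exercised with a stub that records the two prompts it receives.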