QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models

Maximilian Kreutner, Jens Rupprecht, Georg Ahnert, Ahmed Salem, Markus Strohmaier

TL;DR

QSTN tackles robustness issues in questionnaire-style prompting for large language models by providing a modular, open-source framework to systematically vary questionnaire presentation, prompt perturbations, and response generation methods. It supports both local and remote inference alongside a no-code UI, enabling scalable in-silico surveys and reproducible analyses across datasets and models. Key findings show that questionnaire presentation and response-generation strategy substantially influence alignment with human answers and can reduce computational costs, with battery-style presentation often yielding the best subpopulation alignment and restricted generation offering efficiency gains. The work advances reproducibility in LLM-based questionnaire research and broadens applicability to data annotation, psychometrics, and persona studies.

Abstract

We introduce QSTN, an open-source Python framework for systematically generating responses from questionnaire-style prompts to support in-silico surveys and annotation tasks with large language models (LLMs). QSTN enables robust evaluation of questionnaire presentation, prompt perturbations, and response generation methods. Our extensive evaluation ($>40$ million survey responses) shows that question structure and response generation methods have a significant impact on the alignment of generated survey responses with human answers, and that such responses can be obtained for a fraction of the compute cost. In addition, we offer a no-code user interface that allows researchers to set up robust experiments with LLMs without coding knowledge. We hope that QSTN will support the reproducibility and reliability of LLM-based research in the future.
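
To make the idea of varying questionnaire presentation and applying a prompt perturbation concrete, the following is a minimal sketch in plain Python. It does not use QSTN's API: the names `Question`, `single_item_prompts`, `battery_prompt`, and `shuffle_options` are hypothetical, and reading "battery-style" as "all items in one prompt" is an assumption made only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """A closed-ended survey item (hypothetical helper, not a QSTN class)."""
    text: str
    options: tuple[str, ...]


def format_options(options):
    """Render answer options as a numbered list, one option per line."""
    return "\n".join(f"{i}. {opt}" for i, opt in enumerate(options, start=1))


def single_item_prompts(questions):
    """Presentation A: one prompt per question, each asked in isolation."""
    return [f"{q.text}\n{format_options(q.options)}\nAnswer:" for q in questions]


def battery_prompt(questions):
    """Presentation B (assumed meaning of 'battery'): all items in one prompt."""
    blocks = [
        f"Q{i}: {q.text}\n{format_options(q.options)}"
        for i, q in enumerate(questions, start=1)
    ]
    return "\n\n".join(blocks) + "\n\nAnswer every question by its number."


def shuffle_options(question, seed):
    """Perturbation: reorder the answer options reproducibly from a seed."""
    rng = random.Random(seed)
    options = list(question.options)
    rng.shuffle(options)
    return Question(question.text, tuple(options))


# Illustrative items only; these are not taken from the paper's datasets.
scale = ("Agree", "Neither agree nor disagree", "Disagree")
questions = [
    Question("I trust national news media.", scale),
    Question("I trust local news media.", scale),
]

per_item = single_item_prompts(questions)                      # two prompts
combined = battery_prompt(questions)                           # one prompt
perturbed = [shuffle_options(q, seed=0) for q in questions]    # reordered options
```

Each prompt variant would then be sent to a model and the parsed answers compared against human responses. Seeding the shuffle keeps a perturbed run reproducible.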

Paper Structure

This paper contains 15 sections, 6 figures, 9 tables.

Figures (6)

  • Figure 1: QSTN Facilitates Easy-to-Customize and Robust Questionnaire Inference with LLMs. QSTN provides a fully modular pipeline with different options for questionnaire presentation, prompt perturbations, and response generation methods, together with automatic parsing. Both local and remote inference are supported.
  • Figure 2: QSTN Questionnaire Presentation Modes
  • Figure 3: QSTN Supported Prompt Perturbations
  • Figure 4: QSTN Response Generation Methods
  • Figure 5: Minimum usage example of QSTN. QSTN can be easily integrated into existing projects, requiring just three function calls to operate. Users familiar with vllm or the OpenAI API can use the same Model/Client calls and arguments. In this example, the reasoning and the generated response are automatically parsed.
  • ...and 1 more figure