Assessing Generative AI value in a public sector context: evidence from a field experiment

Trevor Fitzpatrick; Seamus Kelly; Patrick Carey; David Walsh; Ruairi Nugent

TL;DR

This study evaluates the value of Generative AI (Gen AI) in public-sector knowledge work through a preregistered field experiment at the Central Bank of Ireland, covering two tasks: Documents (document comprehension) and Data (data analytics). In a randomized design with 143 staff, the Gen AI group shows a 17% improvement in answer quality and a 34% improvement in completion time on the Documents task, but a 12% decline in quality and no significant difference in completion time on the Data task, indicating task-dependent effects. The results suggest that Gen AI can meaningfully augment information processing when tasks are structured and information-rich, but that its efficacy diminishes for complex analytics requiring cross-document synthesis and domain expertise. The paper also provides detailed field notes on design choices, training needs, and the human-in-the-loop considerations necessary for responsible public-sector AI adoption. Overall, the work highlights both the productivity potential and the practical challenges of deploying Gen AI in regulatory and policy contexts.

Abstract

The emergence of Generative AI (Gen AI) has motivated interest in understanding how it could be used to enhance productivity across various tasks. We add to the evidence on the performance impact of Gen AI on complex knowledge-based tasks in a public sector setting. In a pre-registered experiment, after establishing a baseline level of performance, we find mixed evidence across two types of composite tasks, one related to document understanding and one to data analysis. For the Documents task, the treatment group using Gen AI had a 17% improvement in answer quality scores (as judged by human evaluators) and a 34% improvement in task completion time compared to a control group. For the Data task, the Gen AI treatment group experienced a 12% reduction in quality scores and no significant difference in mean completion time compared to the control group. These results suggest that the benefits of Gen AI may be task-dependent and potentially respondent-dependent. We also discuss field notes and lessons learned, as well as supplementary insights from a post-trial survey and a feedback workshop with participants.
