Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models

Heloisa Candello, Muneeza Azmat, Uma Sushmitha Gunturi, Raya Horesh, Rogerio Abreu de Paula, Heloisa Pimentel, Marcelo Carpinette Grave, Aminat Adebiyi, Tiago Machado, Maysa Malfiza Garcia de Macedo

TL;DR

The paper investigates how humans perceive AI outputs that have been mitigated to reduce harm, using a mixed-method, within-subject study with 57 participants evaluating mitigated versus unmitigated responses across multiple criteria. It introduces a post-hoc mitigator model and a stakeholder-centered evaluation framework, coupled with both quantitative and qualitative analyses, to measure fairness, faithfulness, relevance, competence, and business utility. Key findings show a general preference for mitigated outputs, with variability driven by language, AI work experience, and annotation familiarity; transparency and selective mitigation emerge as important factors for trust. The work contributes novel metrics for evaluating mitigation, highlights the need to align human-centric evaluation with automated benchmarks, and offers practical guidance for designing and deploying mitigation strategies in real-world, diverse user contexts.

Abstract

With the rapid uptake of generative AI, investigating human perceptions of generated responses has become crucial. A major challenge is these models' "aptitude" for hallucinating and generating harmful content. Despite major efforts to implement guardrails, human perceptions of these mitigation strategies are largely unknown. We conducted a mixed-method experiment to evaluate the responses of a mitigation strategy across multiple dimensions: faithfulness, fairness, harm-removal capacity, and relevance. In a within-subject study design, 57 participants assessed the responses under two conditions: a harmful response plus its mitigation, and the mitigated response alone. Results revealed that participants' native language, AI work experience, and annotation familiarity significantly influenced evaluations. Participants showed high sensitivity to linguistic and contextual attributes, penalizing minor grammar errors while rewarding preserved semantic contexts. This contrasts with how language is often treated in the quantitative evaluation of LLMs. We also introduced new metrics for training and evaluating mitigation strategies, along with insights for human-AI evaluation studies.

Paper Structure

This paper contains 23 sections, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Description of the personas to support the selection criteria
  • Figure 2: Overview of the two-phase within-subjects evaluation methodology, showing Phase 1 on the left (participants evaluate mitigated responses independently without seeing original outputs, assessing social bias, relevance, faithfulness, and competence) and Phase 2 on the right (participants compare original and mitigated responses side by side, focusing on faithfulness and competence metrics).