Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models
Heloisa Candello, Muneeza Azmat, Uma Sushmitha Gunturi, Raya Horesh, Rogerio Abreu de Paula, Heloisa Pimentel, Marcelo Carpinette Grave, Aminat Adebiyi, Tiago Machado, Maysa Malfiza Garcia de Macedo
TL;DR
The paper investigates how humans perceive AI outputs that have been mitigated to reduce harm, using a mixed-methods, within-subject study with 57 participants evaluating mitigated versus unmitigated responses across multiple criteria. It introduces a post-hoc mitigator model and a stakeholder-centered evaluation framework, coupled with both quantitative and qualitative analyses, to measure fairness, faithfulness, relevance, competence, and business utility. Key findings show a general preference for mitigated outputs, with variability driven by participants' native language, AI work experience, and annotation familiarity; transparency and selective mitigation emerge as important factors for trust. The work contributes novel metrics for evaluating mitigation, highlights the need to align human-centric evaluation with automated benchmarks, and offers practical guidance for designing and deploying mitigation strategies in real-world, diverse user contexts.
Abstract
With the rapid uptake of generative AI, investigating human perceptions of generated responses has become crucial. A major challenge is these models' 'aptitude' for hallucinating and generating harmful content. Despite major efforts to implement guardrails, human perceptions of these mitigation strategies are largely unknown. We conducted a mixed-methods experiment to evaluate the responses of a mitigation strategy across multiple dimensions: faithfulness, fairness, harm-removal capacity, and relevance. In a within-subject study design, 57 participants assessed the responses under two conditions: a harmful response plus its mitigation, and the mitigated response alone. Results revealed that participants' native language, AI work experience, and annotation familiarity significantly influenced evaluations. Participants showed high sensitivity to linguistic and contextual attributes, penalizing minor grammar errors while rewarding preserved semantic contexts. This contrasts with how language is often treated in the quantitative evaluation of LLMs. We also introduce new metrics for training and evaluating mitigation strategies and offer insights for human-AI evaluation studies.
