
Are Akpans Trick or Treat: Unveiling Helpful Biases in Assistant Systems

Jiao Sun, Yu Hou, Jiin Kim, Nanyun Peng

TL;DR

This paper studies computational measurements of helpfulness, develops models for automatic helpfulness evaluation, and proposes using a dialogue system's helpfulness level across different user queries to gauge its fairness.

Abstract

Information-seeking AI assistant systems aim to answer users' queries about knowledge in a timely manner. However, both the human-perceived helpfulness of these systems and its fairness implications are under-explored. In this paper, we study computational measurements of helpfulness. We collect human annotations on the helpfulness of dialogue responses, develop models for automatic helpfulness evaluation, and then propose using a dialogue system's helpfulness level across different user queries to gauge its fairness. Experiments with state-of-the-art dialogue systems, including ChatGPT, under three information-seeking scenarios reveal that existing systems tend to be more helpful for questions about concepts from highly developed countries than for questions about concepts from less developed countries, uncovering potential fairness concerns in current information-seeking assistant systems.
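As a rough illustration of using helpfulness levels across query groups to gauge fairness, the sketch below compares mean helpfulness scores between groups. It is a minimal sketch under assumptions, not the paper's method: the function name `helpfulness_gap`, the group labels, the 0/1 scoring scheme, and every score value are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def helpfulness_gap(scores_by_group):
    """Return per-group mean helpfulness and the largest gap between groups.

    scores_by_group maps a (hypothetical) group label to a list of
    helpfulness scores, one per system response in that group.
    """
    means = {group: mean(scores) for group, scores in scores_by_group.items()}
    gap = max(means.values()) - min(means.values())
    return means, gap


# Placeholder scores for illustration only (1.0 = helpful, 0.0 = not helpful).
# These are NOT data from the paper.
scores = {
    "highly_developed": [1.0, 1.0, 0.0, 1.0],
    "less_developed": [1.0, 0.0, 0.0, 1.0],
}

means, gap = helpfulness_gap(scores)
print(means)
print(gap)
```

A larger gap would suggest the system is more helpful for one group of queries than another; in practice the scores would come from human annotations or an automatic helpfulness model rather than hand-written values.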
