Evaluating Human-AI Collaboration: A Review and Methodological Framework

George Fragiadakis, Christos Diou, George Kousiouris, Mara Nikolaidou

TL;DR

Evaluating Human-AI Collaboration (HAIC) is challenging because interactions between humans and AI are multidimensional. The authors develop a structured, domain-agnostic evaluation framework that uses a decision tree to select metrics across AI-Centric, Human-Centric, and Symbiotic HAIC modes, combining quantitative and qualitative indicators. They synthesize the literature, define core elements, and propose subfactors and metrics with a weighting scheme that yields an overall score. The work lays the groundwork for standardized, cross-domain HAIC assessment and points to empirical validation in manufacturing, healthcare, finance, and education, with future work on behavior and ethics. The framework aims to facilitate rigorous, real-world evaluation and guide the design of more effective human-AI partnerships.

Abstract

The use of artificial intelligence (AI) alongside people in working environments, known as Human-AI Collaboration (HAIC), has become essential in a variety of domains, improving decision-making, efficiency, and innovation. Despite HAIC's wide potential, evaluating its effectiveness remains challenging because of the complex interaction of the components involved. This paper provides a detailed analysis of existing HAIC evaluation approaches and develops a new paradigm for evaluating these systems more effectively. Our framework includes a structured decision tree that helps select relevant metrics based on distinct HAIC modes (AI-Centric, Human-Centric, and Symbiotic). By including both quantitative and qualitative metrics, the framework seeks to represent HAIC's dynamic and reciprocal nature, enabling the assessment of its impact and success. The framework's practicality can be examined by applying it in an array of domains, including manufacturing, healthcare, finance, and education, each of which has unique challenges and requirements. Our hope is that this study will facilitate further research on the systematic evaluation of HAIC in real-world applications.
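
To make the idea of a weighting scheme that yields an overall score concrete, here is a minimal Python sketch. It is not the authors' implementation: the metric names, the weights, the per-mode lookup table, and the assumption that every metric is already normalized to [0, 1] are all hypothetical, chosen only for illustration. The lookup table stands in for the paper's decision tree, whose actual branching criteria are not described in this summary.

```python
from typing import Dict

# Hypothetical weights per HAIC mode; the values are illustrative only.
MODE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "ai_centric": {"task_accuracy": 2.0, "user_trust": 1.0, "adaptability": 1.0},
    "human_centric": {"task_accuracy": 1.0, "user_trust": 2.0, "adaptability": 1.0},
    "symbiotic": {"task_accuracy": 1.0, "user_trust": 1.0, "adaptability": 2.0},
}


def overall_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of metric values assumed to lie in [0, 1]."""
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must cover the same keys")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(metrics[name] * weights[name] for name in metrics) / total_weight


# Hypothetical normalized measurements for one evaluated system.
example_metrics = {"task_accuracy": 0.8, "user_trust": 0.6, "adaptability": 0.5}
score = overall_score(example_metrics, MODE_WEIGHTS["ai_centric"])
print(f"AI-Centric overall score: {score:.3f}")
```

Dividing by the sum of the weights keeps the result in [0, 1] whatever scale the weights use, which makes scores comparable across modes under these assumptions.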

Paper Structure (35 sections, 2 figures, 6 tables)