Human-in-the-loop or AI-in-the-loop? Automate or Collaborate?
Sriraam Natarajan, Saurabh Mathur, Sahil Sidheekh, Wolfgang Stammer, Kristian Kersting
TL;DR
This paper reframes how human-AI collaboration is categorized by distinguishing the human-in-the-loop (HIL) and AI-in-the-loop ($AI^2L$) paradigms. It analyzes how the two differ in control authority, sources of bias, and evaluation criteria, arguing that many systems labeled HIL are in fact $AI^2L$ systems, in which the human retains control and the AI plays a supporting role. The authors propose shifting evaluation away from AI-centered metrics toward user- and population-specific outcomes, and advocate a nesting of domains to guide the choice between HIL and $AI^2L$. These insights aim to enable more trustworthy, robust, and context-appropriate human-AI systems across domains.
Abstract
Human-in-the-loop (HIL) systems have emerged as a promising approach for combining the strengths of data-driven machine learning models with the contextual understanding of human experts. However, a deeper look into several of these systems reveals that calling them HIL would be a misnomer, as they are quite the opposite, namely AI-in-the-loop ($AI^2L$) systems, where the human is in control of the system, while the AI is there to support the human. We argue that existing evaluation methods often overemphasize the machine (learning) component's performance, neglecting the human expert's critical role. Consequently, we propose an $AI^2L$ perspective, which recognizes that the human expert is an active participant in the system, significantly influencing its overall performance. By adopting an $AI^2L$ approach, we can develop more comprehensive systems that faithfully model the intricate interplay between the human and machine components, leading to more effective and robust AI systems.
