Is AI the better programming partner? Human-Human Pair Programming vs. Human-AI pAIr Programming
Qianou Ma, Tongshuang Wu, Kenneth Koedinger
TL;DR
This paper surveys human-human pair programming and human-AI pAIr programming, highlighting that evidence of efficacy is mixed for both paradigms. It synthesizes contexts, methods, and outcome measures, and documents a lack of comprehensive, ecologically valid evaluations of pAIr programming. The authors discuss moderators such as task complexity, compatibility, and communication, and argue that AI partners could be designed to exploit human-AI differences to improve outcomes. They call for standardized metrics, three-way comparative studies, and expanded education-focused research to realize the potential of AI-assisted pair programming.
Abstract
The emergence of large language models (LLMs) that excel at code generation, along with commercial products such as GitHub's Copilot, has sparked interest in human-AI pair programming (referred to as "pAIr programming"), in which an AI system collaborates with a human programmer. While traditional pair programming between humans has been extensively studied, it remains uncertain whether its findings can be applied to human-AI pair programming. We compare human-human and human-AI pair programming, exploring their similarities and differences in interaction, measures, benefits, and challenges. We find that the effectiveness of both approaches is mixed in the literature (though the measures used for pAIr programming are not as comprehensive). We summarize moderating factors on the success of human-human pair programming, which provide opportunities for pAIr programming research. For example, mismatched expertise makes pair programming less productive; well-designed AI programming assistants may therefore adapt to differences in expertise levels.
