Why Human Guidance Matters in Collaborative Vibe Coding

Haoyu Hu; Raja Marjieh; Katherine M Collins; Chenyi Li; Thomas L. Griffiths; Ilia Sucholutsky; Nori Jacoby

TL;DR

The paper investigates how human guidance shapes collaborative vibe coding, a process in which natural-language instructions steer an AI to iteratively generate and refine code (here, SVG images). It introduces a controlled experimental framework to compare human-led, AI-led, and hybrid configurations across 16 experiments with 604 participants, finding that humans provide effective high-level guidance across iterations, while AI-provided instructions often lead to performance collapse. Hybrid setups perform best when humans provide the instructions and evaluation is delegated to AI. These results reveal a misalignment in language use between humans and AI and offer practical design principles for scalable, human-centered AI collaboration beyond programming.

Abstract

Writing code has been one of the most transformative ways for human societies to translate abstract ideas into tangible technologies. Modern AI is transforming this process by enabling experts and non-experts alike to generate code without actually writing code, but instead, through natural language instructions, or "vibe coding". While increasingly popular, the cumulative impact of vibe coding on productivity and collaboration, as well as the role of humans in this process, remains unclear. Here, we introduce a controlled experimental framework for studying collaborative vibe coding and use it to compare human-led, AI-led, and hybrid groups. Across 16 experiments involving 604 human participants, we show that people provide uniquely effective high-level instructions for vibe coding across iterations, whereas AI-provided instructions often result in performance collapse. We further demonstrate that hybrid systems perform best when humans retain directional control (providing the instructions), while evaluation is delegated to AI.

Paper Structure (26 sections, 17 figures)

Introduction
Our approach
Methods
Vibe coding experimental paradigm
Participants and AI queries
Results
Comparing human and AI roles
Divergences between human and AI instructions
Testing hybrid human-AI instructors and selectors
Role division in vibe coding
Robustness to AI model type and social information
Discussion
Limitations
Conclusion
Acknowledgments
...and 11 more sections

Figures (17)

Figure 1: Vibe coding experimental paradigm. (A) Core procedure. An instructor views a reference image and its best SVG rendition from the previous iteration, and uses natural language instructions to guide code-generation. The Code Generator produces SVG code that can then be rendered into an image. (B) Iterated procedure. At each iteration, a selector chooses whether the current or the previous SVG image better matches the reference image. The selected SVG is then passed to an instructor, who provides vibe-coding instructions that are carried forward to the next iteration. (C) Interface of the human validation experiments. Participants rate the similarity of generated SVGs to the reference image.
Figure 2: Case study of human-led (human selectors and instructors) and AI-led (AI selectors and instructors) vibe coding. (A) Example progressions of the experiment for one reference image with human-led (top) and AI-led (bottom) vibe coding. (B) Examples from the last iteration of human-led (top) and AI-led (bottom) chains; more examples appear in the appendix figures (labels fig:grid-part1 to fig:grid-part5).
Figure 3: Performance of human- and AI-led vibe coding. Data points represent average rating scores across all experiments. Shaded area represents one standard error of the mean.
Figure 4: Comparing instruction semantics in human-led and AI-led conditions. (A) Example human- and AI-generated instructions. (B) UMAP projection of instructions in the embedding space and example word clouds of both human and AI instructions. (C) Validation experiment results from different AI-led experiments under instruction length limits. (D) Radar plot of seven semantic metrics: topic entropy (the diversity and unpredictability of topics), descriptive ratio (proportion of descriptive words), sentiment compound (overall emotion of the text), main IDF (how unique the vocabulary is), mean content length (average length of the answer), type token ratio (ratio of unique words to total words in the answer), and content ratio (proportion of content words vs. function words). For comparison, each metric was normalized to a scale of 0 to 1.
Figure 5: Hybrid human-AI-led vibe coding. (A) Schematic of human-AI hybrid vibe coding. (B) Validation rating results of human-AI hybrid vibe coding. (C) Trade-off between human proportion and final vibe coding performance.
...and 12 more figures
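
The iterated procedure described in the Figure 1 caption (an instructor writes instructions, a code generator produces an SVG, and a selector keeps whichever of the previous and current SVG better matches the reference) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `run_chain`, `Instructor`, `Generator`, and `Selector` are hypothetical, each role is a plain callable that could be backed by a human or an AI, and passing the previous SVG to the generator is an assumption. The toy roles at the bottom use integer strings in place of real SVG code so that the loop can be run without any model.

```python
from typing import Callable, List, Optional

# Hypothetical role signatures (assumptions, not taken from the paper).
Instructor = Callable[[str, Optional[str]], str]   # (reference, best_svg) -> instruction
Generator = Callable[[str, Optional[str]], str]    # (instruction, best_svg) -> new svg
Selector = Callable[[str, str, str], str]          # (reference, previous, current) -> chosen


def run_chain(reference: str, instructor: Instructor, generator: Generator,
              selector: Selector, n_iterations: int) -> List[str]:
    """Run one vibe-coding chain and return the selected output per iteration."""
    best: Optional[str] = None
    history: List[str] = []
    for _ in range(n_iterations):
        # The instructor sees the reference and the best rendition so far.
        instruction = instructor(reference, best)
        # The code generator turns the instruction into a new candidate.
        candidate = generator(instruction, best)
        # The selector keeps whichever of previous/current better matches.
        best = candidate if best is None else selector(reference, best, candidate)
        history.append(best)
    return history


# Toy stand-ins: an "SVG" is an integer string, the reference is a target value.
def toy_instructor(reference: str, best: Optional[str]) -> str:
    return "increase" if int(best or "0") < int(reference) else "decrease"


def toy_generator(instruction: str, best: Optional[str]) -> str:
    step = 3 if instruction == "increase" else -3
    return str(int(best or "0") + step)


def toy_selector(reference: str, previous: str, current: str) -> str:
    target = int(reference)
    # Keep the previous output unless the current one is strictly closer.
    return current if abs(target - int(current)) < abs(target - int(previous)) else previous


if __name__ == "__main__":
    print(run_chain("10", toy_instructor, toy_generator, toy_selector, 5))
```

Because each role is an interchangeable callable, the human-led, AI-led, and hybrid conditions correspond to different choices of who stands behind the instructor and the selector, while the loop itself stays the same.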
