Do It For Me vs. Do It With Me: Investigating User Perceptions of Different Paradigms of Automation in Copilots for Feature-Rich Software

Anjali Khurana, Xiaotian Su, April Yi Wang, Parmit K Chilana

TL;DR

The paper investigates how users perceive and interact with two automation paradigms in feature-rich software copilots: a fully automated AutoCopilot and a semi-automatic GuidedCopilot. Through a within-subject study (N=20) across Google Sheets and Figma, GuidedCopilot showed superior user control, perceived utility, and learnability, while AutoCopilot saved time on simpler tasks. A follow-up design exploration added task- and state-aware features (GuidedCopilotVisual and GuidedCopilotADP) evaluated with Photoshop (N=10), demonstrating adaptability to user proficiency and progress. The work offers a three-dimensional framework balancing semi/full automation, adaptive guidance, and user familiarity, highlighting the importance of user control and tailored guidance for effective human-AI collaboration in complex software.

Abstract

Large Language Model (LLM)-based in-application assistants, or copilots, can automate software tasks, but users often prefer learning by doing, raising questions about the optimal level of automation for an effective user experience. We investigated two automation paradigms by designing and implementing a fully automated copilot (AutoCopilot) and a semi-automated copilot (GuidedCopilot) that automates trivial steps while offering step-by-step visual guidance. In a user study (N=20) across data analysis and visual design tasks, GuidedCopilot outperformed AutoCopilot in user control, software utility, and learnability, especially for exploratory and creative tasks, while AutoCopilot saved time for simpler visual tasks. A follow-up design exploration (N=10) enhanced GuidedCopilot with task- and state-aware features, including in-context preview clips and adaptive instructions. Our findings highlight the critical role of user control and tailored guidance in designing the next generation of copilots that enhance productivity, support diverse skill levels, and foster deeper software engagement.

Paper Structure

This paper contains 45 sections, 8 figures, 1 table.

Figures (8)

  • Figure 1: GuidedCopilot, a novel semi-automatic copilot: (a) Copilot assistance is structured to provide step-by-step guidance along with semi-automation for only the repetitive or trivial steps in the task; (b) Visual references to the UI elements, in context with the user's task and application, are provided within the step-by-step guidance; (c) Users have control over editing the LLM-extracted entities from their query before the semi-automation is performed; (d) Up-to-date mixed-medium follow-up responses are provided. For the contrast with fully automated copilot assistance, see AutoCopilot in Figure 2.
  • Figure 2: AutoCopilot: (a) Fully automates the user's task (e.g., creating a webpage that includes a login and product page); (b) Similar to state-of-the-art copilots, can demonstrate incorrect automation (such as color coding the entire sheet instead of only the values greater than 40 in column C); (c) Provides a follow-up textual response based on context from software documentation.
  • Figure 3: GuidedCopilot Architecture: The user's query is used to initiate a conversation about automating software tasks, which is then transmitted to the query understanding and text retrieval module. This module interprets the query and performs a contextual search across documentation and web data. The extracted intent and relevant excerpts are processed by GPT-4o to generate text-based procedural steps. These steps are sent to the Image and Automated Function Retrieval Module for corresponding visual aids and automated functions, sourced from a curated dataset. Finally, the LLM agent integrates the text, visuals, and semi-automated functions into a cohesive, mixed-medium response tailored to the user's software-related query.
  • Figure 4: Overview of participants’ responses to the post-task questionnaire. The Pearson Chi-Squared test showed a significant difference for each metric between the two copilot interventions for completing the Google Sheets and Figma tasks. With GuidedCopilot, users demonstrated higher task completion and higher accuracy, and indicated that GuidedCopilot helped them learn the software-specific steps, gave them more control, and enhanced their productivity compared to AutoCopilot.
  • Figure 5: Trial-and-Error Differences in AutoCopilot vs GuidedCopilot: This figure illustrates the case of Participant P14, a computer science professional. Despite their technical expertise, P14 encountered more trial-and-error with AutoCopilot, primarily when customizing and debugging the generated automation (e.g., a webpage template) and when reversing (undoing) incorrect automation, which ultimately led them to abandon AutoCopilot. In contrast, P14 experienced fewer trial-and-error instances with GuidedCopilot, mostly related to executing and refining tasks such as resizing or altering colors, and was able to complete the task successfully with GuidedCopilot.
  • ...and 3 more figures
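
The data flow described in the Figure 3 caption (query, text retrieval, step generation, image and function retrieval, mixed-medium response) can be illustrated with a small sketch. This is not the authors' implementation: the in-memory `DOCS`, `VISUALS`, and `AUTOMATIONS` tables, the `Step` type, and all function names are hypothetical, retrieval is reduced to keyword overlap, and the GPT-4o call is replaced by a sentence splitter so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for the documentation corpus and the curated
# dataset of visual aids and automated functions.
DOCS: Dict[str, str] = {
    "conditional formatting": (
        "Select the range. Open Format > Conditional formatting. Set the rule."
    ),
    "pivot table": (
        "Select the data. Open Insert > Pivot table. Choose rows and values."
    ),
}
VISUALS: Dict[str, str] = {
    "Open Format > Conditional formatting": "img/format_menu.png",
}
AUTOMATIONS: Dict[str, Callable[[], str]] = {
    "Select the range": lambda: "range selected",
}


@dataclass
class Step:
    """One procedural step of a mixed-medium response."""
    text: str
    image: Optional[str] = None                    # visual aid, if any
    automate: Optional[Callable[[], str]] = None   # semi-automation, if any


def retrieve(query: str) -> str:
    """Return the excerpt whose topic shares the most words with the query."""
    words = set(query.lower().split())
    best = max(DOCS, key=lambda topic: len(words & set(topic.split())))
    return DOCS[best]


def generate_steps(excerpt: str) -> List[str]:
    """Placeholder for the LLM call: split the excerpt into sentences."""
    return [s.strip() for s in excerpt.split(".") if s.strip()]


def build_response(query: str) -> List[Step]:
    """Attach a visual and an automated function to each step when available."""
    steps = generate_steps(retrieve(query))
    return [Step(t, VISUALS.get(t), AUTOMATIONS.get(t)) for t in steps]


if __name__ == "__main__":
    for step in build_response("apply conditional formatting to column C"):
        kind = "automatable" if step.automate else "manual"
        print(f"{step.text} [{kind}] image={step.image}")
```

In this sketch only the step with an entry in `AUTOMATIONS` is offered for automation and the remaining steps stay manual, which mirrors the semi-automatic idea of automating trivial steps while guiding the user through the rest.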