TaskSense: Cognitive Chain Modeling and Difficulty Estimation for GUI Tasks

Yiwen Yin, Zhian Hu, Xiaoxi Xu, Chun Yu, Xintong Wu, Wenyu Fan, Yuanchun Shi

TL;DR

The paper tackles GUI task difficulty by incorporating cognitive demands, not just motor step counts. It introduces Cognitive Chain, which decomposes tasks into cognitive steps preceding each motor action using an eight-type taxonomy and information-theoretic difficulty indices, with $D^{CogStep} = K^{Type} \cdot I^{CogStep}$; total task difficulty is the sum over all chains. An LLM-based extraction method automatically derives cognitive chains from task traces, and the approach is evaluated on 18 tasks with 33 participants, achieving step-level $R^2$ values up to $0.46$ (annotated) and task-level $R^2$ values up to $0.69$, outperforming baselines. The study also assesses four GUI agents, revealing patterns of Human-AI consistency and identifying cognitive gaps in current agents, especially for high-demand steps like Verify, and discusses applications in agent training, capability benchmarking, and Human-AI task delegation optimization.
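The difficulty formula above can be illustrated with a minimal sketch. The names `CogStep`, `K_TYPE`, and the three functions are hypothetical, the coefficient values are placeholders rather than the paper's fitted values, and only four of the eight cognitive types (those named in this summary) are listed.

```python
from dataclasses import dataclass

# Placeholder type coefficients K^{Type}. These numbers are illustrative
# assumptions, not values from the paper, and the taxonomy is incomplete.
K_TYPE = {"Find": 1.0, "Decide": 1.5, "Compute": 2.0, "Verify": 2.5}


@dataclass
class CogStep:
    cog_type: str  # one of the cognitive step types
    info: float    # I^{CogStep}: information index of the step (assumed unit: bits)


def step_difficulty(step: CogStep) -> float:
    """D^{CogStep} = K^{Type} * I^{CogStep}."""
    return K_TYPE[step.cog_type] * step.info


def chain_difficulty(chain: list[CogStep]) -> float:
    """Difficulty of one cognitive chain: sum of its step difficulties."""
    return sum(step_difficulty(s) for s in chain)


def task_difficulty(chains: list[list[CogStep]]) -> float:
    """Total task difficulty: sum over all cognitive chains in the task."""
    return sum(chain_difficulty(c) for c in chains)


# Toy task with two chains, each preceding one motor action.
chains = [
    [CogStep("Find", 3.0), CogStep("Decide", 1.0)],
    [CogStep("Compute", 2.0), CogStep("Verify", 1.0)],
]
print(task_difficulty(chains))
```

With these placeholder coefficients, a Verify step contributes more difficulty than a Find step carrying the same amount of information, which is the kind of type-dependent weighting the formula expresses.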

Abstract

Measuring GUI task difficulty is crucial for user behavior analysis and agent capability evaluation. Yet, existing benchmarks typically quantify difficulty based on motor actions (e.g., step counts), overlooking the cognitive demands underlying task completion. In this work, we propose Cognitive Chain, a novel framework that models task difficulty from a cognitive perspective. A cognitive chain decomposes the cognitive processes preceding a motor action into a sequence of cognitive steps (e.g., finding, deciding, computing), each with a difficulty index grounded in information theory. We develop an LLM-based method to automatically extract cognitive chains from task execution traces. Validation with linear regression shows that our estimated cognitive difficulty correlates well with user completion time (step-level $R^2 = 0.46$ after annotation). Assessment of state-of-the-art GUI agents shows reduced success on cognitively demanding tasks, revealing capability gaps and Human-AI consistency patterns. We conclude by discussing potential applications in agent training, capability assessment, and human-agent delegation optimization.
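The validation idea, regressing completion time on estimated difficulty and reporting $R^2$, can be sketched as follows. This is a single-predictor least-squares illustration under stated assumptions, not the paper's regression setup; the function names are hypothetical and the data points are synthetic.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x: list[float], y: list[float], slope: float, intercept: float) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data for illustration only (not measurements from the study):
# estimated difficulty per step, and completion time in seconds.
difficulty = [1.0, 2.0, 3.0, 4.0, 5.0]
time_s = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(difficulty, time_s)
print(r_squared(difficulty, time_s, slope, intercept))
```

An $R^2$ near 1 means the difficulty estimate explains most of the variance in completion time; the step-level value of 0.46 reported in the abstract indicates a moderate fit.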

Paper Structure

This paper contains 31 sections, 6 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: An example of cognitive chains in a task (T15 in our task set), where the user creates calendar events using interview invitation information from emails.
  • Figure 2: Workflow of our cognitive chain extraction method.
  • Figure 3: Visualized time regression and prediction results.
  • Figure 4: Success rate of four agents across different cognitive types and difficulty bins (equal-frequency).