
ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems

Bufang Yang, Lilin Xu, Liekang Zeng, Yunqi Guo, Siyang Jiang, Wenrui Lu, Kaiwei Liu, Hancheng Xiang, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan

TL;DR

ProAgent is a first-of-its-kind end-to-end proactive LLM agent system that continuously senses rich multisensory contexts and reasons over them with VLMs to anticipate user needs. It combines an on-demand tiered perception strategy, hierarchical context extraction (sensory and persona cues), and a context-aware proactive reasoner that maps contexts to timely tool calls, while enforcing temporal constraints to minimize disturbance. Implemented on AR glasses with an edge server and evaluated on a real-world testbed, the CAB-Lite dataset, and a user study, ProAgent achieves up to 33.4% higher proactive prediction accuracy, a 16.8% higher tool-calling F1 score, and notable improvements in user satisfaction. The work highlights the practical feasibility and benefits of proactive assistants that leverage open-world sensory contexts for everyday tasks, with implications for wearable AI, privacy, and edge-enabled human–AI collaboration.
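To make the interplay between on-demand tiered perception and the temporal constraint concrete, here is a minimal sketch under stated assumptions; it is not ProAgent's implementation. We assume a cheap first tier produces a scalar trigger score (for example, from IMU or audio), that the expensive VLM tier runs only when this score crosses a threshold, and that the temporal constraint is a simple cooldown between proactive prompts. The class `TieredPerceptionGate`, the function `step`, the parameters `motion_threshold` and `cooldown_s`, and the `reasoner` callable are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TieredPerceptionGate:
    """Hypothetical gate (not the paper's code): a cheap tier decides when to run
    the costly VLM tier, and a cooldown suppresses back-to-back proactive prompts."""
    motion_threshold: float = 0.5  # assumed trigger level for the low-cost sensor tier
    cooldown_s: float = 30.0       # assumed minimum gap (seconds) between two prompts
    _last_prompt_t: float = field(default=float("-inf"), init=False, repr=False)

    def should_run_vlm(self, motion_score: float) -> bool:
        # Tier 1: escalate to the expensive visual tier only when cheap signals change.
        return motion_score >= self.motion_threshold

    def may_prompt(self, t: float) -> bool:
        # Temporal constraint: do not interrupt the user again within the cooldown.
        return t - self._last_prompt_t >= self.cooldown_s

    def record_prompt(self, t: float) -> None:
        self._last_prompt_t = t


def step(gate: TieredPerceptionGate, t: float, motion_score: float,
         reasoner: Callable[[float], Optional[str]]) -> Optional[str]:
    """One perception-reasoning step. `reasoner` is a hypothetical stand-in for the
    VLM-based proactive reasoner; it returns a tool name or None."""
    if not gate.should_run_vlm(motion_score):
        return None  # stay in the low-power tier; the VLM is never invoked
    tool = reasoner(t)
    if tool is not None and gate.may_prompt(t):
        gate.record_prompt(t)
        return tool
    return None  # no need detected, or suppressed by the cooldown
```

With a 30 s cooldown, a second trigger at t = 10 s is suppressed, a low trigger score at t = 40 s skips the VLM entirely, and a high score at t = 40 s is allowed through again. A cooldown is only one plausible way to encode "minimize disturbance"; the paper's actual temporal constraints may differ.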

Abstract

Large Language Model (LLM) agents are emerging to transform daily life. However, existing LLM agents primarily follow a reactive paradigm, relying on explicit user instructions to initiate services, which increases users' physical and cognitive workload. In this paper, we propose ProAgent, the first end-to-end proactive agent system that harnesses massive sensory contexts and LLM reasoning to deliver proactive assistance. ProAgent first employs a proactive-oriented context extraction approach with on-demand tiered perception to continuously sense the environment and derive hierarchical contexts that incorporate both sensory and persona cues. ProAgent then adopts a context-aware proactive reasoner to map these contexts to user needs and tool calls, providing proactive assistance. We implement ProAgent on Augmented Reality (AR) glasses with an edge server and extensively evaluate it on a real-world testbed and a public dataset, and through a user study. Results show that ProAgent achieves up to 33.4% higher proactive prediction accuracy, a 16.8% higher tool-calling F1 score, and notable improvements in user satisfaction over state-of-the-art baselines, marking a significant step toward proactive assistants. A video demonstration of ProAgent is available at https://youtu.be/pRXZuzvrcVs.
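The abstract reports a tool-calling F1 score. As a reminder of how such a score is typically computed, here is a minimal sketch that treats predicted and reference tool calls as multisets matched by exact tool name. The exact-name matching rule, the convention of returning 1.0 when both lists are empty, and the function name `tool_call_f1` are assumptions for illustration; the paper's criterion may also compare call arguments.

```python
from collections import Counter


def tool_call_f1(predicted: list[str], reference: list[str]) -> float:
    """F1 between predicted and reference tool calls, matched as multisets by name.

    Assumption: a call counts as correct only if its name appears in the reference
    (each reference call can be matched at most once). Arguments are ignored here.
    """
    if not predicted and not reference:
        return 1.0  # assumed convention: nothing needed and nothing called is perfect
    # Multiset intersection counts how many predicted calls match reference calls.
    overlap = sum((Counter(predicted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting `["get_weather", "set_reminder"]` when only `["get_weather"]` was needed gives precision 0.5 and recall 1.0, so F1 is 2 · 0.5 · 1.0 / 1.5 ≈ 0.667; the spurious extra call is penalized, which mirrors the over-proactivity failure mode the paper describes.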

Paper Structure

This paper contains 44 sections, 25 figures, 1 table.

Figures (25)

  • Figure 1: A user scenario of ProAgent. Reactive agents initiate assistance only upon explicit request, while ProAgent automatically uses rich sensory contexts to offer proactive, seamless, and unobtrusive assistance.
  • Figure 2: Examples of adapting LLMs to context-aware proactive agent tasks, leading to failures such as over-proactivity, missed needs, and tool-calling errors.
  • Figure 3: Adapting existing LLMs and VLMs to proactive agents.
  • Figure 4: Existing adaptive perception methods in proactive agent systems.
  • Figure 5: Impact of egocentric video on proactive agent reasoning. The x-axis denotes the periodic sampling interval.
  • ...and 20 more figures