ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions

Bufang Yang, Lilin Xu, Liekang Zeng, Kaiwei Liu, Siyang Jiang, Wenrui Lu, Hongkai Chen, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan

TL;DR

ContextAgent addresses the need for open-world, context-aware proactive AI by leveraging multimodal sensory data from wearable devices to infer user intent and autonomously initiate tool-based assistance. It introduces a two-stage framework: proactive-oriented context extraction and a context-aware reasoner that generates thought traces, a proactive score $P_S$, and planned tool chains when $P_S$ crosses a user-defined threshold. The authors also provide ContextAgentBench, a 1,000-sample benchmark across nine daily life scenarios with twenty tools, and demonstrate that ContextAgent achieves state-of-the-art proactive prediction and tool-calling across multiple LLMs, including smaller models. This work highlights the value of combining sensory-perception data with persona context to create unobtrusive, user-centric AI assistants and provides a pathway toward broader, human-centered proactive AI deployments.

Abstract

Recent advances in Large Language Models (LLMs) have propelled intelligent agents from reactive responses to proactive support. While promising, existing proactive agents either rely exclusively on observations from enclosed environments (e.g., desktop UIs) with direct LLM inference or employ rule-based proactive notifications, leading to suboptimal user intent understanding and limited functionality for proactive service. In this paper, we introduce ContextAgent, the first context-aware proactive agent that incorporates extensive sensory contexts surrounding humans to enhance the proactivity of LLM agents. ContextAgent first extracts multi-dimensional contexts from massive sensory perceptions on wearables (e.g., video and audio) to understand user intentions. ContextAgent then leverages the sensory contexts and personas from historical data to predict the necessity for proactive services. When proactive assistance is needed, ContextAgent further automatically calls the necessary tools to assist users unobtrusively. To evaluate this new task, we curate ContextAgentBench, the first benchmark for evaluating context-aware proactive LLM agents, covering 1,000 samples across nine daily scenarios and twenty tools. Experiments on ContextAgentBench show that ContextAgent outperforms baselines by achieving up to 8.5% and 6.0% higher accuracy in proactive predictions and tool calling, respectively. We hope our research can inspire the development of more advanced, human-centric, proactive AI assistants. The code and dataset are publicly available at https://github.com/openaiotlab/ContextAgent.
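
The decision step described above (a reasoner emits a thought trace, a proactive score $P_S$, and a planned tool chain, and tools run only when $P_S$ crosses a user-defined threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ReasonerOutput` and `decide_and_act`, the integer score scale, the tool names, and the choice of `>=` for "crosses" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ReasonerOutput:
    """Hypothetical container for what the reasoner returns."""
    thought: str                      # free-text thought trace
    proactive_score: int              # P_S; the integer scale is an assumption
    tool_chain: List[str] = field(default_factory=list)  # planned tool names


def decide_and_act(
    output: ReasonerOutput,
    threshold: int,
    tools: Dict[str, Callable[[], str]],
) -> List[str]:
    """Run the planned tools only if P_S reaches the user-defined threshold.

    Assumption: "crosses" is modeled here as `>=`.
    """
    if output.proactive_score < threshold:
        return []  # stay silent: no proactive service is needed

    results = []
    for name in output.tool_chain:
        tool = tools.get(name)
        if tool is None:
            continue  # skip planned tools that are not registered
        results.append(tool())
    return results


# Hypothetical tool registry and reasoner output for demonstration.
tools = {
    "get_weather": lambda: "sunny",
    "check_calendar": lambda: "free at 3pm",
}
out = ReasonerOutput(
    thought="User is putting on shoes near the door.",
    proactive_score=4,
    tool_chain=["get_weather", "check_calendar"],
)
print(decide_and_act(out, threshold=3, tools=tools))
print(decide_and_act(out, threshold=5, tools=tools))
```

Raising the threshold makes the assistant less intrusive, since fewer contexts trigger tool calls; lowering it has the opposite effect. How the score itself is produced from sensory and persona context is outside the scope of this sketch.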

Paper Structure

This paper contains 21 sections, 16 figures, 13 tables.

Figures (16)

  • Figure 1: ContextAgent is a proactive AI assistant that requires no explicit user instructions. ContextAgent can continuously perceive environmental contexts (e.g., image and audio) to detect the necessity of proactive services, and provide tool-augmented assistance based on LLM reasoning.
  • Figure 2: Comparison with existing works. Reactive LLM agents require explicit user instructions to initiate tasks. Prior proactive LLM agents focus on perceiving enclosed environments (e.g., desktop UIs) and may still require user operations (e.g., keyboard inputs) alongside direct LLM inference. In contrast, ContextAgent requires no manual instructions, harnesses massive sensory contexts from the open world, and employs LLM reasoning for tool-augmented proactive services.
  • Figure 3: Statistics of ContextAgentBench, including the sample distribution across different scenarios, proactive scores, and the number and types of tools. In subfigures (a)–(c), the x-axis shows the number of samples, whereas in (d) it denotes the tool index.
  • Figure 4: Overview of ContextAgent. ContextAgent extracts sensory context from massive sensor perceptions. Then it integrates both sensory and persona contexts into LLM reasoning, generating thought traces, proactive predictions, and calling external tools for proactive services when necessary.
  • Figure 5: Main results on ContextAgentBench. 'DS' refers to 'DeepSeek'.
  • ...and 11 more figures