Sensible Agent: A Framework for Unobtrusive Interaction with Proactive AR Agents
Geonsun Lee, Min Xia, Nels Numan, Xun Qian, David Li, Yanhe Chen, Achin Kulshrestha, Ishan Chatterjee, Yinda Zhang, Dinesh Manocha, David Kim, Ruofei Du
TL;DR
Sensible Agent tackles the problem of disruptive interaction with proactive AR agents by jointly optimizing what the agent should offer and how it should be delivered, guided by real-time multimodal context. The framework comprises an Action Recommendation module (What) and an Interaction Adaptation module (How), both conditioned on context and social factors, and realized in a WebXR prototype with LLM-based reasoning and multiple input modalities. Across studies, Sensible Agent reduces perceived interaction effort and remains highly usable, while letting user preferences shape modality choice, demonstrating practical benefits for unobtrusive, context-aware AR assistance. The work lays groundwork for scalable, socially aware proactive AR systems and points to future extensions in cross-device orchestration and longitudinal personalization within ambient computing environments.
Abstract
Proactive AR agents promise context-aware assistance, but their interactions often rely on explicit voice prompts or responses, which can be disruptive or socially awkward. We introduce Sensible Agent, a framework designed for unobtrusive interaction with these proactive agents. Sensible Agent dynamically adapts both "what" assistance to offer and, crucially, "how" to deliver it, based on real-time multimodal context sensing. Informed by an expert workshop (n=12) and a data annotation study (n=40), the framework leverages egocentric cameras, multimodal sensing, and Large Multimodal Models (LMMs) to infer context and suggest appropriate actions delivered via minimally intrusive interaction modes. We demonstrate our prototype on an XR headset through a user study (n=10) in both AR and VR scenarios. Results indicate that Sensible Agent significantly reduces perceived interaction effort compared to a voice-prompted baseline, while maintaining high usability and achieving higher user preference.
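To make the "what" / "how" split concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a hand-written rule table in place of the LMM reasoning the abstract describes, and every name in it (`Context`, `recommend_action`, `adapt_interaction`, `plan`, the activity labels, and the specific modalities such as "head nod") is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical snapshot of sensed context (all fields are assumptions)."""
    activity: str      # e.g. "cooking", "commuting"
    hands_busy: bool   # user's hands are occupied
    noisy: bool        # ambient noise makes audio unreliable
    in_public: bool    # social setting where speaking aloud is awkward


def recommend_action(ctx: Context) -> str:
    """'What': choose an assistance action.

    A static lookup stands in for LMM-based reasoning here.
    """
    suggestions = {
        "cooking": "show next recipe step",
        "commuting": "show next train departure",
    }
    return suggestions.get(ctx.activity, "no suggestion")


def adapt_interaction(ctx: Context) -> dict:
    """'How': choose output and input modalities for the same context."""
    quiet_needed = ctx.in_public or ctx.noisy
    if quiet_needed:
        # Avoid audio; pick a hands-free input when the hands are occupied.
        output = "visual icon"
        user_input = "head nod" if ctx.hands_busy else "hand gesture"
    else:
        output = "audio prompt"
        user_input = "voice"
    return {"output": output, "input": user_input}


def plan(ctx: Context) -> dict:
    """Combine both decisions into one proactive suggestion."""
    return {"action": recommend_action(ctx), **adapt_interaction(ctx)}


if __name__ == "__main__":
    print(plan(Context("cooking", hands_busy=True, noisy=False, in_public=True)))
```

The point of the sketch is only that the action and the delivery mode are decided separately from the same context, so that a suggestion can stay the same while its delivery shifts away from voice when the setting calls for it.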
