TOM: A Development Platform For Wearable Intelligent Assistants

Nuwan Janaka, Shengdong Zhao, David Hsu, Sherisse Tan Jing Wen, Koh Chun Keat

TL;DR

TOM presents a wearable, context-aware platform for intelligent assistants, addressing the lack of end-to-end development guidance in AR/MR settings. The approach centers on a three-way model of user, context, and system, enabling multimodal sensing, reasoning, and proactive assistance through a layered client-server architecture. The paper details a conceptual design, concrete implementation choices, and proof-of-concept services (running coaching, translation, querying) demonstrated in daily activities, while transparently outlining limitations and avenues for future work, including improved UI adaptability, privacy considerations, and expansion to remote robot interactions. Overall, TOM aims to empower researchers and developers to rapidly create and analyze assistive AR applications across diverse daily activities via an open-source platform and robust data-recording capabilities.

Abstract

Advanced digital assistants can significantly enhance task performance, reduce user burden, and provide personalized guidance to improve users' abilities. However, the development of such intelligent digital assistants presents a formidable challenge. To address this, we introduce TOM, a conceptual architecture and software platform (https://github.com/TOM-Platform) designed to support the development of intelligent wearable assistants that are contextually aware of both the user and the environment. This system was developed collaboratively with AR/MR researchers, HCI researchers, AI/robotics researchers, and software developers, and it continues to evolve to meet the diverse requirements of these stakeholders. TOM facilitates the creation of intelligent assistive AR applications for daily activities and supports the recording and analysis of user interactions, the integration of new devices, and the provision of assistance for various activities. Additionally, we showcase several proof-of-concept assistive services and discuss the challenges involved in developing such services.

Paper Structure (31 sections, 6 figures, 2 tables)

This paper contains 31 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Conceptual entities and high-level modules associated with $\textit{TOM}$. Arrow directions represent the communication (e.g., data/interaction) flow.
  • Figure 2: System architecture of $\textit{TOM}$. Arrow directions represent the data flow. (a) Client-Server architecture with multiple clients (b) Server architecture with multiple layers (c) Client architecture with multiple layers. Dashed boxes indicate the communication between the Server and Clients.
  • Figure 3: Running assistance implemented in $\textit{TOM}$. (a) System components that enable the running assistance, $\textbf{C3}_{b}$. Dashed-line boxes indicate implemented Client components, solid lines represent implemented Server components, and dotted lines denote Server components under development. (b) The configuration file that controls the data flow, $\textbf{C3}_{c}$. Data is received by one or more components in the Input Layer (e.g., the 'camera' component) and is sent to the next component, as specified by the 'next' key (e.g., the 'yolov8' component in the Processing Layer). The same process applies to all components regardless of layer: the entry point names the method in each component that receives the data from the previous component, and the exit point names the method that is called when the component should be stopped (e.g., when a context switch indicates the component is no longer required).
  • Figure 4: The running assistance UI supports voice and mid-air gesture input interactions. (a) The user starts the running assistance and is prompted to select a route. (b) $\textit{Jerry}$ provides personalized training guidance, proactive feedback on potential dangers or encouragement, and details about water points while running. (c) In the end, $\textit{Jerry}$ presents the user with a run summary.
  • Figure 5: The translation assistance UI supports voice, mid-air gesture, and gaze input interactions. (a) The user is prompted for the action they wish to take and chooses the 'Translate Text' option. (b) $\textit{Jerry}$ translates the Mandarin text on the screen into English and overlays the translated information onto the location of the Mandarin text. (c) The user shows interest in 'Herbal Jelly' and seeks more general information. (d) The user verbally inquires, "Ok $\textit{Jerry}$, is this vegetarian?"
  • ...and 1 more figure
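
The configuration-driven data flow described in the Figure 3(b) caption can be illustrated with a minimal sketch. This is not TOM's actual configuration format or code: the key names (`entry`, `exit`, `next`), the component classes, and the `dispatch` and `shutdown` helpers are all hypothetical, chosen only to mirror the caption's description of entry points, exit points, and the next key. The detector is a placeholder that does not run a real model.

```python
from typing import Any, Dict

# Hypothetical configuration mirroring the caption of Figure 3(b).
# Key names and component names are assumptions, not TOM's real schema.
CONFIG: Dict[str, Dict[str, Any]] = {
    "camera": {"entry": "on_frame", "exit": "stop", "next": ["yolov8"]},
    "yolov8": {"entry": "detect", "exit": "stop", "next": ["running_service"]},
    "running_service": {"entry": "handle", "exit": "stop", "next": []},
}


class Camera:
    """Input Layer stand-in: wraps a raw frame in a dict."""

    def __init__(self) -> None:
        self.stopped = False

    def on_frame(self, frame: Any) -> Dict[str, Any]:
        return {"frame": frame}

    def stop(self) -> None:
        self.stopped = True


class Detector:
    """Processing Layer stand-in: attaches a fixed placeholder detection."""

    def __init__(self) -> None:
        self.stopped = False

    def detect(self, data: Dict[str, Any]) -> Dict[str, Any]:
        return {**data, "objects": ["person"]}  # placeholder, no real model

    def stop(self) -> None:
        self.stopped = True


class RunningService:
    """Service stand-in: records everything it receives."""

    def __init__(self) -> None:
        self.stopped = False
        self.log = []

    def handle(self, data: Dict[str, Any]) -> Dict[str, Any]:
        self.log.append(data)
        return data

    def stop(self) -> None:
        self.stopped = True


def dispatch(config, components, name: str, data: Any) -> None:
    """Call the component's entry method, then forward the result to each 'next' component."""
    spec = config[name]
    result = getattr(components[name], spec["entry"])(data)
    for next_name in spec["next"]:
        dispatch(config, components, next_name, result)


def shutdown(config, components) -> None:
    """Call every component's exit method, e.g. after a context switch."""
    for name, spec in config.items():
        getattr(components[name], spec["exit"])()


components = {
    "camera": Camera(),
    "yolov8": Detector(),
    "running_service": RunningService(),
}
dispatch(CONFIG, components, "camera", "frame-0")
shutdown(CONFIG, components)
```

In this sketch, rerouting data only requires editing `CONFIG` (for example, pointing `camera` at a different processing component), which is the kind of flexibility a configuration-controlled data flow is meant to offer.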