GazeNoter: Co-Piloted AR Note-Taking via Gaze Selection of LLM Suggestions to Match Users' Intentions

Hsin-Ruey Tsai, Shih-Kang Chiu, Bryan Wang

TL;DR

GazeNoter presents an AI-copiloted AR note-taking system that uses gaze selection to choose LLM-generated suggestions, enabling both within-context and beyond-context notes with minimal distraction. By integrating LLM-driven extraction, derivation, and organization, and coupling them with an eye-tracking ring input on an AR headset, the system supports real-time, low-load note-taking during speeches and walking meetings. Across two user studies, GazeNoter outperformed manual typing and auto-generated notes in quantity, quality, and usability, with AR providing advantages in distraction and social acceptance. The work demonstrates a practical path for real-time, user-in-the-loop AI in XR, offering scalable benefits for meeting capture, Q&A preparation, and on-the-fly idea capture.

Abstract

Note-taking is critical during speeches and discussions, serving not only later summarization and organization but also real-time reminders of questions and opinions in question-and-answer sessions and timely contributions in discussions. Manually typing notes on a smartphone can be distracting and can increase users' cognitive load. While large language models (LLMs) are used to automatically generate summaries and highlights, content generated by artificial intelligence (AI) may not match users' intentions without user input or interaction. Therefore, we propose an AI-copiloted augmented reality (AR) system, GazeNoter, which allows users to swiftly select diverse LLM-generated suggestions via gaze on an AR headset for real-time note-taking. GazeNoter leverages an AR headset as a medium for users to swiftly adjust the LLM output to match their intentions, forming a user-in-the-loop AI system for both within-context and beyond-context notes. We conducted two user studies to verify the usability of GazeNoter: one for attending speeches in a static sitting condition, and one for walking meetings and discussions in a mobile walking condition.

Paper Structure (42 sections, 18 figures, 2 tables)

Figures (18)

  • Figure 1: The blue part represents the notes users want to record directly while hearing the context, defined as within-context notes. The green part represents notes that combine users' insights, defined as beyond-context notes. Achieving both within-context and beyond-context notes ensures that the notes align with users' intentions.
  • Figure 2: (Middle) The flowchart of GazeNoter. (Right) The AR layout, displayed only when the ring is touched. (a) GazeNoter extracts context keywords from the latest sentence of the speech. (b) Once the user selects a context keyword, "city", the pre-defined customized keywords are shown, and 3 candidate sentences are generated based on the context keyword. (c) The user selects a customized keyword, "what", and the candidate sentences are updated accordingly. (d) If the user wants to take a beyond-context note and no desired keyword is among the context keywords, the user selects the most relevant context keyword, "rallies", to generate derivative keywords. (e) The user selects a derivative keyword, "sign", and the beyond-context candidate sentences are updated accordingly. (f) The user selects the candidate sentence that best matches the intention, "What signs were displayed...", to record as a note. (g) The recorded note is shown. (h) If no candidate sentence matches the intention in step (f), the user can instead record all selected keywords as a note (upper). If the user needs to take a note hastily, the user can select only context keywords, optionally with customized keywords, and record them as a quick note (lower), using only steps (a)(b) or (a)(b)(c).
  • Figure 3: Reviewing notes by pressing the red "Notes" button and further reviewing transcripts and/or refining the notes.
  • Figure 4: (a) The hardware structure of the ring. (b) The button can be withdrawn on the back of the finger, preventing interference with users. (c) The button is extended for input. (d) The ring can be used subtly, such as in a pocket.
  • Figure 5: Study 1 setup of GazeNoter on the AR headset (G) (left) and the smartphone (S) (middle) with its layout (right).
  • ...and 13 more figures
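
The selection flow described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the class `NoteSession`, the function `fake_generate`, and the comma-joined format of a keyword-only note are all hypothetical, and the LLM call is replaced by a deterministic stub so the example runs offline. It shows two behaviors from the caption: each selected keyword refreshes a list of 3 candidate sentences, and a note is either a chosen candidate sentence (step f) or, as a fallback, the selected keywords themselves (step h).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-in for the LLM call; this signature is an assumption.
SentenceGenerator = Callable[[List[str]], List[str]]


@dataclass
class NoteSession:
    generate: SentenceGenerator
    keywords: List[str] = field(default_factory=list)
    candidates: List[str] = field(default_factory=list)

    def select_keyword(self, keyword: str) -> List[str]:
        """Add a context, customized, or derivative keyword and refresh candidates."""
        self.keywords.append(keyword)
        # The caption mentions 3 candidate sentences, so keep at most 3.
        self.candidates = self.generate(self.keywords)[:3]
        return self.candidates

    def record(self, choice: Optional[int] = None) -> str:
        """Record the chosen candidate sentence, or fall back to the keywords."""
        if choice is not None:
            return self.candidates[choice]
        # Assumed format for a keyword-only note: a comma-separated list.
        return ", ".join(self.keywords)


def fake_generate(keywords: List[str]) -> List[str]:
    # Placeholder for an LLM: deterministic strings instead of real suggestions.
    joined = " ".join(keywords)
    return [f"Note option {i} about {joined}" for i in range(1, 4)]


session = NoteSession(generate=fake_generate)
session.select_keyword("rallies")   # context keyword, as in step (d)
session.select_keyword("sign")      # derivative keyword, as in step (e)
sentence_note = session.record(choice=0)  # step (f): pick a candidate sentence
keyword_note = session.record()           # step (h): record keywords only
```

Passing the generator in as a callable keeps the selection logic independent of any particular model or API, which is a design choice of this sketch rather than something the source states.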