TeleEgo: Benchmarking Egocentric AI Assistants in the Wild

Jiaqi Yan; Ruilong Ren; Jingren Liu; Shuning Xu; Ling Wang; Yiheng Wang; Xinlin Zhong; Yun Wang; Long Zhang; Xiangyu Chen; Changzhi Sun; Jixiang Luo; Dell Zhang; Hao Sun; Chi Zhang; Xuelong Li

TL;DR

TeleEgo addresses the gap in evaluating egocentric AI assistants under realistic, long-duration streaming with omni-modal inputs. It provides a synchronized dataset of egocentric video, audio, and text (over 14 hours per participant) across four domains, with 3,291 QA items and 12 diagnostic subtasks spanning Memory, Understanding, and Cross-Memory Reasoning. The paper introduces two metrics for continuous streams: Real-Time Accuracy (RTA), which captures correctness and responsiveness, and Memory Persistence Time (MPT), which captures long-term retention. It also presents evaluation protocols and baseline RTA results for current models. TeleEgo offers an extensible benchmark for research on real-time behavior and long-horizon memory in first-person AI assistants.

Abstract

Egocentric AI assistants in real-world settings must process multi-modal inputs (video, audio, text), respond in real time, and retain evolving long-term memory. However, existing benchmarks typically evaluate these abilities in isolation, lack realistic streaming scenarios, or support only short-term tasks. We introduce TeleEgo, a long-duration, streaming, omni-modal benchmark for evaluating egocentric AI assistants in realistic daily contexts. The dataset features over 14 hours per participant of synchronized egocentric video, audio, and text across four domains: work & study, lifestyle & routines, social activities, and outings & culture. All data is aligned on a unified global timeline and includes high-quality visual narrations and speech transcripts, curated through human refinement. TeleEgo defines 12 diagnostic subtasks across three core capabilities: Memory (recalling past events), Understanding (interpreting the current moment), and Cross-Memory Reasoning (linking distant events). It contains 3,291 human-verified QA items spanning multiple question formats (single-choice, binary, multi-choice, and open-ended), evaluated strictly in a streaming setting. We propose Real-Time Accuracy (RTA) to jointly capture correctness and responsiveness under tight decision windows, and Memory Persistence Time (MPT) as a forward-looking metric for long-term retention in continuous streams. In this work, we report RTA results for current models and release TeleEgo, together with an MPT evaluation framework, as a realistic and extensible benchmark for future egocentric assistants with stronger streaming memory, enabling systematic study of both real-time behavior and long-horizon memory.

