
An Outlook into the Future of Egocentric Vision

Chiara Plizzari, Gabriele Goletto, Antonino Furnari, Siddhant Bansal, Francesco Ragusa, Giovanni Maria Farinella, Dima Damen, Tatiana Tommasi

TL;DR

This survey reframes egocentric vision by forecasting a future in which wearables with outward-facing cameras and multimodal overlays act as egocentric assistants (EgoAI). It systematically maps envisioned everyday use-cases to core tasks (localisation, 3D scene understanding, recognition, anticipation, gaze, social behaviour, pose, hand-object interactions, identification, summarisation, dialogue, and privacy), reviewing seminal works, current state-of-the-art methods, and relevant datasets for each. The analysis highlights the gaps between present capabilities and the envisioned always-on, personalised EgoAI, emphasising the need for multi-sensor integration, real-time operation, robust privacy-preserving approaches, and synergy across tasks. The paper concludes with concrete recommendations for immediate exploration and stresses the value of large, diverse datasets (e.g., EPIC-KITCHENS, Ego4D) and emerging ego-language models for enabling practical, human-centric egocentric assistance.

Abstract

What will the future be? We wonder! In this survey, we explore the gap between current research in egocentric vision and the ever-anticipated future, where wearable computing, with outward-facing cameras and digital overlays, is expected to be integrated into our everyday lives. To understand this gap, the article starts by envisaging the future through character-based stories, using examples to showcase the limitations of current technology. We then provide a mapping between this future and previously defined research tasks. For each task, we survey its seminal works, current state-of-the-art methodologies and available datasets, then reflect on shortcomings that limit its applicability to future research. Note that this survey focuses on software models for egocentric vision, independent of any specific hardware. The paper concludes with recommendations for areas of immediate exploration, so as to unlock our path to the future of always-on, personalised and life-enhancing egocentric vision.

Paper Structure

This paper contains 118 sections, 6 figures and 3 tables.

Figures (6)

  • Figure 1: EGO-Home. Character-based story envisaging the future of egocentric vision at home. Illustration of the EGO-Home story. EgoAI assists Sam during dinner preparation and keeps him entertained with interactive and immersive experiences. Tasks illustrated: 4.2 3D Scene Understanding; 4.3 Object and Action Recognition; Measuring System; 4.11 Dialogue; 4.10 Summarisation and Retrieval; 4.7 Full-body Pose, 4.8 Hand Pose and 4.6 Social Interaction; Medical Imaging; Messaging.
  • Figure 2: Character-based story envisaging the future of egocentric vision in industrial settings. Illustration of the EGO-Worker story. EgoAI assists Marco from the start of his day until its conclusion. Tasks illustrated: Safety Compliance Assessment; 4.1 Localisation and Navigation; Messaging; 4.8 Hand-Object Interaction; 4.4 Action Anticipation; Skill Assessment; 4.11 Visual Question Answering; 4.10 Summarisation.
  • Figure 3: Character-based story envisaging the future of egocentric vision in tourism. Illustration of the EGO-Tourist story. EgoAI accompanies Claire throughout her itinerary in Turin. Tasks illustrated: Recommendation and Personalisation; 4.2 3D Scene Understanding; 4.5 Gaze Prediction; 4.1 Localisation and Navigation; Messaging; 4.11 Dialogue; 4.3 Action Recognition and Retrieval; 4.10 Summarisation.
  • Figure 4: Character-based story envisaging the future of egocentric vision within the police force. Illustration of the EGO-Police story. EgoAI helps Judy, a police officer, during her day keeping her city safe. Tasks illustrated: 4.1 Localisation and Navigation; Messaging; 4.3 Action Recognition; 4.9 Person Re-ID; 4.3 Object Detection and Retrieval; Measuring System; Decision Making; 4.2 3D Scene Understanding; 4.8 Hand-Object Interaction; 4.10 Summarisation; 4.12 Privacy.
  • Figure 5: Character-based story envisaging the future of egocentric vision in the entertainment industry, focusing on the perspective of scene and makeup designers. Illustration of the EGO-Designer story. EgoAI helps Stanley, the scenographer, and the rest of the crew during movie production. Tasks illustrated: 4.2 3D Scene Understanding; Recommendation; 4.3 Object Recognition and Retrieval; 4.7 Full-body Pose Estimation; 4.6 Social Interaction; 4.5 Gaze Prediction; 4.8 Hand-Object Interaction; Messaging.