
Project Aria: A New Tool for Egocentric Multi-Modal AI Research

Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira, Harry Lanaras, Henry Howard-Jenkins, Huixuan Tang, Hyo Jin Kim, Jaime Rivera, Ji Luo, Jing Dong, Julian Straub, Kevin Bailey, Kevin Eckenhoff, Lingni Ma, Luis Pesqueira, Mark Schwesinger, Maurizio Monge, Nan Yang, Nick Charron, Nikhil Raina, Omkar Parkhi, Peter Borschowa, Pierre Moulon, Prince Gupta, Raul Mur-Artal, Robbie Pennington, Sachin Kulkarni, Sagar Miglani, Santosh Gondi, Saransh Solanki, Sean Diener, Shangyi Cheng, Simon Green, Steve Saarinen, Suvam Patra, Tassos Mourikis, Thomas Whelan, Tripti Singh, Vasileios Balntas, Vijay Baiyya, Wilson Dreewes, Xiaqing Pan, Yang Lou, Yipu Zhao, Yusuf Mansour, Yuyang Zou, Zhaoyang Lv, Zijian Wang, Mingfei Yan, Carl Ren, Renzo De Nardi, Richard Newcombe

TL;DR

This paper presents Project Aria, a wearable, egocentric, multi-modal data-capture platform designed to drive research in context-aware, personalized AI for future AR glasses. It details the device hardware, sensor suite, time alignment, recording tools, and the Machine Perception Services that compute trajectories, online calibration, semi-dense point clouds, and eye gaze from uploaded recordings. It also addresses privacy considerations and showcases example applications such as lifelong mapping, egocentric scene reconstruction, object interaction, activity recognition, and long-form summarization. By offering a common data format (VRS), open tooling (Project Aria Tools), and backend processing, Aria aims to standardize and accelerate egocentric perception research across institutions. The work underscores the potential of combining multi-modal sensing with spatial AI as a foundation for always-on, personalized AR experiences.

Abstract

Egocentric, multi-modal data, as will be available on future augmented reality (AR) devices, presents unique challenges and opportunities for machine perception. These future devices will need to be wearable all day in a socially acceptable form factor to support always-available, context-aware, and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device, with the goal of fostering and accelerating research in this area. In this paper, we describe the Aria device hardware, including its sensor configuration, and the corresponding software tools that enable recording and processing of such data.

Paper Structure (21 sections, 13 figures)

This paper contains 21 sections and 13 figures.

Figures (13)

  • Figure 1: The Project Aria device.
  • Figure 2: Example images from the Project Aria device cameras. Left to right: left Mono Scene camera, POV (RGB) camera, right Mono Scene camera, two Eye-tracking cameras. Outputs of the POV (RGB) and Mono Scene cameras are rotated for visualization.
  • Figure 3: Example time-series data from the multi-channel microphone array on the Project Aria device. Audio data is saved in a 32-bit format and normalized to the range $[-1, 1]$.
  • Figure 4: Example motion sensor and location signal data. Top to bottom: accelerometer and gyroscope data from the IMUs, and magnetic field measurements from the magnetometer.
  • Figure 5: Illustration of the GNSS and Wi-Fi/Bluetooth sensor data. Top to bottom: GNSS signal (individual plots of latitude, longitude, altitude), pressure measurements provided by the barometer, signal strengths from different sources as recorded by the Wi-Fi and Bluetooth receivers.
  • ...and 8 more figures
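Figure 3 notes that audio is stored in a 32-bit format and plotted after normalization to $[-1, 1]$. The sketch below illustrates that mapping under a stated assumption: that samples are signed 32-bit integer PCM values, which the caption does not confirm. Dividing by $2^{31}$ then sends the most negative sample to exactly -1.0 and the most positive to just below +1.0. The function names `normalize_int32_samples` and `normalize_frames` are hypothetical and not part of any Aria tooling.

```python
from typing import List, Sequence

# Assumed encoding: signed 32-bit PCM (hypothetical, for illustration only).
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1
FULL_SCALE = 2**31  # magnitude of INT32_MIN


def normalize_int32_samples(samples: Sequence[int]) -> List[float]:
    """Map signed 32-bit integer samples to floats in [-1, 1)."""
    out: List[float] = []
    for s in samples:
        # Reject values that could not come from a signed 32-bit sample.
        if not INT32_MIN <= s <= INT32_MAX:
            raise ValueError(f"sample {s} outside signed 32-bit range")
        out.append(s / FULL_SCALE)
    return out


def normalize_frames(frames: Sequence[Sequence[int]]) -> List[List[float]]:
    """Normalize multi-channel audio given as frames of per-channel samples.

    The frame layout (one inner sequence per time step, one value per
    microphone channel) is an assumption made for this sketch.
    """
    return [normalize_int32_samples(frame) for frame in frames]


if __name__ == "__main__":
    # Two time steps from a hypothetical 3-channel recording.
    frames = [[0, 2**30, INT32_MIN], [INT32_MAX, -(2**30), 0]]
    for frame in normalize_frames(frames):
        print(frame)
```

Dividing by a single constant keeps zero at zero and preserves the relative amplitudes across channels, which matters when comparing signals from different microphones in the array.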