FutureMapping: The Computational Structure of Spatial AI Systems

Andrew J. Davison

TL;DR

This paper argues that the next generation of vision-enabled embodied devices will require co-design of algorithms, processors, and sensors to achieve a general Spatial AI capability that maintains a persistent, near-metric 3D representation with semantic labels. It analyzes the computational structure of Spatial AI using graph-based concepts (geometrical and computation graphs), advocates close alignment between software graphs and hardware topologies, and envisions a central map store coupled with close-to-sensor processing. The work surveys hardware trends (heterogeneous, low-power accelerators; event cameras; cloud offload) and proposes architectural elements such as interface nodes and graph-aware processors to enable real-time, power-efficient operation. Together, these provide design guidance toward scalable, object-level spatial perception suitable for AR, mobile robotics, and intelligent embodied systems.

Abstract

We discuss and predict the evolution of Simultaneous Localisation and Mapping (SLAM) into a general geometric and semantic 'Spatial AI' perception capability for intelligent embodied devices. A big gap remains between the visual perception performance that devices such as augmented reality eyewear or consumer robots will require and what is possible within the constraints imposed by real products. Co-design of algorithms, processors and sensors will be needed. We explore the computational structure of current and future Spatial AI algorithms and consider this within the landscape of ongoing hardware developments.

Paper Structure

This paper contains 19 sections, 4 figures.

Figures (4)

  • Figure 1: The SCAMP5 architecture for integrated visual sensing and processing. (Figure taken from Martel and Dudek [Martel:etal:ASRMOV2016] and reproduced courtesy of the authors.)
  • Figure 2:
  • Figure 3: Graphcore's Poplar graph compiler turns the specification of an algorithm from a framework such as TensorFlow into a definition of the computation graph which is suitable for efficient distributed deployment on their IPU graph processor. This is a visualisation of the result for the AlexNet image classification CNN, supporting both training and run-time operation, where the spatial configuration and colouring indicate close connectivity of the different processing modules required. Image courtesy of Graphcore.
  • Figure 4: Spatial AI brain: an imagining of how the representation and processing graph structures of a general Spatial AI system might map to a graph processor. The key elements we identify are the real-time processing loop, the graph-based map store, and blocks which interface with sensors and output actuators. Note that we envision additional 'close to the sensor' processing built into visual sensors, aiming to reduce the data bandwidth (eventually in two directions) between the main processor and cameras, which will generally be located some distance away.
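
The three elements named in the Figure 4 caption (a real-time loop, a graph-based map store, and a sensor interface with close-to-sensor processing) can be made concrete with a toy sketch. Everything below is an illustrative assumption rather than anything specified in the paper: the names `MapStore`, `close_to_sensor_filter` and `run_loop` are hypothetical, frames are flat lists of pixel intensities, and the "close to the sensor" step is a simple change threshold standing in for whatever in-sensor processing a real device would use.

```python
class MapStore:
    """Toy graph-based map store (hypothetical): nodes hold data, edges link nodes."""

    def __init__(self):
        self.payload = {}    # node id -> stored data
        self.adjacency = {}  # node id -> set of connected node ids

    def add_node(self, node_id, payload):
        self.payload[node_id] = payload
        self.adjacency.setdefault(node_id, set())

    def connect(self, a, b):
        # Undirected edge between two existing nodes.
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def neighbours(self, node_id):
        return sorted(self.adjacency[node_id])


def close_to_sensor_filter(previous, current, threshold=10):
    """Assumed in-sensor step: forward only pixels whose intensity changed
    by more than `threshold`, as (index, new_value) pairs."""
    return [
        (i, c)
        for i, (p, c) in enumerate(zip(previous, current))
        if abs(c - p) > threshold
    ]


def run_loop(frames, store, threshold=10):
    """Toy real-time loop: one map node per frame, each linked to its predecessor.
    Returns how many pixel values crossed the sensor-to-processor link."""
    previous = frames[0]
    store.add_node(0, {"changes": 0})
    sent = 0
    for step, frame in enumerate(frames[1:], start=1):
        events = close_to_sensor_filter(previous, frame, threshold)
        sent += len(events)
        store.add_node(step, {"changes": len(events)})
        store.connect(step - 1, step)
        previous = frame
    return sent


if __name__ == "__main__":
    frames = [[0, 0, 0, 0], [0, 50, 0, 0], [0, 50, 0, 90]]
    store = MapStore()
    sent = run_loop(frames, store)
    raw = sum(len(f) for f in frames[1:])
    print(f"values sent: {sent} of {raw} raw pixel values")
    print("neighbours of node 1:", store.neighbours(1))
```

The sketch only shows the shape of the data flow: filtering at the sensor means the loop receives a handful of changed values instead of whole frames, and the map lives in a graph whose connectivity could in principle be mirrored by the layout of a graph processor.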