SteROI-D: System Design and Mapping for Stereo Depth Inference on Regions of Interest

Jack Erhardt, Ziang Li, Reid Pinkham, Andrew Berkovich, Zhengya Zhang

TL;DR

SteROI-D tackles the energy bottleneck of real-time stereo depth on AR/VR devices by exploiting ROI sparsity and temporal sparsity. It combines a flexible L2 processor design with Special Compute Units and a multipacket NoC, and introduces a two-tier ROI-aware mapping approach with ROI-bin partitioning to manage dynamic ROIs efficiently. The framework demonstrates up to 4.35x total-system energy savings over a baseline ASIC, with thorough ablations showing the benefits of modest ROI binning and analysis across ROI distributions. The approach enables scalable, real-time stereo depth suitable for battery-constrained AR/VR devices, supported by a design-space exploration that jointly optimizes architecture and dataflow for ROI variability.

Abstract

Machine learning algorithms have enabled high quality stereo depth estimation to run on Augmented and Virtual Reality (AR/VR) devices. However, high energy consumption across the full image processing stack prevents stereo depth algorithms from running effectively on battery-limited devices. This paper introduces SteROI-D, a full stereo depth system paired with a mapping methodology. SteROI-D exploits Region-of-Interest (ROI) and temporal sparsity at the system level to save energy. SteROI-D's flexible and heterogeneous compute fabric supports diverse ROIs. Importantly, we introduce a systematic mapping methodology to effectively handle dynamic ROIs, thereby maximizing energy savings. Using these techniques, our 28nm prototype SteROI-D design achieves up to 4.35x reduction in total system energy compared to a baseline ASIC.

Paper Structure

This paper contains 15 sections, 12 figures.

Figures (12)

  • Figure 1: Illustration of the SteROI-D processing pipeline. Object detection runs infrequently on the L2 processor; object tracking runs on the intermediate frames on the L1 processors.
  • Figure 2: Distribution of ROI sizes across various object tracking datasets and object classes.
  • Figure 3: EPE (left) and 3-pixel error (right) for HITNet evaluated on 'Car' ROIs in the KITTI Object Tracking dataset, as measured against inference on full frames.
  • Figure 4: SteROI-D system design.
  • Figure 5: SteROI-D L2 processor architecture. The Special Compute Unit (SCU) accelerates non-parameterized compute patterns.
  • ...and 7 more figures