MotiMem: Motion-Aware Approximate Memory for Energy-Efficient Neural Perception in Autonomous Vehicles

Haohua Que, Mingkai Liu, Jiayue Xie, Haojia Gao, Jiajun Sun, Hongyi Xu, Handong Yao, Fei Qiao

Abstract

High-resolution sensors are critical for robust autonomous perception but impose a severe memory wall on battery-constrained electric vehicles. In these systems, data movement energy often outweighs computation. Traditional image compression is ill-suited as it is semantically blind and optimizes for storage rather than bus switching activity. We propose MotiMem, a hardware-software co-designed interface. Exploiting temporal coherence, MotiMem uses lightweight 2D Motion Propagation to dynamically identify Regions of Interest (RoI). Complementing this, a Hybrid Sparsity-Aware Coding scheme leverages adaptive inversion and truncation to induce bit-level sparsity. Extensive experiments across nuScenes, Waymo, and KITTI with 16 detection models demonstrate that MotiMem reduces memory-interface dynamic energy by approximately 43 percent while retaining approximately 93 percent of the object detection accuracy, establishing a new Pareto frontier significantly superior to standard codecs like JPEG and WebP.

Paper Structure

This paper contains 41 sections, 18 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Pareto Efficiency Analysis (Accuracy vs. Energy Proxy) across nuScenes, Waymo, and KITTI Datasets [caesar2020nuscenes, sun2020scalability, geiger2012we, geiger2013vision]. MotiMem (marked with a red star) establishes a superior Pareto frontier compared to standard codecs (JPEG, WebP), which fall into the inefficient zone due to high entropy. Evaluated across three diverse datasets and 16 detection models, MotiMem achieves the optimal trade-off by reducing the normalized bit-1 density (a direct proxy for dynamic energy) by $\approx$ 43% while maintaining $\approx$ 93% of the original uncompressed mAP. Crucially, MotiMem significantly outperforms the semantically blind "Global k=4" ablation (blue square), validating that semantic-aware allocation is superior to uniform quantization for energy-constrained perception.
  • Figure 2: The MotiMem closed-loop interface for neural perception. MotiMem sits between the camera stream and the memory hierarchy. It forms a closed loop: detections at time $t$ are fed back as compact metadata to predict the RoI mask for time $t{+}1$. Guided by the RoI mask, MotiMem applies RoI-guided hybrid coding to shape the stored bitstream toward lower activity (fewer 1s/toggles) along the sensor-to-memory path, while preserving perception accuracy.
  • Figure 3: RoI prediction from temporal coherence. Detections at frame $t{-}1$ are propagated to frame $t$ using a lightweight 2D motion prior, inflated by margin $\delta$, and rasterized to a compact block-level RoI mask $M_t$ (e.g., $16{\times}16$ blocks).
  • Figure 4: RoI-guided hybrid coding. Pixels are routed into two paths based on $M_t(p)$ and parameter $k$. RoI: Compute and embed an inversion flag $f$ in the LSB, selectively invert top-$k$ MSBs if $f{=}1$. Background: Truncate to top-$k$ MSBs, apply the same inversion, and embed $f$ in the LSB. Both paths preserve $B$-bit width, reducing bit-1 density and transitions for lower memory energy.
  • Figure 5: Design Space Exploration: Perception vs. Energy Trade-off varying bit-width $k$. The plot sweeps the retained parameter $k$ from 1 to 7. The blue line (Accuracy/Confidence) saturates around $k=4$, while the green line (Energy Cost) continues to rise linearly. This identifies $k=4$ as the algorithmic "Sweet Spot," providing maximum perceptual gain for the minimum necessary energy expenditure.
  • ...and 1 more figure
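The RoI-guided hybrid coding described in Figure 4 can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it assumes $B{=}8$-bit pixels, uses a simple majority-of-ones rule to set the inversion flag $f$, and embeds $f$ in the LSB as the caption describes; function names and the exact flag rule are assumptions.

```python
def encode_pixel(value: int, in_roi: bool, k: int = 4, B: int = 8) -> int:
    """Encode one B-bit pixel into a B-bit code with lower bit-1 density.

    Background pixels are truncated to their top-k MSBs; both paths
    selectively invert the top-k MSBs and embed the flag f in the LSB.
    """
    msb_mask = ((1 << k) - 1) << (B - k)      # mask for the top-k MSBs
    if not in_roi:
        value &= msb_mask                     # background: keep top-k MSBs only
    top = value & msb_mask
    # invert the top-k MSBs when that reduces the number of 1s among them
    f = 1 if bin(top).count("1") > k // 2 else 0
    if f:
        value ^= msb_mask
    return (value & ~1) | f                   # embed inversion flag f in the LSB


def decode_pixel(code: int, k: int = 4, B: int = 8) -> int:
    """Recover the (approximate) pixel; the original LSB is sacrificed for f."""
    msb_mask = ((1 << k) - 1) << (B - k)
    if code & 1:                              # flag f set: undo the inversion
        code ^= msb_mask
    return code & ~1                          # LSB carried the flag, not data
```

Both paths preserve the $B$-bit word width, so the energy saving comes purely from fewer stored 1s and fewer bus toggles, not from a shorter bitstream.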