Energy-Efficient & Real-Time Computer Vision with Intelligent Skipping via Reconfigurable CMOS Image Sensors

Md Abdullah-Al Kaiser, Sreetama Sarkar, Peter A. Beerel, Akhilesh R. Jaiswal, Gourav Datta

TL;DR

A custom-designed reconfigurable CMOS image sensor (CIS) system that improves energy efficiency by selectively skipping uneventful regions or rows within a frame during the sensor's readout phase, and the subsequent analog-to-digital conversion (ADC) phase.

Abstract

Current video-based computer vision (CV) applications typically suffer from high energy consumption because they read and process every pixel in a frame, regardless of its significance. While previous works have attempted to reduce this energy by skipping input patches or pixels and using feedback from the end task to guide the skipping algorithm, the skipping is not performed during the sensor read phase. As a result, these methods cannot optimize the front-end sensor energy. Moreover, they may not be suitable for real-time applications due to the long latency of the modern CV networks deployed in the back-end. To address this challenge, this paper presents a custom-designed reconfigurable CMOS image sensor (CIS) system that improves energy efficiency by selectively skipping uneventful regions or rows within a frame during the sensor's readout phase and the subsequent analog-to-digital conversion (ADC) phase. A novel masking algorithm intelligently directs the skipping process in real time, optimizing both the front-end sensor and back-end neural networks for applications including autonomous driving and augmented/virtual reality (AR/VR). Our system can also operate in standard mode without skipping, depending on application needs. We evaluate our hardware-algorithm co-design framework on object detection using BDD100K and ImageNetVID, and on gaze estimation using OpenEDS, achieving up to 53% reduction in front-end sensor energy while maintaining state-of-the-art (SOTA) accuracy.
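
The difference between the row-skip and region-skip modes mentioned in the abstract can be illustrated with a small sketch. It is not the paper's masking algorithm: the per-patch `scores` array, the function names, and the thresholding rules (keep a patch if its score exceeds `t_reg`; keep a whole row if its mean score exceeds `t_row`) are assumptions made purely for illustration. Only the threshold values 0.1 and 0.5 are taken from the Figure 5 caption.

```python
import numpy as np

def region_mask(scores: np.ndarray, t_reg: float) -> np.ndarray:
    """Region-skip sketch: read a patch only if its score exceeds t_reg.

    The thresholding rule is an assumption, not the paper's method.
    """
    return scores > t_reg

def row_mask(scores: np.ndarray, t_row: float) -> np.ndarray:
    """Row-skip sketch: read an entire row if its mean score exceeds t_row.

    The use of the row mean is an assumption, not the paper's method.
    """
    keep_rows = scores.mean(axis=1) > t_row
    # Broadcast the per-row decision across every column of that row.
    return np.repeat(keep_rows[:, None], scores.shape[1], axis=1)

def read_fraction(mask: np.ndarray) -> float:
    """Fraction of patches that would still be read out and converted."""
    return float(mask.mean())

# Hypothetical 4x4 grid of per-patch importance scores in [0, 1].
scores = np.array([
    [0.05, 0.02, 0.01, 0.03],
    [0.20, 0.90, 0.85, 0.08],
    [0.04, 0.70, 0.95, 0.60],
    [0.01, 0.03, 0.02, 0.06],
])

reg = region_mask(scores, t_reg=0.1)
row = row_mask(scores, t_row=0.5)

print("region-skip read fraction:", read_fraction(reg))
print("row-skip read fraction:", read_fraction(row))
```

The read fraction is only a proxy: actual front-end energy depends on the sensor circuitry, so this number should not be read as the energy reduction the paper reports.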

Paper Structure (12 sections, 4 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Mask prediction from input using MGN. MGN consists of a transformer block (MHSA+FFN) followed by a self-attention (MHSA) and linear layer.
  • Figure 2: (a) Reconfigurable CIS system architecture with mask generation engine and memory blocks, (b) Representative system diagram showing 3D integration of BI-CIS and digital logic chip, and (c) Modified single-slope ADC (SSADC) with power gating for energy-efficient operation.
  • Figure 3: Pixel reading operations for (a) standard mode, (b) row-skip mode, and (c) region-skip mode. Active components are highlighted in green, while light gray represents the power-gated elements involved in the reading process across the various modes.
  • Figure 4: Comparison of mIoU vs. normalized front-end energy reduction with existing approaches on the OpenEDS dataset.
  • Figure 5: Visualization of masks on BDD100K showing (a) region mask with $t_{reg}=0.1$, (b) row mask with $t_{row}=0.5$, and (c) input masked with (a).
  • ...and 3 more figures