MANTIS: A Mixed-Signal Near-Sensor Convolutional Imager SoC Using Charge-Domain 4b-Weighted 5-to-84-TOPS/W MAC Operations for Feature Extraction and Region-of-Interest Detection

Martin Lefebvre, David Bol

TL;DR

A mixed-signal convolutional imager system-on-chip (SoC) codenamed MANTIS is introduced, featuring a unique combination of large 16$\times$16 4b-weighted filters, operation at multiple scales, and double sampling, well suited to the requirements of medium-complexity tasks.

Abstract

Recent advances in artificial intelligence have prompted the search for enhanced algorithms and hardware to support the deployment of machine learning at the edge. More specifically, in the context of the Internet of Things (IoT), vision chips must be able to fulfill tasks of low to medium complexity, such as feature extraction or region-of-interest (RoI) detection, with a sub-mW power budget imposed by the use of small batteries or energy harvesting. Mixed-signal vision chips relying on in- or near-sensor processing have emerged as an interesting candidate, thanks to their favorable tradeoff between energy efficiency (EE) and computational accuracy compared to digital systems for these specific tasks. In this paper, we introduce a mixed-signal convolutional imager system-on-chip (SoC) codenamed MANTIS, featuring a unique combination of large 16$\times$16 4b-weighted filters, operation at multiple scales, and double sampling, well suited to the requirements of medium-complexity tasks. The main contributions are (i) circuits called DS3 units combining delta-reset sampling, image downsampling, and voltage downshifting, and (ii) charge-domain multiply-and-accumulate (MAC) operations based on switched-capacitor amplifiers and charge sharing in the capacitive DAC of the successive-approximation ADCs. MANTIS achieves peak EEs normalized to 1b operations of 4.6 and 84.1 TOPS/W at the accelerator and SoC levels, while computing feature maps with a root mean square error ranging from 3 to 11.3$\%$. It also demonstrates a face RoI detection with a false negative rate of 11.5$\%$, while discarding 81.3$\%$ of image patches and reducing the data transmitted off chip by 13$\times$ compared to the raw image.
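As a rough illustration of what a 16$\times$16 4b-weighted convolution at a reduced scale computes, the following is a minimal, idealized sketch in plain Python. It is a noise-free digital model, not the charge-domain circuit described in the paper. The symmetric signed weight encoding, the block-averaging downsampling, the stride, and all function names are assumptions made for this example only.

```python
# Idealized digital model of a 16x16, 4b-weighted convolution.
# NOT the MANTIS circuit: encoding, downsampling scheme, and stride are assumptions.

FILTER_SIZE = 16   # 16x16 filters, as stated in the abstract
WEIGHT_BITS = 4    # 4b weights, as stated in the abstract


def quantize_weight(w, bits=WEIGHT_BITS):
    """Map a real weight in [-1, 1] to a signed integer code (assumed symmetric encoding)."""
    max_code = 2 ** (bits - 1) - 1            # 7 for 4 bits
    code = round(w * max_code)
    return max(-max_code, min(max_code, code))  # clip out-of-range inputs


def downsample(image, factor):
    """Average non-overlapping factor x factor blocks (assumed downsampling scheme)."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def convolve(image, weights, stride):
    """Slide the filter over the image and accumulate weight * pixel products (MAC)."""
    size = len(weights)
    out = []
    for r in range(0, len(image) - size + 1, stride):
        row = []
        for c in range(0, len(image[0]) - size + 1, stride):
            acc = 0
            for i in range(size):
                for j in range(size):
                    acc += weights[i][j] * image[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


# Usage: a hypothetical 64x64 uniform image, downsampled by 2, filtered with stride 16.
image = [[1.0] * 64 for _ in range(64)]
small = downsample(image, 2)                                   # 32x32
weights = [[quantize_weight(1.0)] * FILTER_SIZE for _ in range(FILTER_SIZE)]
feature_map = convolve(small, weights, stride=16)              # 2x2 feature map
```

Each feature-map entry here is the sum of 256 weight-pixel products, which is the operation the abstract refers to as a MAC. The root mean square error quoted in the abstract would be measured against an ideal reference of this kind, although the exact reference used by the authors is not specified in this summary.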

Paper Structure

This paper contains 19 sections, 6 equations, 23 figures, 2 tables.

Figures (23)

  • Figure 1: (a) Vision chip architectures ranging from mixed-signal processing in or near the pixel array to conventional digital processing outside of it, and (b) strengths and limitations of these architectures. (c) Envisioned system based on a cascaded processing scheme similar to Kim_2017, in which only relevant image patches are transmitted from the image sensor to the digital processor.
  • Figure 2: MANTIS CMOS imager SoC (a) modes of operation and (b) architecture, detailing the different blocks in the digital core and image sensor analog core with their respective power domains.
  • Figure 3: Block diagram of (a) the convolution and (b) the imaging pipelines.
  • Figure 4: (a) Schematic, (b) timing diagram, and (c) 90$^\circ$-rotated layout of a single column-parallel DS3 unit. $V_\mathrm{CM}$ = 1.2 V and $V_\mathrm{REF}$ = 0.6 V in (a).
  • Figure 5: Schematics of (a) the inverter-based OTA proposed in Gönen_2017, and of (b) the enable circuit shared by all 128 column-parallel DS3 units.
  • ...and 18 more figures