Near-Memory Architecture for Threshold-Ordinal Surface-Based Corner Detection of Event Cameras

Hongyang Shang, An Guo, Shuai Dong, Junyi Yang, Ye Ke, Arindam Basu

TL;DR

This work tackles the latency and energy bottlenecks of TOS-based corner detection on high-rate event cameras by introducing a near-memory computing architecture (NM-TOS). The design uses read-write decoupled 8T SRAM, a pipelined patch-update flow, and dynamic voltage and frequency scaling (DVFS) to accelerate per-event TOS updates while reducing power. Dedicated minus-one (MO) and CMP hardware modules, together with DVFS-enabled peripheral circuits, yield latency reductions of up to 24.7× (at 1.2 V) and energy savings of up to 6.6× (at 0.6 V), with only minor AUC degradation on standard EBC datasets under low-voltage operation. The approach enables real-time corner detection on edge devices for surveillance, robotics, and autonomous systems with high-resolution event streams, and Monte Carlo simulations indicate robust operation under circuit non-idealities.
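The TOS is refreshed patch by patch as events arrive. Below is a minimal sketch of such a per-event update, assuming the commonly described TOS rule: decrement every word in the patch around the event, clear words that fall below a threshold, then set the event pixel to the maximum value. The 7×7 patch and 5-bit words follow the figure captions; the threshold value, the name `tos_update`, and the mapping of the two steps onto the MO and CMP modules are assumptions made for illustration.

```python
import numpy as np

# Illustrative parameters (assumptions for this sketch, not the paper's exact settings).
WORD_BITS = 5                      # width of each TOS word
TOS_MAX = (1 << WORD_BITS) - 1     # 31, written at the event pixel
PATCH = 7                          # patch side length (7x7)
THRESH = TOS_MAX - 2 * PATCH       # 17, hypothetical CMP threshold


def tos_update(tos: np.ndarray, x: int, y: int) -> None:
    """Apply one event at column x, row y to the TOS array in place."""
    h, w = tos.shape
    r = PATCH // 2
    # View of the patch around the event, clipped at the sensor border.
    patch = tos[max(0, y - r):min(h, y + r + 1), max(0, x - r):min(w, x + r + 1)]
    # Minus-one (MO) step: decrement every nonzero word in the patch.
    patch[patch > 0] -= 1
    # Compare (CMP) step: clear words that have decayed below the threshold.
    patch[patch < THRESH] = 0
    # The event pixel itself is refreshed to the maximum value.
    tos[y, x] = TOS_MAX
```

In this model each event costs a read, a decrement, a compare, and a write-back over at most 49 words, which is the per-event work that a near-memory design aims to shorten.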

Abstract

Event-based cameras (EBCs) are widely used in surveillance and autonomous driving applications due to their high speed and low power consumption. Corners are essential low-level features in event-driven computer vision, and novel algorithms using event-based representations, such as the Threshold-Ordinal Surface (TOS), have been developed for corner detection. However, implementing these algorithms on resource-constrained edge devices incurs significant latency, undermining the advantages of EBCs. To address this challenge, we propose a near-memory architecture for efficient TOS updates (NM-TOS). This architecture employs a read-write decoupled 8T SRAM cell and accelerates patch updates through pipelining. Hardware-software co-optimized peripheral circuits and dynamic voltage and frequency scaling (DVFS) reduce both power and latency. Based on a 65 nm CMOS process, our architecture reduces latency and energy relative to traditional digital implementations by 24.7× and 1.2×, respectively, at Vdd = 1.2 V, or by 1.93× and 6.6× at Vdd = 0.6 V. Monte Carlo simulations confirm robust circuit operation, showing a zero bit error rate (BER) at operating voltages above 0.62 V, and a BER of only 0.2% at 0.61 V and 2.5% at 0.6 V. Corner detection evaluation using the precision-recall area under the curve (AUC) shows only minor AUC reductions, of 0.027 and 0.015 at 0.6 V, on two popular EBC datasets.
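To see how a bit error rate such as 2.5% at 0.6 V could be carried into an accuracy evaluation, one simple model flips each stored bit independently with probability equal to the BER. The sketch below applies that model to an array of 5-bit TOS words with NumPy. The independence assumption and the name `inject_bit_errors` are ours; this does not reproduce the paper's Monte Carlo circuit simulation.

```python
import numpy as np


def inject_bit_errors(words: np.ndarray, ber: float, bits: int = 5, seed: int = 0) -> np.ndarray:
    """Return a copy of `words` in which each of the low `bits` bits flips with probability `ber`.

    Assumes independent, identically distributed bit flips (a simplification).
    """
    rng = np.random.default_rng(seed)
    # One Bernoulli trial per stored bit: shape (..., bits).
    flips = rng.random(words.shape + (bits,)) < ber
    # Bit weights 1, 2, 4, ... used to pack the flip pattern into an XOR mask.
    weights = (1 << np.arange(bits)).astype(words.dtype)
    mask = (flips * weights).sum(axis=-1).astype(words.dtype)
    return words ^ mask


# Example: corrupt a DAVIS240-sized TOS array at the 0.6 V BER quoted above.
tos = np.zeros((180, 240), dtype=np.uint8)
noisy = inject_bit_errors(tos, ber=0.025)
```

Running the downstream corner detector on `noisy` instead of `tos` would give one rough way to probe the AUC sensitivity to memory errors under this simplified error model.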

Paper Structure

This paper contains 18 sections, 11 figures, 1 table, and 1 algorithm.

Figures (11)

  • Figure 1: (a) Architecture of luvHarris, in which the TOS is updated for each incoming event $v_{in}$. The Harris lookup table (LUT) is updated for the full frame by accessing the TOS, and $v_{in}$ is tagged as a corner $c$ or not by referencing the most recent Harris LUT. (b) Maximum throughput of eHarris, the conventional implementation of luvHarris, and the proposed NMC-TOS, compared with the maximum bandwidth of the DAVIS240 [gallego2020event].
  • Figure 2: Workflow for corner detection on the event stream of an EBC. Events generated by the EBC first pass through a spatiotemporal correlation filter (STCF) to remove noise. The TOS is then constructed event by event (EBE) by NMC-TOS, while the DVFS module detects the event frequency. Finally, corner detection is performed frame by frame (FBF).
  • Figure 3: Overall block architecture of NMC-TOS. The TOS array is divided into blocks; an EBC such as the DAVIS240, with resolution $240\times180$, requires two such blocks. Each block consists of 180 rows and 120 columns of 5-bit words. The peripheral circuits include the MO module, CMP module, WR module, buffer, and control circuits.
  • Figure 4: Type-A 8T SRAM cell and the pipeline method. (a) Decoupling the write bit line (WBL) from the read bit line (RBL) allows write-back and read operations to occur simultaneously. (b) A pipeline example for updating a 7$\times$7 patch.
  • Figure 5: Minus-one (MO) module. (a) Sense amplifier (SA) of the MO module for SRAM readout, and the simplified minus-one logic (MOL). (b) MOL reduces path delay compared with 28T full adders (FAs). (c) Truth table of MOL.
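The captions for Figures 4 and 5 describe a pipelined patch update and a minus-one (MO) module. The behavioral sketch below combines both ideas under stated assumptions: each patch row passes through three hypothetical stages (read, MO/CMP, write-back); the decrement is written as generic ripple-borrow logic saturating at zero, not the paper's simplified MOL; and the threshold of 17 is illustrative. Because the read and write bit lines are decoupled, a read of one row can share a cycle with the write-back of an earlier row. All names here are hypothetical.

```python
# Assumed parameters for this sketch (not the paper's exact values).
WORD_BITS = 5
THRESH = 17
STAGES = ("read", "mo_cmp", "write")


def minus_one(v: int) -> int:
    """Ripple-borrow decrement of a WORD_BITS-wide word, saturating at zero."""
    if v == 0:
        return 0
    out, borrow = 0, 1
    for i in range(WORD_BITS):
        b = (v >> i) & 1
        out |= (b ^ borrow) << i   # difference bit
        borrow &= 1 - b            # borrow ripples on only through zero bits
    return out


def mo_cmp(row: list[int]) -> list[int]:
    """Compute stage: decrement each word, then clear words below THRESH."""
    return [d if d >= THRESH else 0 for d in map(minus_one, row)]


def pipelined_patch_update(rows: list[list[int]]):
    """Stream patch rows through the stages; return (updated rows, cycles, per-cycle log)."""
    n, depth = len(rows), len(STAGES)
    read_latch: dict[int, list[int]] = {}
    comp_latch: dict[int, list[int]] = {}
    result: list[list[int]] = [[] for _ in range(n)]
    log = []
    cycles = n + depth - 1
    for cycle in range(cycles):
        active = []
        w = cycle - 2              # row in the write-back stage this cycle
        if 0 <= w < n:
            result[w] = comp_latch.pop(w)
            active.append((w, "write"))
        c = cycle - 1              # row in the MO/CMP stage this cycle
        if 0 <= c < n:
            comp_latch[c] = mo_cmp(read_latch.pop(c))
            active.append((c, "mo_cmp"))
        if cycle < n:              # row entering the read stage this cycle
            read_latch[cycle] = list(rows[cycle])
            active.append((cycle, "read"))
        log.append(active)
    return result, cycles, log
```

Under this model a 7-row patch finishes in 7 + 3 - 1 = 9 cycles, versus 7 × 3 = 21 if each row completed all three stages before the next began. The actual stage split and cycle counts in Figure 4(b) may differ.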