Chimera: A Block-Based Neural Architecture Search Framework for Event-Based Object Detection

Diego A. Silva, Ahmed Elsheikh, Kamilya Smagulova, Mohammed E. Fouda, Ahmed M. Eltawil

TL;DR

Chimera introduces a two-stage Zero-Shot NAS framework that automatically designs heterogeneous event-based detectors by combining blocks from CNNs, transformers, and SSMs within a recurrent backbone. The design space is explored with proxies such as Zen-Score and MACs, plus a Diversity Index that encourages block variety; SHIST is identified as the most effective event encoding for PEDRo. The resulting architectures achieve competitive performance on PEDRo with an average 1.6x reduction in parameter count: Chimera-3M matches ReYOLOv8n with 1.5x fewer parameters, and Chimera-5M surpasses it. The work provides a practical framework for co-designing event encodings and hybrid backbones, with the potential to generalize to GEN1 and larger datasets and to enable more energy-efficient, scalable event-based detection systems.
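
A zero-shot proxy is only useful if it ranks untrained architectures in roughly the same order as their trained accuracy, which is what the Kendall's correlation in Figure 2a measures. The snippet below is a minimal sketch of that check, not the authors' code: it implements plain Kendall's tau-a in pure Python, and the `proxy_scores` and `map_values` lists are hypothetical placeholder numbers, not results from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values for five candidate architectures (illustration only).
proxy_scores = [12.1, 15.4, 9.8, 14.0, 11.2]   # e.g. a Zen-Score-like proxy
map_values = [0.52, 0.61, 0.47, 0.55, 0.56]    # mAP after full training

tau = kendall_tau(proxy_scores, map_values)
print(f"Kendall's tau between proxy and mAP: {tau:.2f}")
```

A tau near 1 means the proxy orders candidates almost exactly as training would, so it can stand in for training during the search; a tau near 0 means the proxy carries little ranking information for that data encoding.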

Abstract

Event-based cameras are sensors that simulate the human eye, offering advantages such as high-speed robustness and low power consumption. Established Deep Learning techniques have shown effectiveness in processing event data. Chimera is a Block-Based Neural Architecture Search (NAS) framework specifically designed for Event-Based Object Detection, aiming to create a systematic approach for adapting RGB-domain processing methods to the event domain. The Chimera design space is constructed from various macroblocks, including Attention blocks, Convolutions, State Space Models, and MLP-mixer-based architectures, which provide a valuable trade-off between local and global processing capabilities, as well as varying levels of complexity. The results on the PErson Detection in Robotics (PEDRo) dataset demonstrated performance levels comparable to leading state-of-the-art models, alongside an average parameter reduction of 1.6 times.

Paper Structure (42 sections, 8 equations, 14 figures, 12 tables, 1 algorithm)

Figures (14)

  • Figure 1: Structure of the Chimera Network.
  • Figure 2: a) Kendall's correlation between the different proxies and data formats for the full heterogeneous architectures; b) distribution of the mAPs for the different data encodings analyzed in the benchmark.
  • Figure 3: Comparison between the Chimera results and the state of the art on the PEDRo dataset.
  • Figure 4: Structure of a C2f block.
  • Figure 5: Structure of a MaxViT block.
  • ...and 9 more figures