Chimera: A Block-Based Neural Architecture Search Framework for Event-Based Object Detection
Diego A. Silva, Ahmed Elsheikh, Kamilya Smagulova, Mohammed E. Fouda, Ahmed M. Eltawil
TL;DR
Chimera introduces a two-stage zero-shot NAS framework that automatically designs heterogeneous event-based detectors by combining blocks from CNNs, transformers, and SSMs within a recurrent backbone. The design space is explored with proxies such as Zen-Score and MACs, plus a Diversity Index that encourages block variety, and SHIST is identified as the most effective event encoding for PEDRo. On PEDRo, the resulting architectures achieve competitive performance with an average 1.6x reduction in parameter count: Chimera-3M matches ReYOLOv8n with 1.5x fewer parameters, and Chimera-5M surpasses it. The work provides a practical framework for co-designing event encodings and hybrid backbones, with potential for generalization to GEN1 and larger datasets, which could enable more energy-efficient and scalable event-based detection systems.
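To make the proxy-based search concrete, the following is a minimal sketch of how candidate block sequences could be filtered by a MACs budget and ranked by a proxy score plus a diversity term. It is not the authors' implementation. The Diversity Index is assumed here to be a normalized Shannon entropy over block types, which may differ from the paper's definition, and `proxy_score` and `macs` are hypothetical callables standing in for Zen-Score and a real MACs counter. The function names, the string labels for block families, and the additive weighting are also assumptions.

```python
import math
import random
from collections import Counter

# Block families named in the document; the string labels are illustrative.
BLOCK_TYPES = ["conv", "attention", "ssm", "mlp_mixer"]


def diversity_index(blocks):
    """Normalized Shannon entropy of block types, in [0, 1].

    Assumed definition for illustration only: 0 means a single block
    type is used, 1 means all types in BLOCK_TYPES are used equally.
    """
    total = len(blocks)
    if total == 0:
        return 0.0
    counts = Counter(blocks)
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    return entropy / math.log(len(BLOCK_TYPES))


def sample_architecture(rng, num_stages=4):
    """Draw one candidate: a block type for each backbone stage."""
    return [rng.choice(BLOCK_TYPES) for _ in range(num_stages)]


def rank_candidates(candidates, proxy_score, macs, mac_budget,
                    diversity_weight=1.0):
    """Drop candidates over the MACs budget, then sort best-first.

    proxy_score and macs are hypothetical callables that map a block
    list to a float; no training is involved (zero-shot ranking).
    """
    feasible = [a for a in candidates if macs(a) <= mac_budget]
    return sorted(
        feasible,
        key=lambda a: proxy_score(a) + diversity_weight * diversity_index(a),
        reverse=True,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Arbitrary per-block costs, used only to exercise the code.
    toy_cost = {"conv": 1.0, "attention": 3.0, "ssm": 2.0, "mlp_mixer": 1.5}
    candidates = [sample_architecture(rng) for _ in range(20)]
    ranked = rank_candidates(
        candidates,
        proxy_score=lambda a: 0.0,               # placeholder for Zen-Score
        macs=lambda a: sum(toy_cost[b] for b in a),
        mac_budget=8.0,
    )
    for arch in ranked[:3]:
        print(arch, round(diversity_index(arch), 3))
```

With a constant placeholder proxy, the ranking is driven entirely by the diversity term, so heterogeneous block sequences rise to the top; in a real search the proxy score would dominate and the diversity weight would need tuning.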
Abstract
Event-based cameras are sensors that simulate the human eye, offering advantages such as high-speed robustness and low power consumption. Established Deep Learning techniques have proven effective at processing event data. Chimera is a Block-Based Neural Architecture Search (NAS) framework designed specifically for Event-Based Object Detection, providing a systematic approach for adapting RGB-domain processing methods to the event domain. The Chimera design space is built from several macroblocks, including attention blocks, convolutions, State Space Models, and MLP-mixer-based architectures, which offer a trade-off between local and global processing capabilities at varying levels of complexity. Results on the PErson Detection in Robotics (PEDRo) dataset show performance comparable to leading state-of-the-art models, with an average parameter reduction of 1.6x.
