Automated and Holistic Co-design of Neural Networks and ASICs for Enabling In-Pixel Intelligence

Shubha R. Kharel, Prashansa Mukim, Piotr Maj, Grzegorz W. Deptuch, Shinjae Yoo, Yihui Ren, Soumyajit Mandal

TL;DR

The paper tackles designing real-time edge AI within readout ASICs by introducing a fully automated, open-source co-design pipeline that jointly optimizes neural network architectures, per-layer quantization, and ASIC synthesis strategies using multi-objective Bayesian optimization. Unlike theory-only approaches, it incorporates circuit-level metrics from ASIC synthesis to guide optimization, yielding Pareto-optimal designs that balance accuracy, area, power, and delay for in-pixel intelligence. On radiation detector waveform processing, synthesis-guided searches produce more implementable and efficient designs than theory-guided methods, demonstrated by a constrained set of 54 in-pixel solutions. The authors deliver an end-to-end toolchain (Optuna, QKeras, OpenLANE, Docker) and release data to enable large-scale, hardware-aware co-design in extreme-edge AI contexts.

Abstract

Extreme edge-AI systems, such as those in readout ASICs for radiation detection, must operate under stringent hardware constraints (micron-level dimensions, sub-milliwatt power, and nanosecond-scale speed) while providing clear accuracy advantages over traditional architectures. Finding ideal solutions means identifying optimal AI and ASIC design choices from a design space that has explosively expanded during the merger of these domains, creating non-trivial couplings which together act upon a small set of solutions as constraints tighten. It is impractical, if not impossible, to manually determine ideal choices among possibilities that easily exceed billions even in small-size problems. Existing methods to bridge this gap have leveraged theoretical understanding of hardware to guide architecture search. However, the assumptions made in computing such theoretical metrics are too idealized to provide sufficient guidance during the difficult search for a practical implementation. Meanwhile, theoretical estimates for many other crucial metrics (like delay) do not even exist and are similarly variable, dependent on parameters of the process design kit (PDK). To address these challenges, we present a study that employs intelligent search using multi-objective Bayesian optimization, integrating both neural network search and ASIC synthesis in the loop. This approach provides reliable feedback on the collective impact of all cross-domain design choices. We showcase the effectiveness of our approach by finding several Pareto-optimal design choices for effective and efficient neural networks that perform real-time feature extraction from input pulses within the individual pixels of a readout ASIC.
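
To make the notion of "Pareto-optimal design choices" concrete, the following is a minimal sketch, not the authors' implementation. It filters a handful of candidate designs down to the non-dominated set over four metrics that are all treated as "lower is better" (validation loss, area, power, delay). The `Design` class, the metric names, the units, and every number are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Design:
    """One candidate design point (hypothetical container for illustration)."""
    name: str
    # (val_loss, area_um2, power_uw, delay_ns); all are minimized. Units are assumed.
    metrics: Tuple[float, float, float, float]


def dominates(a: Design, b: Design) -> bool:
    """True if `a` is no worse than `b` on every metric and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a.metrics, b.metrics))
    strictly_better = any(x < y for x, y in zip(a.metrics, b.metrics))
    return no_worse and strictly_better


def pareto_front(designs: List[Design]) -> List[Design]:
    """Keep every design that no other design dominates (simple O(n^2) scan)."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]


# Made-up candidates, not results from the paper.
candidates = [
    Design("A", (0.020, 900.0, 120.0, 8.0)),
    Design("B", (0.035, 400.0, 60.0, 5.0)),    # less accurate, but smaller and faster
    Design("C", (0.040, 950.0, 130.0, 9.0)),   # worse than A on every metric
    Design("D", (0.015, 1500.0, 200.0, 12.0)), # most accurate, most expensive
]

front = pareto_front(candidates)
print([d.name for d in front])
```

In this toy set, design C is discarded because A beats it on all four metrics, while A, B, and D each win on at least one metric and therefore remain on the front. A search loop such as the one described in the abstract would evaluate the metrics from training and synthesis rather than from a hard-coded list.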

Paper Structure

This paper contains 21 sections, 7 equations, 17 figures, 2 tables.

Figures (17)

  • Figure 1: Overview of methodology.
  • Figure 2: Reference shape, $p(t)$, of CR-(RC)$^N$ pulses for different values of $N$.
  • Figure 3: View of the 98,583 waveforms used in the experiment. The surface plot is made after sorting the waveforms by amplitude along the data index axis.
  • Figure 4: Evolution of the Pareto front during optimization. Color indicates the iteration at which each Pareto point was found, as a percentage of total iterations. Point size is proportional to area.
  • Figure 5: 2D cross-sections of the 3D Pareto front from Fig. \ref{fig:pareto_3d} showing the relationship between validation loss and delay.
  • ...and 12 more figures