Maximum-Entropy Analog Computing Approaching ExaOPS-per-Watt Energy-efficiency at the RF-Edge

Aswin Undavalli, Kareem Rashed, Zhili Xiao, Arun Natarajan, Shantanu Chakrabartty, Aravind Nagulu

TL;DR

The paper proposes a maximum-entropy analog computing framework, realized as Margin Propagation (MP), to compute correlations and inner products in a mesoscopic, symmetry-bound physical ensemble. By treating the entire substrate, including interconnects and parasitics, as a collective computing unit, the authors demonstrate far-from-equilibrium MP dynamics that encode $G(\mathbf{x}^T\mathbf{y})$, achieving high energy efficiency on RF-edge tasks. A 22 nm SOI CMOS prototype with 256- and 1024-length correlators validates non-equilibrium MP operation, attaining over $2\text{ PetaOPS/W}$ at 8-bit precision and over $0.8\text{ ExaOPS/W}$ at 3-bit precision, while performing spectrum sensing and code-domain communications at multi-GS/s rates. The work highlights robustness to PVT variations, scalability considerations, and potential extensions to optical and other domains, signaling a new paradigm for ultra-low-power, high-throughput analog signal processing at the RF edge.

Abstract

In this paper, we demonstrate how the physics of entropy production, when combined with symmetry constraints, can be used for implementing high-performance and energy-efficient analog computing systems. At the core of the proposed framework is a generalized maximum-entropy principle that can describe the evolution of a mesoscopic physical system formed by an interconnected ensemble of analog elements, including devices that can be readily fabricated on standard integrated circuit technology. We show that the maximum-entropy state of this ensemble corresponds to a margin-propagation (MP) distribution and can be used for computing correlations and inner products as the ensemble's macroscopic properties. Furthermore, the limits of computational throughput and energy efficiency can be pushed by extending the framework to non-equilibrium or transient operating conditions, which we demonstrate using a proof-of-concept radio-frequency (RF) correlator integrated circuit fabricated in a 22 nm SOI CMOS process. The measured results show a compute efficiency greater than 2 Peta ($10^{15}$) Bit Operations per second per Watt (PetaOPS/W) at 8-bit precision and greater than 0.8 Exa ($10^{18}$) Bit Operations per second per Watt (ExaOPS/W) at 3-bit precision for RF data sampled at rates greater than 4 GS/s. Using the fabricated prototypes, we also showcase several real-world RF applications at the edge, including spectrum sensing and code-domain communications.
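
As a rough unit check on these figures (a derived illustration, not a number reported in the abstract): since 1 W = 1 J/s, operations per second per watt are the same as operations per joule, so the quoted efficiencies imply the following upper bounds on the energy per bit operation. The symbol $E_{\text{op}}$ is introduced here only for this illustration.

$$E_{\text{op}}^{\text{8-bit}} < \frac{1}{2\times10^{15}\ \text{op/J}} = 0.5\ \text{fJ}, \qquad E_{\text{op}}^{\text{3-bit}} < \frac{1}{0.8\times10^{18}\ \text{op/J}} = 1.25\ \text{aJ}.$$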

Paper Structure

This paper contains 26 sections, 2 theorems, 45 equations, 13 figures.

Key Result

Lemma 2.1

For a given constraint $\gamma$ and input vector $O$, the output of the MP function $z = \phi(O)$ is unique.
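
The MP function is not defined in this summary, so the following is only a minimal sketch under a stated assumption: it uses the reverse water-filling form common in the margin-propagation literature, $\sum_i \max(0, O_i - z) = \gamma$ with $\gamma > 0$. Under that assumption the left-hand side is continuous and strictly decreasing in $z$ wherever it is positive, which is why a single solution exists. The function names `margin_propagation` and `mp_differential` are hypothetical, and the differential arrangement is patterned on the well heights listed in Figure 1(c) with $E_0 = 0$; whether it matches the authors' exact circuit mapping is an assumption.

```python
import numpy as np


def margin_propagation(inputs, gamma):
    """Solve sum_i max(0, inputs[i] - z) == gamma for z.

    Assumed reverse water-filling form of MP; requires gamma > 0.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    s = np.sort(np.asarray(inputs, dtype=float))[::-1]  # descending order
    csum = np.cumsum(s)
    for k in range(1, len(s) + 1):
        # Candidate solution if exactly the k largest inputs exceed z.
        z = (csum[k - 1] - gamma) / k
        next_input = s[k] if k < len(s) else -np.inf
        if z >= next_input:  # all remaining inputs contribute zero
            return z
    raise RuntimeError("unreachable for gamma > 0")


def mp_differential(x, y, gamma):
    """Difference z+ - z- for an assumed differential MP arrangement.

    z+ sees the terms +/-(x + y), z- sees the terms +/-(x - y),
    following the well heights in Figure 1(c) with E_0 = 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z_plus = margin_propagation(np.concatenate([x + y, -(x + y)]), gamma)
    z_minus = margin_propagation(np.concatenate([x - y, -(x - y)]), gamma)
    return z_plus - z_minus


if __name__ == "__main__":
    print(margin_propagation([3.0, 1.0, 0.0], gamma=1.0))  # only the largest input is active
    x = np.array([1.0, -1.0, 1.0, -1.0])
    print(mp_differential(x, x, gamma=1.0))   # positive for perfectly correlated inputs
    print(mp_differential(x, -x, gamma=1.0))  # negative for anti-correlated inputs
```

In this sketch the sign of the difference tracks the sign of the correlation for the two extreme cases shown; it does not model the transient, non-equilibrium dynamics that the paper exploits.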

Figures (13)

  • Figure 1: (a) Conventional analog computing architecture for computing inner-products and correlations using multiply-accumulate (MAC) operations. (b) Maximum-entropy analog computing architecture and an integrated circuit prototype for demonstrating the proof-of-concept. Inputs ${\bf x},{\bf y}$ are presented as boundary conditions of a statistical ensemble and the ensemble potential encodes the inner-product/correlation as $G({\bf x}^T{\bf y})$. (c) Rectangular symmetry constraints on the potential wells with respective heights ($E_1^{\pm} = E_0 \pm x_1 \pm y_1$, $E_2^{\pm} = E_0 \pm x_2 \pm y_2$, $E_3^{\pm} = E_0 \pm x_1 \mp y_1$, $E_4^{\pm} = E_0 \pm x_2 \mp y_2$). (d) Mesostates and microstates corresponding to each potential well, which form part of (e) a larger ensemble. (f) Conceptual illustration of a system evolving to its maximum-entropy state under energy and symmetry constraints. The distribution of microstates within the energy wells, illustrated as red and green spheres on the heat map in (d) and denoted by $p_i^{\pm}$ for $E_1^+, E_2^+, E_3^+, E_4^+$ and $q_i^{\pm}$ for $E_1^-, E_2^-, E_3^-, E_4^-$, will be determined by the ensemble potentials $z^+$ and $z^-$, shown by the red and green planes. (g) Simulated dynamics for $h(x) = \text{max}(0,x)$ showing the expected difference between $z^+$ and $z^-$ as a function of time and across different correlation values. (h) The expected difference between $z^+$ and $z^-$ at different time steps, as the correlation is varied. (i) Comparison of the signal processing gains, $\text{SPG} = 1/(\text{rms error})^2$, with respect to the ensemble size, for MAC-based correlators and the MP maximum-entropy approach at time steps $t = 2, 10, 80$, where $h(x) = \text{max}(0,x)$.
  • Figure 2: Non-equilibrium MP-based analog computing realized in a Standard CMOS Process. (a) Layout and cross-sectional view of a 4-transistor MP-unit cell arranged in a centroid configuration. (b) Array of the MP-unit cells with interconnects and sampling capacitors. (c) Equivalent circuit model of the array comprising dual-gate NMOS transistors enforcing rectangular symmetry constraints between the operands, and readout equivalent circuit. Die micrograph of two MP-ensembles fabricated in a 22 nm SOI CMOS process and functioning as (d) an N=1024 length correlator and (e) an N=256 length correlator. (f) Architecture showing samplers and clock generation modules interfacing with the correlator core in (d) and (e).
  • Figure 3: (a) Timing diagram illustrating the MP correlator's operation, where the sampling phase is followed by a phase to compute correlations. Correlation can also be estimated during the sampling phase, though with higher error due to a smaller number of sampled values. (b) Measured correlation error versus varying correlation between two random input vectors. The RMS error is –29.5 dB, closely approaching the theoretical limit of $1/\sqrt{1024}$. (c) Measured correlation and corresponding error for periodic input signals. The use of deterministic inputs removes statistical uncertainty, isolating hardware-induced errors and resulting in an RMS error of –49 dB, comparable to an 8-bit digital multiplier. (d) Transient correlation response during the compute phase as the correlation between the two inputs is varied from -1 to +1. (e) Output of the MP-correlator across varying compute time instances. (f) Predicted correlation ($R_{MP}$) versus true correlation ($R_{\infty}$) across compute time after applying the compute-time dependent one-to-one mapping function. (g) Estimated POPS/W and Effective Number of Bits (ENOB) across varying compute time, highlighting the energy efficiency and precision trade-off.
  • Figure 4: Demonstration and evaluation of the RF correlator IC in spectrum sensing and code-domain communication tasks. (a) Experimental setup for spectrum sensing using the 1024-length single-lag correlator. (b) Timing diagram illustrating the operation of the spectrum sensing system. (c) Measured correlation output demonstrating detection of input signal frequency. (d) Measured output revealing the occupied bandwidth of the input signal. (e) Compressive sensing-based reconstruction: comparison between the original signal, reconstructed waveform, and its frequency spectrum. (f) Experimental setup for code-domain communication using pseudo-random spreading codes. (g) Measured correlation output demonstrating selectivity to matched spreading codes. (h) System performance in a 64-APSK (Amplitude and Phase Shift Keying) modulation scheme, achieving an error vector magnitude (EVM) of –27 dB, which translates to a BER of $3.5\times10^{-6}$.
  • Figure 5: Performance of Direct RF-Sampling MP Analog Correlator. (a) Pie chart of energy consumption per frame at different sampling rates for the 256-length MP correlator (compute time of 20 ns, $R_\text{sink}$ of 2800 Ω, 0.5 V supply) and the 1024-length MP correlator (compute time of 20 ns, $R_\text{sink}$ of 400 Ω, 0.8 V supply to the compute core). Both samplers use a 0.8 V power supply. (b) Comparison with other inner-product and correlator hardware platforms. The proposed dynamic MP approach results in superior compute core efficiency, exceeding 800 POPS/W at 3-bit precision and 2900 TOPS/W at 8-bit precision for length 256 at 4 GS/s, and reaching 2885 TOPS/W compute core efficiency and 370 TOPS/W at the system level at 8-bit precision for length 1024, while directly sampling the RF data at 10 GS/s.
  • ...and 8 more figures
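
The decibel and precision figures quoted in the Figure 3 and Figure 5 captions can be cross-checked with a few lines of arithmetic. The sketch below assumes an amplitude (20·log10) decibel convention and the textbook ENOB relation ENOB = (SNDR − 1.76 dB) / 6.02 dB, treating the magnitude of the RMS error in dB as an SNDR; neither convention is stated in the captions, and the helper names are hypothetical.

```python
import math


def amplitude_db(ratio):
    """Convert an amplitude ratio to decibels (assumed 20*log10 convention)."""
    return 20.0 * math.log10(ratio)


def enob_from_sndr_db(sndr_db):
    """Textbook effective-number-of-bits estimate from an SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


# Statistical limit for an N = 1024 ensemble, 1/sqrt(N), expressed in dB.
limit_db = amplitude_db(1.0 / math.sqrt(1024))

# Precision implied by the -49 dB RMS error for deterministic inputs.
enob = enob_from_sndr_db(49.0)

# Unit cross-checks between the caption and the abstract (1 POPS = 1000 TOPS).
pops_8bit = 2900 / 1000.0   # 2900 TOPS/W expressed in POPS/W
exaops_3bit = 800 / 1000.0  # 800 POPS/W expressed in ExaOPS/W

if __name__ == "__main__":
    print(f"1/sqrt(1024) limit: {limit_db:.1f} dB")
    print(f"ENOB at 49 dB: {enob:.2f} bits")
    print(f"8-bit efficiency: {pops_8bit} POPS/W, 3-bit: {exaops_3bit} ExaOPS/W")
```

Under these assumptions the $1/\sqrt{1024}$ limit comes out at roughly –30.1 dB, close to the measured –29.5 dB, and –49 dB corresponds to a little under 8 effective bits, consistent with the caption's comparison to an 8-bit digital multiplier.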

Theorems & Definitions (6)

  • Lemma 2.1
  • proof
  • proof
  • Definition 3.1
  • Theorem 3.1
  • proof