TimeFloats: Train-in-Memory with Time-Domain Floating-Point Scalar Products

Maeesha Binte Hashem, Benjamin Parpillon, Divake Kumar, Dinithi Jayasuria, Amit Ranjan Trivedi

TL;DR

An efficient train-in-memory architecture that performs 8-bit floating-point scalar product operations in the time domain, thus facilitating DNN training within the same memory structures and enabling high-resolution computations and easier integration with conventional digital circuits.

Abstract

In this work, we propose "TimeFloats," an efficient train-in-memory architecture that performs 8-bit floating-point scalar product operations in the time domain. While building on the compute-in-memory paradigm's integrated storage and inferential computations, TimeFloats additionally enables floating-point computations, thus facilitating DNN training within the same memory structures. Traditional compute-in-memory approaches with conventional ADCs and DACs face challenges such as higher power consumption and increased design complexity, especially at advanced CMOS nodes. In contrast, TimeFloats leverages time-domain signal processing to avoid conventional domain converters. It operates predominantly with digital building blocks, reducing power consumption and noise sensitivity while enabling high-resolution computations and easier integration with conventional digital circuits. Our simulation results demonstrate an energy efficiency of 22.1 TOPS/W while evaluating the design on 15 nm CMOS technology.

Paper Structure (11 sections, 7 figures, 2 tables)

This paper contains 11 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Edge Learning with Train-in-Memory: Current DNN training relies on the cloud due to the data-intensive nature of the training procedures. In contrast, this work presents train-in-memory using time-domain computing capable of performing efficient training directly within memory.
  • Figure 2: Overview of TimeFloats: TimeFloats utilizes in-memory time-domain processing of floating-point weights and inputs in five steps, as marked in the figure. Step-1: Input and weight exponents are added element-wise in a vector-parallel operation using time-domain analog computations. Step-2: Exponent sums are normalized across the vector to improve the bit-range of processing in successive steps. Step-3: Input mantissa bits are scaled based on the normalizing factor using digital shift operations. Step-4: Time-domain multiply-accumulate (MAC) operations are performed between the mantissas of the weights and the inputs. Step-5: The net output is digitized and normalized to floating-point format.
  • Figure 3: Exponent Adder: (a) We use RC path discharge for element-wise exponent summation and convert the output to the time domain for subsequent processing. (b) Linearity of exponent summations at varying input/weight mantissa bit combinations.
  • Figure 4: Largest Exponent Detector: After element-wise exponent additions, the largest summed exponent is identified for normalization using a tree-like structure of D flip-flops and 2:1 multiplexers. The index of the largest sum (its argmax) is determined from the select-pin outputs.
  • Figure 5: Exponent Normalization: After identifying the largest exponent, it is subtracted from the other exponent terms in the time domain itself. Based on the resulting difference, the other mantissa terms are scaled. The scaled mantissa is then converted to a time pulse for subsequent processing.
  • ...and 2 more figures
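
The five steps in the Figure 2 caption can be read as a block-floating-point scalar product, and a purely numerical sketch may help make the data flow concrete. The code below is an illustration only, not the authors' implementation: it models none of the time-domain circuitry (RC discharge, pulse widths, flip-flop tree) and reproduces only the arithmetic. The 1-4-3 sign/exponent/mantissa split, the bias of 7, and the names `FP8`, `tournament_argmax`, and `scalar_product` are all assumptions introduced here; the source states only that the format is 8-bit floating point. Subnormals and special values are ignored, and the final re-encoding of the result into an 8-bit float is omitted.

```python
from dataclasses import dataclass
from typing import List

MANT_BITS = 3  # ASSUMPTION: mantissa width (the source only says "8-bit floating point")
BIAS = 7       # ASSUMPTION: exponent bias


@dataclass
class FP8:
    """Hypothetical 8-bit float: sign bit, biased exponent, fraction bits (normal numbers only)."""
    sign: int
    exp: int
    mant: int

    def significand(self) -> int:
        # Integer significand with the implicit leading one: 1.mmm -> 0b1mmm
        return (1 << MANT_BITS) | self.mant

    def value(self) -> float:
        s = -1.0 if self.sign else 1.0
        return s * self.significand() / (1 << MANT_BITS) * 2.0 ** (self.exp - BIAS)


def tournament_argmax(values: List[int]) -> int:
    """Index of the largest value via pairwise rounds, loosely mirroring the
    tree-like detector of Figure 4 (software analogy only). Needs a non-empty list."""
    cands = list(range(len(values)))
    while len(cands) > 1:
        nxt = []
        for i in range(0, len(cands) - 1, 2):
            a, b = cands[i], cands[i + 1]
            nxt.append(a if values[a] >= values[b] else b)
        if len(cands) % 2:
            nxt.append(cands[-1])  # odd element advances unchallenged
        cands = nxt
    return cands[0]


def scalar_product(xs: List[FP8], ws: List[FP8]) -> float:
    # Step 1: element-wise exponent sums
    exp_sums = [x.exp + w.exp for x, w in zip(xs, ws)]
    # Step 2: find the largest exponent sum and each term's distance from it
    e_max = exp_sums[tournament_argmax(exp_sums)]
    shifts = [e_max - e for e in exp_sums]
    acc = 0
    for x, w, sh in zip(xs, ws, shifts):
        # Step 3: scale the input significand with a digital right shift
        # (truncating, so low-order bits of small terms are lost)
        x_scaled = x.significand() >> sh
        # Step 4: integer multiply-accumulate on the significands
        term = x_scaled * w.significand()
        acc += -term if (x.sign ^ w.sign) else term
    # Step 5: rescale the integer sum by the shared exponent
    # (returned as a Python float; re-encoding to FP8 is omitted)
    return acc / (1 << (2 * MANT_BITS)) * 2.0 ** (e_max - 2 * BIAS)
```

Under these assumptions, inputs 2.0, 1.5, and -1.0 against weights 1.0, 2.0, and 1.5 evaluate to 3.5, which matches the exact product because no bits are lost in the shifts. Terms whose exponent sum lies more than `MANT_BITS` below the maximum are truncated to zero in this sketch, which shows why the bit range available after normalization matters.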