Lead Zirconate Titanate Reservoir Computing for Classification of Written and Spoken Digits

Thomas Buckley, Leslie Schumm, Manor Askenazi, Edward Rietman

Abstract

In this paper we extend our earlier work (Rietman et al. 2022) by applying physical Reservoir Computing (RC) to the classification of handwritten and spoken digits. We use an unpoled cube of Lead Zirconate Titanate (PZT) as the computational substrate for processing these datasets. The PZT reservoir achieves 89.0% accuracy on MNIST handwritten digits, a 2.4 percentage point improvement over logistic regression baselines applied to the same preprocessed data. On the AudioMNIST spoken digits dataset, however, the reservoir system (88.2% accuracy) performs equivalently to baseline methods (88.1% accuracy). This suggests that reservoir computing provides the greatest benefit for classification tasks of intermediate difficulty, where linear methods underperform but the problem remains learnable. PZT is a well-known material already used in semiconductor applications, and it offers a low-power computational substrate that can be integrated with digital algorithms. Our findings indicate that physical reservoirs excel when the task difficulty exceeds the capability of simple linear classifiers but remains within the computational capacity of the reservoir dynamics.

Paper Structure

This paper contains 37 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: (A) The physical reservoir neural network uses a fixed, non-linear device to map input signals to a high-dimensional temporal output. These higher-dimensional features are then classified by a very simple model such as logistic regression. This regression is the only trainable element in the reservoir system, which reduces the effective computational cost of the entire network. The overall goal of the RC framework is to generate features that are highly separable. (B) We compare the physical reservoir against the readout-layer regression model applied on its own. This experiment measures the relative performance increase gained from the reservoir.
  • Figure 2: The cube reservoir system architecture. In (A), a Hilbert curve is generated over the $32\times32$ binary handwritten digit, converting it to a 1D binary string. In (B), Mel-frequency cepstral coefficients (MFCCs) are computed for each spoken digit to produce $32\times32$ power spectrograms. These are binarized by computing the mean value and setting values above the mean to 1 and values below it to 0. The resulting binarized image (an example is shown in the figure) is then scanned vertically to generate the 1D binary string (since the MFCCs are already a time series). An 8-bit sliding window with a stride of 1 bit is used to generate 1024 8-bit vectors, which are passed as a batch to the signal generator. The signal generator applies the 8 binary values of each vector in parallel, each to a separate pad of the cube. This is done at high speed and recorded, producing the final image of the reservoir dynamics.
  • Figure 3: Photograph of the cube system setup. The Teensy (top left) generates a series of 8 parallel 3.3 V signals at high speed (30 MHz). The PZT cube (left middle) serves as the reservoir. The Analog Discovery 2 (top middle) captures the reservoir dynamics at high speed (sampling frequency of 100 MHz).
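
The preprocessing described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The Hilbert-curve orientation and the column-by-column reading of "scanned vertically" are guesses. Values exactly equal to the mean are mapped to 0, since the caption does not say how ties are handled. A plain 8-bit window with stride 1 over 1024 bits yields only 1017 vectors, so the sketch assumes the window wraps around the end of the string to reach the 1024 vectors mentioned in the caption; padding would be another possibility.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before swapping the axes.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def binarize_by_mean(image):
    """Set values above the global mean to 1 and all others to 0.
    Assumption: values equal to the mean become 0."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    return [[1 if v > mean else 0 for v in row] for row in image]


def scan_hilbert(image, order=5):
    """Flatten a 2**order square image into a 1D list along a Hilbert curve."""
    out = []
    for d in range((1 << order) ** 2):
        x, y = hilbert_d2xy(order, d)
        out.append(image[y][x])
    return out


def scan_vertical(image):
    """Flatten an image column by column (assumed meaning of 'scanned vertically')."""
    rows, cols = len(image), len(image[0])
    return [image[r][c] for c in range(cols) for r in range(rows)]


def sliding_windows(bits, width=8, stride=1):
    """Return one width-bit window per start position.
    Assumption: windows wrap around the end, so 1024 bits give 1024 windows."""
    n = len(bits)
    return [[bits[(i + j) % n] for j in range(width)] for i in range(0, n, stride)]
```

For a handwritten digit, `sliding_windows(scan_hilbert(binary_digit))` would give the batch of 8-bit vectors sent to the signal generator. For a spoken digit the chain would be `sliding_windows(scan_vertical(binarize_by_mean(spectrogram)))`. Each of the 8 values in a window corresponds to one pad of the cube.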