Sparsely Activated Networks

Paschalis Bizopoulos, Dimitrios Koutsouris

TL;DR

The paper introduces the φ metric to quantify the trade-off between reconstruction accuracy and representation compression in unsupervised learning, and proposes Sparsely Activated Networks (SANs) that use shared-weight kernels and spike-like sparse activations. Five activation functions, including Extrema and Extrema-Pool indices, are evaluated to encourage interpretable, sparse representations. Across Physionet, UCI-epilepsy, MNIST, and Fashion-MNIST, SANs selected by φ yield compact, interpretable kernels that retain or improve downstream classification performance. The work demonstrates that controlling description length can yield robust, interpretable components and suggests SANs as a practical tool for feature extraction and time-series analysis with potential for broader applications.

Abstract

Previous literature on unsupervised learning focused on designing structural priors with the aim of learning meaningful features. However, this was done without considering the description length of the learned representations, which is a direct and unbiased measure of model complexity. In this paper, we first introduce the $\varphi$ metric, which evaluates unsupervised models based on their reconstruction accuracy and the degree of compression of their internal representations. We then define two activation functions (Identity, ReLU) as a baseline and three sparse activation functions (top-k absolutes, Extrema-Pool indices, Extrema) as candidate structures that minimize the previously defined $\varphi$. Lastly, we present Sparsely Activated Networks (SANs), which consist of kernels with shared weights that, during encoding, are convolved with the input and then passed through a sparse activation function. During decoding, the same weights are convolved with the sparse activation map, and the partial reconstructions from each weight are then summed to reconstruct the input. We compare SANs using the five previously defined activation functions on a variety of datasets (Physionet, UCI-epilepsy, MNIST, FMNIST) and show that models selected using $\varphi$ have representations with a small description length and consist of interpretable kernels.
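
The encode/decode procedure described in the abstract can be illustrated with a minimal 1D sketch in NumPy. This is not the authors' implementation: the function names `topk_absolutes` and `san_forward`, the `"same"` padding mode, the random kernels, and the reading of "top-k absolutes" as "keep the k values of largest magnitude" are all assumptions made for illustration, and no training step is shown.

```python
import numpy as np


def topk_absolutes(s, k):
    """Assumed reading of 'top-k absolutes': keep the k entries of s with
    the largest magnitude and set every other entry to zero."""
    alpha = np.zeros_like(s)
    if k <= 0:
        return alpha
    idx = np.argsort(np.abs(s))[-k:]  # indices of the k largest |s|
    alpha[idx] = s[idx]
    return alpha


def san_forward(x, kernels, k):
    """Hypothetical forward pass of a 1D SAN-like model.

    Encoding: convolve the input with each kernel, then sparsify.
    Decoding: convolve each sparse map with the same kernel and sum
    the partial reconstructions.
    """
    x_hat = np.zeros_like(x)
    alphas = []
    for w in kernels:
        s = np.convolve(x, w, mode="same")      # similarity with kernel w
        alpha = topk_absolutes(s, k)            # sparse activation map
        r = np.convolve(alpha, w, mode="same")  # partial reconstruction
        x_hat += r
        alphas.append(alpha)
    return x_hat, alphas


# Toy usage with random data and untrained (random) kernels.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
kernels = [rng.standard_normal(5) for _ in range(2)]
x_hat, alphas = san_forward(x, kernels, k=4)
```

With untrained kernels the reconstruction `x_hat` is not expected to resemble `x`; the sketch only shows the data flow, in which each activation map holds at most `k` non-zero values and the same weights are reused for encoding and decoding.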

Paper Structure

This paper contains 25 sections, 17 equations, 5 figures, 4 tables, 4 algorithms.

Figures (5)

  • Figure 1: Visualization of the activation maps of five activation functions (Identity, ReLU, top-k absolutes, Extrema-Pool indices and Extrema) for 1D and 2D input in the top and bottom row respectively. The 1D input to the activation functions is denoted with the continuous transparent green line using an example from the UCI dataset. The output of each activation function is denoted with the cyan stem lines with blue markers. The 2D example depicts only the output of the activation functions using an example from the MNIST dataset.
  • Figure 2: Diagrams of the feed-forward pass of a 1D and a 2D SAN with two kernels, for random examples from the test sets of the UCI epilepsy database and MNIST respectively. The figures depict intermediate representations; $\bm{x}$ denotes the input signal (blue line), $\bm{w}^{(i)}$ denotes the kernels (red line), $\bm{s}^{(i)}$ denotes the similarity matrices (green line), $\bm{\alpha}^{(i)}$ denotes the activation maps (cyan stem lines with blue markers), $\bm{r}^{(i)}$ denotes the partial reconstruction from each $\bm{w}^{(i)}$ and $\hat{\bm{x}}$ denotes the reconstructed input (red line). For comparison, the transparent green line in $\bm{\alpha}^{(i)}$ denotes the corresponding $\bm{s}^{(i)}$ and the transparent blue line in $\hat{\bm{x}}$ denotes the input $\bm{x}$. The superscript $i=0,1$ indexes the first and second kernel and their corresponding intermediate representations. The circles denote operations; $\mathcal{L}$ denotes the loss function, $\phi$ denotes the sparse activation function, $\ast$ the convolution operation and $+$ the plus operation. All operations are performed separately for each $\bm{w}^{(i)}$; however, for visual clarity only one operation is depicted for each step. Shades of red and blue in the 2D example represent positive and negative values respectively. The Extrema activation function was used for both examples.
  • Figure 3: Inverse compression ratio ($CR^{-1}$) vs. normalized reconstruction loss ($\tilde{\mathcal{L}}$) for the $15$ datasets of Physionet for various kernel sizes. The five inner plots with the yellow background on the right of each subplot depict the corresponding kernel for the kernel size that achieved the best $\bar{\varphi}$.
  • Figure 4: Aggregated results of the evaluation of the Physionet databases using the $\bar{\varphi}$ metric. The density plot was created using kernel density estimation with Gaussian kernels and the confidence intervals denote one standard deviation.
  • Figure 5: Visualization of the learned kernels for each sparse activation function (row) and for each Physionet database (column).
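
The axes of Figure 3 suggest how a single model could be scored, which the following sketch illustrates. Everything in it is an assumption rather than the paper's definition: the L1 loss normalized by the loss of an all-zero reconstruction, the description-length count (each non-zero activation stored as a value plus its position, plus the kernel weights), and the combination of the two quantities as a Euclidean distance from the origin. The function names are hypothetical.

```python
import numpy as np


def normalized_loss(x, x_hat):
    """Assumed normalization: mean absolute error of x_hat divided by the
    mean absolute error of an all-zero reconstruction."""
    return float(np.abs(x - x_hat).mean() / np.abs(x).mean())


def inverse_compression_ratio(x, alphas, kernels):
    """Assumed description length: every non-zero activation costs its value
    plus one index per input dimension, and every kernel weight costs one
    number. The total is divided by the number of input samples."""
    n_active = sum(int(np.count_nonzero(a)) for a in alphas)
    n_weights = sum(int(w.size) for w in kernels)
    description_length = n_active * (x.ndim + 1) + n_weights
    return description_length / x.size


def phi(loss_tilde, cr_inv):
    """Assumed combination: distance of the point (loss, CR^-1) from the
    origin, so lower values mean better accuracy and stronger compression."""
    return float(np.hypot(loss_tilde, cr_inv))


# Toy usage: a 100-sample signal, two activation maps with 3 spikes each,
# and two kernels of 4 weights each.
x = np.linspace(-1.0, 1.0, 100)
alphas = [np.zeros(100), np.zeros(100)]
alphas[0][[10, 40, 70]] = 1.0
alphas[1][[20, 50, 80]] = -1.0
kernels = [np.ones(4), np.ones(4)]

cr_inv = inverse_compression_ratio(x, alphas, kernels)
score = phi(normalized_loss(x, 0.5 * x), cr_inv)
```

Under these assumptions a model that reconstructs well but stores many activations, and a model that stores almost nothing but reconstructs poorly, both receive a high score, which is the trade-off the plots in Figure 3 visualize.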