Finding Strong Lottery Ticket Networks with Genetic Algorithms

Philipp Altmann, Julian Schönberger, Maximilian Zorn, Thomas Gabor

TL;DR

This work presents the first genetic-algorithm-based approach for finding strong lottery ticket sub-networks without training or otherwise computing any gradient. For smaller instances of binary classification tasks, this evolutionary approach even produces smaller and better-performing lottery ticket networks than the state-of-the-art approach using gradient information.

Abstract

According to the Strong Lottery Ticket Hypothesis, every sufficiently large neural network with randomly initialized weights contains a sub-network that, still with its random weights, already performs as well for a given task as the trained super-network. We present the first approach based on a genetic algorithm to find such strong lottery ticket sub-networks without training or otherwise computing any gradient. We show that, for smaller instances of binary classification tasks, our evolutionary approach even produces smaller and better-performing lottery ticket networks than the state-of-the-art approach using gradient information.
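
To make the idea in the abstract concrete, the following is a minimal sketch of a gradient-free mask search, not the authors' implementation. The weights are drawn once at random and never changed; a genetic algorithm evolves only a binary mask that decides which connections are active. The toy dataset, the `[3, 8, 2]` layer sizes (two inputs plus a constant input acting as a bias), truncation selection, one-point crossover, and all hyperparameters are assumptions chosen for illustration. Fitness here is accuracy alone, whereas the paper also reports sparsity.

```python
import random

def make_data(n, rng):
    """Hypothetical toy binary task: label 1 if the point lies inside a circle of radius 0.5."""
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [(x, y, int(x * x + y * y < 0.25)) for x, y in pts]

def init_weights(sizes, rng):
    """Fixed random weights, one flat list per layer; they are never updated."""
    return [[rng.uniform(-1, 1) for _ in range(a * b)] for a, b in zip(sizes, sizes[1:])]

def predict(weights, mask, sizes, x):
    """Forward pass that uses only connections whose mask bit is 1."""
    act, offset = list(x), 0
    for layer, (a, b) in enumerate(zip(sizes, sizes[1:])):
        w = weights[layer]
        out = [sum(act[i] * w[i * b + j] * mask[offset + i * b + j] for i in range(a))
               for j in range(b)]
        offset += a * b
        is_last = layer == len(sizes) - 2
        act = out if is_last else [max(0.0, v) for v in out]  # ReLU on hidden layers
    return max(range(len(act)), key=act.__getitem__)          # argmax over outputs

def accuracy(weights, mask, sizes, data):
    """Fraction of correctly classified points; the constant 1.0 input acts as a bias."""
    hits = sum(predict(weights, mask, sizes, (x, y, 1.0)) == label for x, y, label in data)
    return hits / len(data)

def evolve(weights, sizes, data, rng, pop_size=20, generations=15, p_mut=0.02):
    """Evolve binary masks; no gradient is computed and no weight is modified."""
    n = sum(a * b for a, b in zip(sizes, sizes[1:]))
    fitness = lambda m: accuracy(weights, m, sizes, data)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]            # truncation selection; parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, n)             # one-point crossover
            child = p[:cut] + q[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])  # bit-flip mutation
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best), history

rng = random.Random(0)
sizes = [3, 8, 2]
data = make_data(200, rng)
weights = init_weights(sizes, rng)
mask, acc, history = evolve(weights, sizes, data, rng)
print(f"accuracy={acc:.2f}, active connections={sum(mask)}/{len(mask)}")
```

Because the better half of each population is carried over unchanged, the best accuracy in this sketch never decreases between generations. A sparsity term could be added to the fitness function to favor smaller sub-networks; how the paper balances accuracy and sparsity is not specified in this summary.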

Paper Structure

This paper contains 24 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Illustration of a lottery ticket network. Top: Full network graph. Red connections persist in most evolved lottery ticket networks in an example population (blue connections do not). Bottom: Example of an evolved lottery ticket subnetwork with only a fraction of active connections.
  • Figure 2: Overview of our datasets and the network architectures we use on them: the moons test dataset (\ref{fig:moons}), consisting of 16000 2D data points normalized to the interval $[-0.7, 0.7]$; the circles test dataset (\ref{fig:circles}), consisting of two different-sized rings with 16000 2D data points, from scikit-learn \cite{pedregosa2011scikit}; and the network architectures (\ref{tab:architectures}), each with a single-letter identification code. The bracket notation gives the number of neurons in each network layer: the first number is the number of input neurons, and the last is the number of output neurons.
  • Figure 3: Overview of the performance of the GA on the moons (\ref{fig:GA_moons}) and circles (\ref{fig:GA_circles}) datasets. The blue boxes contain different runs for every architecture using the default GA configuration. The pink boxes contain the results of $R$ runs for the GA configuration that uses the adaptive accuracy bound with initial threshold value $0.85$. For comparison, we added the mean accuracies achieved by the networks trained with backpropagation. Table \ref{tab:GA_accuracies} summarizes the achieved accuracies.
  • Figure 4: Optimization progress of one well-performing run using "GA (adaptive BA)" on network architecture B $=[2, 75, 2]$ on the circles dataset with regard to the accuracy (\ref{fig:acc_dev}) and the sparsity (\ref{fig:spars_dev}). The blue line shows the sparsity of the fittest individual in the current population. The orange line displays the top sparsity in the current population, and the green line represents the current best sparsity found in all previous generations.
  • Figure 5: Illustration of the performance of edge-popup on the shown datasets using the different color-coded initializations with $R$ runs each. The mean accuracies of backpropagation on the respective architectures (dashed line) are provided for comparison.
  • ...and 2 more figures