
PReLU: Yet Another Single-Layer Solution to the XOR Problem

Rafael C. Pinto, Anderson R. Tavares

Abstract

This paper demonstrates that a single-layer neural network using the Parametric Rectified Linear Unit (PReLU) activation can solve the XOR problem, a simple fact that has so far been overlooked. We compare this solution to the multi-layer perceptron (MLP) and to the Growing Cosine Unit (GCU) activation function, and explain why PReLU enables this capability. Our results show that the single-layer PReLU network can achieve a 100% success rate over a wider range of learning rates while using only three learnable parameters.
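To see why a single PReLU unit can suffice, recall that PReLU(z) = z for z > 0 and a·z otherwise. With the slope fixed at a = -1 it becomes |z|, so the unit computes y = |w1·x1 + w2·x2|, and choosing w1 = 1, w2 = -1 gives |x1 - x2|, which is exactly XOR on {0,1} inputs. The plain-Python sketch below checks this for one weight setting chosen for illustration (w1 = 1, w2 = -1, a = -1, no bias); the helper names `prelu`, `neuron`, and `xor_outputs` are hypothetical, and the paper's own Figure 1 values may differ. The halved weights for {-1,1} inputs follow the note in the Figure 1 caption.

```python
# Minimal sketch (assumption): one PReLU neuron with three parameters (w1, w2, a)
# and no bias. The weight values are illustrative, not taken from the paper.

def prelu(z: float, a: float) -> float:
    """PReLU: identity for non-negative z, slope `a` for negative z.
    (Using >= instead of > is equivalent, since both branches give 0 at z = 0.)"""
    return z if z >= 0 else a * z


def neuron(x1: float, x2: float, w1: float, w2: float, a: float) -> float:
    """Single-layer network: y = PReLU(w1*x1 + w2*x2; a)."""
    return prelu(w1 * x1 + w2 * x2, a)


def xor_outputs(inputs, w1, w2, a):
    """Evaluate the neuron on every input pair."""
    return [neuron(x1, x2, w1, w2, a) for x1, x2 in inputs]


# With a = -1, PReLU(z) = |z|, so y = |x1 - x2| when w1 = 1 and w2 = -1.
binary_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(xor_outputs(binary_inputs, 1.0, -1.0, -1.0))  # [0.0, 1.0, 1.0, 0.0]

# For inputs in {-1, 1}, |x1 - x2| is 0 or 2, so halving the weights yields 0 or 1.
bipolar_inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
print(xor_outputs(bipolar_inputs, 0.5, -0.5, -1.0))  # [0.0, 1.0, 1.0, 0.0]
```

Swapping the signs (w1 = -1, w2 = 1) gives the same outputs, since |x1 - x2| = |x2 - x1|.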

Paper Structure

This paper contains 6 sections, 1 equation, 6 figures, and 1 table.

Figures (6)

  • Figure 1: PReLU solution to the XOR problem with inputs in $\{0,1\}$. Connection weights are interchangeable. Halve connection weights for inputs in $\{-1,1\}$.
  • Figure 2: Success rate versus learning rate (100 trials, 300 epochs). With the Adam optimizer, PReLU tolerates the widest range of learning rates on the XOR problem. In a separate experiment, PReLU with a bias behaves similarly to the other models at learning rates close to 1.
  • Figure 3: Runtime distribution for each model for a single trial (300 epochs). PReLU and GCU (3 learnable parameters each) have similar runtimes and are significantly faster than the MLP solution (8 learnable parameters).
  • Figure 4: Comparison of success rate and MSE over 150 epochs and 100 trials.
  • Figure 5: Average decision boundaries over 100 trials. PReLU and GCU show the least variance in learned solutions, as expected given their smaller number of learnable parameters. PReLU obtained the widest margins between classes, which suggests greater robustness to noise and better generalization. Note that $\{0,1\}$ inputs were used for GCU instead of $\{-1,1\}$, as they gave the best results for this model.
  • ...and 1 more figure
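The experimental protocol summarized in the Figure 2 and Figure 4 captions (train with Adam for a fixed number of epochs, repeat over 100 trials, report the fraction of successful runs per learning rate) can be approximated with a small, dependency-free sketch. Everything beyond what the captions state is an assumption: inputs in {-1,1} with targets in {0,1}, full-batch mean squared error, uniform weight initialization in [-1, 1], an initial slope a = 0.25, Adam's usual default betas, and "success" meaning every output lands on the correct side of 0.5. The helpers `forward`, `loss_and_grad`, `train`, `solved`, and `success_rate` are hypothetical; this is not the authors' code, and the rates it prints need not match their figures.

```python
import math
import random

# Hypothetical setup (assumptions): {-1, 1} inputs, {0, 1} targets, full-batch MSE.
INPUTS = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
TARGETS = [0.0, 1.0, 1.0, 0.0]


def forward(params, x1, x2):
    """Single PReLU neuron without bias; returns (output, pre-activation)."""
    w1, w2, a = params
    z = w1 * x1 + w2 * x2
    return (z if z >= 0 else a * z), z


def loss_and_grad(params):
    """MSE over the four XOR patterns and its gradient w.r.t. (w1, w2, a)."""
    a = params[2]
    n = len(INPUTS)
    loss, grad = 0.0, [0.0, 0.0, 0.0]
    for (x1, x2), t in zip(INPUTS, TARGETS):
        y, z = forward(params, x1, x2)
        err = y - t
        loss += err * err / n
        dy_dz = 1.0 if z >= 0 else a   # PReLU slope on each side of zero
        dy_da = 0.0 if z >= 0 else z   # the slope a only affects negative z
        g = 2.0 * err / n
        grad[0] += g * dy_dz * x1
        grad[1] += g * dy_dz * x2
        grad[2] += g * dy_da
    return loss, grad


def train(lr, epochs=300, seed=0, beta1=0.9, beta2=0.999, eps=1e-8):
    """Full-batch Adam with bias-corrected moment estimates."""
    rng = random.Random(seed)
    params = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), 0.25]
    m, v = [0.0] * 3, [0.0] * 3
    for step in range(1, epochs + 1):
        _, grad = loss_and_grad(params)
        for i in range(3):
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] ** 2
            m_hat = m[i] / (1 - beta1 ** step)
            v_hat = v[i] / (1 - beta2 ** step)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


def solved(params):
    """Assumed success criterion: every output is on the correct side of 0.5."""
    return all((forward(params, x1, x2)[0] > 0.5) == (t > 0.5)
               for (x1, x2), t in zip(INPUTS, TARGETS))


def success_rate(lr, trials=100, epochs=300):
    """Fraction of independently seeded runs that solve XOR."""
    return sum(solved(train(lr, epochs, seed=s)) for s in range(trials)) / trials


if __name__ == "__main__":
    for lr in (0.001, 0.01, 0.1, 1.0):
        print(f"lr={lr}: success rate {success_rate(lr):.2f}")
```

Writing the gradient by hand keeps the three-parameter model fully visible; a framework such as PyTorch (`torch.nn.PReLU` with `torch.optim.Adam`) would be the usual choice for a larger study.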