DL101 Neural Network Outputs and Loss Functions

Fernando Berzal

TL;DR

This work presents a principled view of neural network outputs and loss functions, tying output activations and losses to underlying probability distributions through generalized linear models (GLMs). It shows that common losses such as MSE, MAE, BCE, and CCE arise as negative log-likelihoods for Gaussian, Laplace, Bernoulli, and multinomial data, respectively, and that output activations correspond to the inverse canonical links of these distributions. It also discusses practical variants for positive-valued targets and heavy-tailed residuals, including Gamma, Poisson, Tweedie, and double Pareto scenarios. The key takeaway is that a matched loss and activation pair yields Bayes-optimal, properly calibrated predictions when the assumed noise model holds, with robust alternatives available for outliers and fat-tailed data.

Abstract

From a statistical point of view, the loss function used to train a neural network is strongly connected to its output layer. This technical report analyzes common activation functions for the output layer of a neural network, such as linear, sigmoid, ReLU, and softmax, detailing their mathematical properties and their appropriate use cases. The choice of a suitable loss function for training a deep learning model has a strong statistical justification. This report connects common loss functions such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and various Cross-Entropy losses to the statistical principle of Maximum Likelihood Estimation (MLE). Choosing a specific loss function is equivalent to assuming a specific probability distribution for the model output, which highlights the link between these functions and the Generalized Linear Models (GLMs) that underlie network output layers. Additional scenarios of practical interest are also considered, such as alternative output encodings, constrained outputs, and distributions with heavy tails.
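The correspondence between losses and likelihoods stated in the abstract can be checked numerically. The sketch below is illustrative only and is not taken from the report: the function names, the fixed scale parameters (`sigma = 1`, `b = 1`), and the toy data are all assumptions. It shows that the Gaussian negative log-likelihood differs from MSE only by a factor of 1/2 and an additive constant, that the Laplace negative log-likelihood equals MAE plus a constant, and that binary cross-entropy on sigmoid outputs is exactly the Bernoulli negative log-likelihood.

```python
import math

def mse(y, mu):
    """Mean squared error between targets y and predictions mu."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu)) / len(y)

def mae(y, mu):
    """Mean absolute error between targets y and predictions mu."""
    return sum(abs(yi - mi) for yi, mi in zip(y, mu)) / len(y)

def gaussian_nll(y, mu, sigma=1.0):
    """Mean negative log-likelihood of y under N(mu, sigma^2)."""
    const = 0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const + (yi - mi) ** 2 / (2 * sigma ** 2)
               for yi, mi in zip(y, mu)) / len(y)

def laplace_nll(y, mu, b=1.0):
    """Mean negative log-likelihood of y under Laplace(mu, b)."""
    return sum(math.log(2 * b) + abs(yi - mi) / b
               for yi, mi in zip(y, mu)) / len(y)

def sigmoid(z):
    """Logistic function: maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bce(labels, probs):
    """Binary cross-entropy in its usual two-term form."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(labels, probs)) / len(labels)

def bernoulli_nll(labels, probs):
    """Mean negative log of the Bernoulli likelihood p^t (1 - p)^(1 - t)."""
    return -sum(math.log(p if t == 1 else 1 - p)
                for t, p in zip(labels, probs)) / len(labels)

# Toy regression data (hypothetical values, chosen only for illustration).
y = [1.0, 2.5, -0.3]
mu = [0.8, 2.0, 0.1]

# Gaussian NLL with sigma = 1 is 0.5 * MSE plus a constant.
assert math.isclose(gaussian_nll(y, mu), 0.5 * mse(y, mu) + 0.5 * math.log(2 * math.pi))

# Laplace NLL with b = 1 is MAE plus a constant.
assert math.isclose(laplace_nll(y, mu), mae(y, mu) + math.log(2))

# Toy binary labels and logits (hypothetical values).
labels = [1, 0, 1]
probs = [sigmoid(z) for z in [2.0, -1.0, 0.5]]

# BCE on sigmoid outputs is the Bernoulli NLL.
assert math.isclose(bce(labels, probs), bernoulli_nll(labels, probs))
```

Because the additive constants and the positive scale factor do not depend on the predictions, minimizing MSE or MAE gives the same minimizer as maximizing the corresponding likelihood at a fixed scale, which is the sense in which the loss choice encodes a distributional assumption.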

Paper Structure

This paper contains 28 sections, 105 equations, and 4 tables.