
Statistical Properties of the log-cosh Loss Function Used in Machine Learning

Resve A. Saleh, A. K. Md. Ehsanes Saleh

TL;DR

The paper identifies the Cosh distribution as the probabilistic basis for the log-cosh loss. It derives the MLE of the location parameter, with asymptotic variance $\mathrm{Var}(\hat{\theta})=\frac{2\sigma^2}{n}$ and Fisher information $\mathcal{I}(\theta)=\frac{1}{2\sigma^2}$, and proves that the MLE is asymptotically unbiased. It benchmarks log-cosh against robust estimators in both simple and multiple regression, finding comparable standard errors and robustness to outliers. The authors extend the framework to quantile regression via a smooth check function, yielding a quantile regression method (SMRQ) with improved monotonicity and less quantile crossing than convolution-based smoothing (Conquer). Bootstrap experiments support the asymptotic theory, and SMRQ shows favorable finite-sample efficiency on several datasets. Overall, the work provides a principled statistical treatment of log-cosh, clarifying its properties and practical value in robust estimation and quantile analysis.
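The Fisher information quoted above can be checked directly. The derivation below assumes the Cosh density is parameterized as $f(x;\theta,\sigma)=\frac{1}{\pi\sigma\cosh((x-\theta)/\sigma)}$ (the hyperbolic-secant form, which integrates to one because $\int_{-\infty}^{\infty}\operatorname{sech} z\,dz=\pi$). This parameterization is our assumption, chosen because it reproduces the stated $\mathcal{I}(\theta)=\frac{1}{2\sigma^2}$.

```latex
\begin{align*}
\log f(x;\theta,\sigma) &= -\log(\pi\sigma) - \log\cosh\!\left(\frac{x-\theta}{\sigma}\right), \\
\frac{\partial}{\partial\theta}\log f(x;\theta,\sigma) &= \frac{1}{\sigma}\tanh\!\left(\frac{x-\theta}{\sigma}\right), \\
\mathcal{I}(\theta) &= \frac{1}{\sigma^2}\,\mathbb{E}\!\left[\tanh^2 Z\right],
  \qquad Z=\frac{X-\theta}{\sigma} \text{ with density } \frac{1}{\pi\cosh z}, \\
&= \frac{1}{\pi\sigma^2}\int_{-\infty}^{\infty}\left(\operatorname{sech} z-\operatorname{sech}^3 z\right)dz
 = \frac{1}{\pi\sigma^2}\left(\pi-\frac{\pi}{2}\right) = \frac{1}{2\sigma^2}, \\
\mathrm{Var}(\hat{\theta}) &\approx \frac{1}{n\,\mathcal{I}(\theta)} = \frac{2\sigma^2}{n}.
\end{align*}
```

The fourth line uses $\tanh^2 z = 1-\operatorname{sech}^2 z$ together with $\int_{-\infty}^{\infty}\operatorname{sech} z\,dz=\pi$ and $\int_{-\infty}^{\infty}\operatorname{sech}^3 z\,dz=\pi/2$. Setting the summed score to zero gives the estimating equation $\sum_{i=1}^{n}\tanh((x_i-\hat{\theta})/\sigma)=0$ for the MLE.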

Abstract

This paper analyzes a popular loss function used in machine learning called the log-cosh loss function. A number of papers have been published using this loss function but, to date, no statistical analysis has been presented in the literature. In this paper, we present the distribution function from which the log-cosh loss arises. We compare it with a similar distribution, the Cauchy distribution, and carry out various statistical procedures that characterize its properties. In particular, we examine its associated pdf, cdf, likelihood function, and Fisher information. Side by side, we consider the Cauchy and Cosh distributions as well as the MLE of the location parameter, together with its asymptotic bias, asymptotic variance, and confidence intervals. We also compare robust estimators derived from several other loss functions, including the Huber loss function and the rank dispersion function. Further, we examine the use of the log-cosh function for quantile regression. In particular, we identify a quantile distribution function from which a maximum likelihood estimator for quantile regression can be derived. Finally, we compare a log-cosh-based quantile M-estimator with robust monotonicity against another approach to quantile regression based on convolutional smoothing.
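To make the idea of a log-cosh-smoothed check function concrete, the sketch below uses one natural construction: the usual check loss $\rho_\tau(u)=u(\tau-I(u<0))$ can be written as $(\tau-\tfrac12)u+\tfrac12|u|$, and replacing $|u|$ with $h\log\cosh(u/h)$ gives a smooth, convex surrogate. This form, the bandwidth `h`, and the function names `smooth_check` and `smooth_quantile` are illustrative assumptions, not necessarily the paper's exact check function or its SMRQ estimator. Because the derivative of the summed loss is monotone in the location, the intercept-only fit (a smoothed sample quantile) can be found by bisection.

```python
import math
from typing import Sequence


def log_cosh(x: float) -> float:
    """Numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2)."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def smooth_check(u: float, tau: float, h: float = 0.1) -> float:
    """Hypothetical log-cosh smoothing of the check loss rho_tau(u).

    Writes rho_tau(u) = (tau - 1/2) * u + |u| / 2 and replaces |u| with
    h * log(cosh(u / h)), which tends to |u| as h -> 0.
    """
    return (tau - 0.5) * u + 0.5 * h * log_cosh(u / h)


def smooth_quantile(y: Sequence[float], tau: float, h: float = 0.1,
                    tol: float = 1e-10, max_iter: int = 200) -> float:
    """Minimize sum_i smooth_check(y_i - c, tau, h) over a scalar c (0 < tau < 1).

    Setting the derivative to zero gives mean(tanh((y_i - c) / h)) = 1 - 2 * tau.
    The left side is strictly decreasing in c, so the minimizer is unique and
    can be found by bisection once a sign-changing bracket is located.
    """
    target = 1.0 - 2.0 * tau

    def g(c: float) -> float:
        # Scaled first-order condition; strictly decreasing in c.
        return sum(math.tanh((yi - c) / h) for yi in y) / len(y) - target

    # Widen the bracket until g(lo) > 0 > g(hi); this terminates for 0 < tau < 1.
    lo, hi, step = min(y), max(y), h
    while g(lo) <= 0:
        lo -= step
        step *= 2.0
    step = h
    while g(hi) >= 0:
        hi += step
        step *= 2.0

    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid  # minimizer lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a small `h` the result stays close to the ordinary sample quantile; for example, `smooth_quantile(list(range(1, 101)), 0.9)` lands between 90 and 91. A regression version would replace the scalar `c` with $x_i^\top\beta$ and use a gradient-based solver, which is not shown here.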

Paper Structure (17 sections, 61 equations, 17 figures, 8 tables)

Figures (17)

  • Figure 1: (a) $\cosh(x)$; (b) $1/\cosh(x)$
  • Figure 2: Cosh distribution: (a) cdf; (b) pdf
  • Figure 3: Huber loss and derivative as a function of $x$ for $\delta=1$.
  • Figure 4: Developing a continuous L1 function.
  • Figure 5: Histogram of estimates of $\hat{\theta}$ and $n\widehat{\text{Var}}(\hat{\theta})$ from 10000 samples with $n=100$.
  • ...and 12 more figures
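The simulation described for Figure 5 can be approximated with a short script: draw samples from the Cosh distribution, compute the location MLE for each, and compare $n\widehat{\mathrm{Var}}(\hat{\theta})$ with the asymptotic value $2\sigma^2$. The sketch below assumes the density $f(x;\theta,\sigma)=1/(\pi\sigma\cosh((x-\theta)/\sigma))$, treats $\sigma$ as known, samples by inverting the cdf $F(x)=\frac{2}{\pi}\arctan(e^{(x-\theta)/\sigma})$, and solves the score equation $\sum_i\tanh((x_i-\hat{\theta})/\sigma)=0$ by bisection. These choices and the function names are ours, not the paper's code.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def sample_cosh(n: int, theta: float, sigma: float, rng: random.Random) -> List[float]:
    """Inverse-cdf sampling from f(x) = 1 / (pi*sigma*cosh((x - theta)/sigma)).

    Inverts F(x) = (2/pi) * arctan(exp((x - theta)/sigma)); u = 1 - random()
    lies in (0, 1], which keeps log(tan(.)) finite.
    """
    return [theta + sigma * math.log(math.tan(0.5 * math.pi * (1.0 - rng.random())))
            for _ in range(n)]


def cosh_location_mle(x: Sequence[float], sigma: float, iters: int = 60) -> float:
    """Location MLE with sigma known: solve sum_i tanh((x_i - theta)/sigma) = 0.

    The score is strictly decreasing in theta and changes sign on [min(x), max(x)],
    so bisection on that interval converges to the unique root.
    """
    lo, hi = min(x), max(x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(math.tanh((xi - mid) / sigma) for xi in x) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate(reps: int = 1000, n: int = 100, theta: float = 0.0,
             sigma: float = 1.0, seed: int = 1) -> Tuple[float, float]:
    """Return (mean of theta_hat, n * sample variance of theta_hat) over reps samples."""
    rng = random.Random(seed)
    estimates = [cosh_location_mle(sample_cosh(n, theta, sigma, rng), sigma)
                 for _ in range(reps)]
    return statistics.fmean(estimates), n * statistics.variance(estimates)


if __name__ == "__main__":
    mean_hat, n_var = simulate()
    print(f"mean(theta_hat) = {mean_hat:.4f}, n*Var(theta_hat) = {n_var:.3f} (theory: 2.0)")
```

Figure 5 uses 10000 samples of size $n=100$; passing `reps=10000` matches that configuration but is slow in pure Python. As a reference point, the sample mean under this density has $n\,\mathrm{Var}(\bar{x})=\pi^2\sigma^2/4\approx 2.47\sigma^2$, so the MLE's asymptotic value $2\sigma^2$ corresponds to a relative efficiency of roughly 0.81 for the mean.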