Next Generation Loss Function for Image Classification

Shakhnaz Akhmedova; Nils Körber

TL;DR

The paper addresses automatic loss-function design for image classification and segmentation by using Genetic Programming (GP) to evolve loss functions from a broad operator set. The standout function, Next Generation Loss ($NGL$), outperforms Cross Entropy and several alternatives on diverse small datasets and scales to ImageNet-1k and segmentation benchmarks, indicating strong generalizability. $NGL$ combines an exponential term with a cosine-sine component, formalized as $f_{NGL} = \frac{1}{N}\sum_{i=1}^{N}{\left[e^{(\alpha-y^{(i)}_{pred}\cdot(1+y^{(i)}_{real}))}-\cos(\cos(\sin(y^{(i)}_{pred})))\right]},$ with $\alpha=2.4092$, providing implicit regularization that reduces overfitting. Overall, the results demonstrate GP’s potential to discover robust, task-agnostic loss functions that can be applied across architectures and scales, offering a promising direction for automated loss design.
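The formula above can be written out directly. The following is a minimal sketch, not the authors' implementation: it assumes that $y_{pred}$ and $y_{real}$ are flat sequences of equal length (for example, class probabilities and one-hot targets flattened over a batch) and that $N$ counts their entries. The name `ngl_loss` is hypothetical.

```python
import math

ALPHA = 2.4092  # constant alpha as given in the TL;DR formula


def ngl_loss(y_pred, y_real, alpha=ALPHA):
    """Mean of exp(alpha - p * (1 + t)) - cos(cos(sin(p))) over all entries.

    Assumption: y_pred and y_real are flat, equally long sequences of floats;
    how the paper flattens batches and classes is not stated in this summary.
    """
    if len(y_pred) != len(y_real) or len(y_pred) == 0:
        raise ValueError("y_pred and y_real must be non-empty and equally long")
    total = 0.0
    for p, t in zip(y_pred, y_real):
        exponential_term = math.exp(alpha - p * (1.0 + t))
        trig_term = math.cos(math.cos(math.sin(p)))
        total += exponential_term - trig_term
    return total / len(y_pred)


if __name__ == "__main__":
    # A confident prediction on the true class versus an unconfident one.
    print(ngl_loss([0.9], [1.0]), ngl_loss([0.1], [1.0]))
```

With a target of 1, the exponential term shrinks as the prediction grows, so the confident prediction in the usage lines receives the smaller loss.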

Abstract

Neural networks are trained by minimizing a loss function that defines the discrepancy between the predicted model output and the target value. The choice of loss function is crucial for achieving task-specific behaviour and strongly influences the capability of the model. A variety of loss functions have been proposed for a wide range of tasks, affecting both training and model performance. For classification tasks, cross entropy (CE) is the de facto standard and usually the first choice. Here, we experimentally challenge the well-known loss functions, including CE, using genetic programming (GP), a population-based evolutionary algorithm. GP constructs loss functions from a set of operators and leaf nodes, and these functions are repeatedly recombined and mutated to find an optimal structure. The search experiments were carried out on the small datasets CIFAR-10, CIFAR-100 and Fashion-MNIST using an Inception model. The 5 best functions found were then evaluated with different model architectures on a set of standard datasets that range from 2 to 102 classes and differ widely in size. One function, denoted Next Generation Loss (NGL), clearly stood out, showing the same or better performance than CE on all tested datasets. To evaluate NGL on a large-scale dataset, we tested it on ImageNet-1k, where it showed improved top-1 accuracy compared to models trained with identical settings and other losses. Finally, NGL was used to train models on a downstream segmentation task with the Pascal VOC 2012 and COCO-Stuff164k datasets, improving the performance of the underlying model.
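The tree-based search described in the abstract can be illustrated with a small sketch. This is a generic GP illustration under stated assumptions, not the paper's algorithm: the operator set, the protected division, the leaf set, and the names `evaluate`, `crossover` and `mutate` are all hypothetical. The example tree encodes the expression $\frac{0.5 \times x}{y} - 1.05\times\cos(x)$ used in the paper's Figure A.

```python
import math
import random

# Hypothetical operator set; the paper's actual operators are not listed here.
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    # Protected division (a common GP convention, assumed here).
    "div": lambda a, b: a / b if b != 0 else 1.0,
}
UNARY = {"cos": math.cos, "sin": math.sin}

# Trees are nested tuples (op, child, ...); leaves are numbers or variable names.
FIG_A = ("sub", ("div", ("mul", 0.5, "x"), "y"), ("mul", 1.05, ("cos", "x")))


def evaluate(node, env):
    """Recursively evaluate a tree given variable values in env."""
    if isinstance(node, (int, float)):
        return float(node)
    if isinstance(node, str):
        return env[node]
    op, *children = node
    values = [evaluate(child, env) for child in children]
    return BINARY[op](*values) if op in BINARY else UNARY[op](*values)


def paths(node, prefix=()):
    """Yield the index path of every subtree, root included."""
    yield prefix
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(node, path):
    for i in path:
        node = node[i]
    return node


def replace(node, path, new):
    """Return a copy of node with the subtree at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace(node[i], path[1:], new),) + node[i + 1:]


def crossover(a, b, rng):
    """Exchange one randomly chosen subtree between two parents."""
    pa, pb = rng.choice(list(paths(a))), rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))


def mutate(tree, rng, leaves=("x", "y", 1.0)):
    """Simplified mutation: overwrite a random subtree with a random leaf."""
    return replace(tree, rng.choice(list(paths(tree))), rng.choice(leaves))


if __name__ == "__main__":
    rng = random.Random(0)
    other = ("add", ("sin", "x"), "y")
    child_a, child_b = crossover(FIG_A, other, rng)
    print(child_a, child_b, mutate(FIG_A, rng), sep="\n")
```

In a full search, each candidate tree would be scored by training a model with it as the loss and measuring validation performance. That fitness step is omitted here because this summary does not specify it.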

Paper Structure (17 sections, 9 equations, 4 figures, 7 tables, 2 algorithms)

This paper contains 17 sections, 9 equations, 4 figures, 7 tables, 2 algorithms.

Introduction
Related Work
Method
Experiments
Loss function search
Evaluation
Small datasets for classification
ImageNet-1k
Segmentation
Discussion
Code
Conclusions
Method
Experiments
Loss function search
...and 2 more sections

Figures (4)

Figure 1: NGL and CE functions (left) and their gradients (right) for $y_{real}=1$.
Figure 2: Mean validation accuracy of ResNet101 model on each epoch during ImageNet-1k training.
Figure A: Left: an example of the solution representation for tree-based GP: $\frac{0.5 \times x}{y} -1.05\times\cos(x)$. Right: an example of the subtrees exchange during crossover.
Figure B.2: Functions found by GP and listed in Section 2.1
