Evolving Loss Functions for Specific Image Augmentation Techniques
Brandon Morgan, Dean Hougen
TL;DR
The paper investigates how the performance of a loss function varies across image augmentation techniques and shows that augmentation-aware Neural Loss Function Search (NLFS) can yield improvements over cross-entropy (CE). Using regularized evolution under five augmentation techniques, the authors discover augmentation-specific losses, transfer them to larger models, and validate them on CIFAR and on fine-tuning datasets. The standout result is the inverse Bessel logarithm loss (A2), which outperforms CE across the majority of experiments and datasets, highlighting the value of aligning loss design with data augmentation. Overall, the work presents augmentation-aware loss discovery as a viable path to improved generalization and provides a public codebase for reproducible NLFS under diverse augmentation regimes.
Abstract
Previous work in Neural Loss Function Search (NLFS) has shown a lack of correlation between smaller surrogate functions and large convolutional neural networks with massive regularization. We expand upon this research by revealing another disparity: a lack of correlation between different types of image augmentation techniques. We show that a loss function can perform well under certain image augmentation techniques while performing poorly under others. We exploit this disparity by performing an evolutionary search on each of five types of image augmentation techniques, with the aim of finding augmentation-specific loss functions. The best loss functions from each evolution were transferred to WideResNet-28-10 on CIFAR-10 and CIFAR-100 across each of the five image augmentation techniques. The best of those were then evaluated by fine-tuning EfficientNetV2Small on the CARS, Oxford-Flowers, and Caltech datasets, again across each of the five image augmentation techniques. Multiple loss functions were found that outperformed cross-entropy across multiple experiments. In the end, we found a single loss function, which we call the inverse Bessel logarithm loss, that outperformed cross-entropy across the majority of experiments.
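The evolutionary search mentioned above can be illustrated with a minimal sketch of regularized (aging) evolution, the search strategy named in the TL;DR. This is not the authors' implementation: the candidate encoding (a tuple of integers), `toy_fitness`, and `toy_mutate` are hypothetical stand-ins. In an NLFS setting, a candidate would presumably encode a loss function, and its fitness would be the validation accuracy of a small model trained with that loss under one augmentation technique.

```python
import collections
import random


def regularized_evolution(init_population, fitness, mutate, cycles, sample_size, rng):
    """Aging evolution: the oldest individual is removed each cycle, not the worst."""
    # Each queue entry is (individual, fitness); the left end holds the oldest entry.
    population = collections.deque((ind, fitness(ind)) for ind in init_population)
    best = max(population, key=lambda entry: entry[1])
    for _ in range(cycles):
        # Tournament selection: sample a few entries and take the fittest as parent.
        tournament = rng.sample(list(population), sample_size)
        parent = max(tournament, key=lambda entry: entry[1])[0]
        child = mutate(parent, rng)
        entry = (child, fitness(child))
        population.append(entry)   # the child joins as the youngest member
        population.popleft()       # the oldest member is discarded
        if entry[1] > best[1]:
            best = entry
    return best, population


# Hypothetical toy problem standing in for "train with a candidate loss, measure accuracy".
def toy_fitness(ind):
    return -sum((x - 3) ** 2 for x in ind)


def toy_mutate(ind, rng):
    # Change one randomly chosen position by +1 or -1.
    i = rng.randrange(len(ind))
    child = list(ind)
    child[i] += rng.choice((-1, 1))
    return tuple(child)


rng = random.Random(0)
init = [tuple(rng.randint(-5, 5) for _ in range(3)) for _ in range(10)]
best, final_population = regularized_evolution(
    init, toy_fitness, toy_mutate, cycles=200, sample_size=3, rng=rng
)
print(best)
```

Under this scheme, one such search would be run per augmentation technique, and the best candidate from each run would then be carried forward to the larger transfer experiments.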
