Rethinking the Harmonic Loss via Non-Euclidean Distance Layers

Maxwell Miller-Golub; Collin Coil; Kamil Faber; Marcin Pietron; Panpan Zheng; Pasquale Minervini; Roberto Corizzo

TL;DR

This work comprehensively evaluates distance-tailored harmonic losses on both vision backbones and large language models, and concludes that cosine distances provide the most favorable trade-off, consistently improving accuracy while lowering carbon emissions.

Abstract

Cross-entropy loss has long been the standard choice for training deep neural networks, yet it suffers from interpretability limitations, unbounded weight growth, and inefficiencies that can contribute to costly training dynamics. The harmonic loss is a distance-based alternative grounded in Euclidean geometry that improves interpretability and mitigates phenomena such as grokking (delayed generalization on the test set). However, the study of harmonic loss remains narrow: only the Euclidean distance has been explored, and no systematic evaluation of computational efficiency or sustainability has been conducted. We extend harmonic loss by systematically investigating a broad spectrum of distance metrics as replacements for the Euclidean distance. We comprehensively evaluate distance-tailored harmonic losses on both vision backbones and large language models. Our analysis is framed around a three-way evaluation of model performance, interpretability, and sustainability. On vision tasks, cosine distances provide the most favorable trade-off, consistently improving accuracy while lowering carbon emissions, whereas Bray-Curtis and Mahalanobis further enhance interpretability at varying efficiency costs. On language models, cosine-based harmonic losses improve gradient and learning stability, strengthen representation structure, and reduce emissions relative to cross-entropy and Euclidean heads. Our code is available at: https://anonymous.4open.science/r/rethinking-harmonic-loss-5BAB/.
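
To make the idea of a distance-tailored harmonic loss concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the class probability is proportional to an inverse power of the distance between an input representation and a class prototype, $p_k \propto d(x, w_k)^{-\omega}$, which matches the link $\kappa(r)=r^{-\omega}$ named in the paper's Theorem 1. The function names, the `eps` smoothing constants, and the default `omega=2.0` are illustrative assumptions.

```python
import numpy as np

def euclidean_distance(x, w):
    # x: (batch, dim), w: (classes, dim) -> distances of shape (batch, classes)
    diff = x[:, None, :] - w[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def cosine_distance(x, w, eps=1e-12):
    # 1 - cosine similarity; eps (an assumption) guards against zero-norm vectors
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    wn = w / (np.linalg.norm(w, axis=1, keepdims=True) + eps)
    return 1.0 - xn @ wn.T

def harmonic_probs(dist, omega=2.0, eps=1e-12):
    # Assumed harmonic link: p_k proportional to dist_k ** (-omega),
    # evaluated in log-space so very small distances do not overflow.
    log_scores = -omega * np.log(dist + eps)
    log_scores = log_scores - log_scores.max(axis=1, keepdims=True)
    scores = np.exp(log_scores)
    return scores / scores.sum(axis=1, keepdims=True)

def harmonic_loss(x, w, labels, distance=euclidean_distance, omega=2.0):
    # Mean negative log-probability of the true class under the harmonic link
    probs = harmonic_probs(distance(x, w), omega)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked + 1e-12).mean()

# Toy example: one input, two class prototypes (rows of w)
x = np.array([[1.0, 0.0]])
w = np.array([[1.0, 0.1], [0.0, 1.0]])
labels = np.array([0])
loss_euclidean = harmonic_loss(x, w, labels, distance=euclidean_distance)
loss_cosine = harmonic_loss(x, w, labels, distance=cosine_distance)
```

Swapping the `distance` argument is the only change needed to move from the Euclidean head to a non-Euclidean one in this sketch; the other metrics mentioned in the abstract (Bray-Curtis, Mahalanobis) would slot in the same way under the same assumption.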

Paper Structure (56 sections, 2 theorems, 26 equations, 16 figures, 25 tables)

Introduction
Harmonic loss
Non-Euclidean Harmonic Losses
Class Prototypes, Distances, and non-Euclidean Harmonic Loss functions
Experiments
Training and Evaluation
Vision: Radar Plots
Language: Radar Plots
Related Work
Conclusion
Theoretical Properties of Distance-based Probabilistic Layers
Scale invariance and finite minimizers
Margin-style generalization (PAC-Bayes view)
Integration into Deep Learning Pipelines
Model Architectures
...and 41 more sections

Key Result

theorem 1

Assume $d$ is $1$-homogeneous and the training set is metric-separable. For $\kappa(r)=r^{-\omega}$, the empirical loss $\mathcal{L}$ is invariant to the joint rescaling $(x,w)\mapsto(c x, c w)$ and attains a global minimum at finite $\{w_k\}$. In particular, increasing $\|w_k\|$ further does not red
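
The scale-invariance part of this statement can be checked numerically. The sketch below is an illustration under stated assumptions rather than the paper's code: it takes $d$ to be the Euclidean distance (which is $1$-homogeneous), assumes class probabilities of the form $p_k = \kappa(d_k) / \sum_j \kappa(d_j)$ with $\kappa(r)=r^{-\omega}$, and uses random data, a hypothetical `harmonic_loss` helper, and an arbitrary scale factor $c = 10$.

```python
import numpy as np

def harmonic_loss(x, w, labels, omega=2.0):
    # Euclidean distance is 1-homogeneous: d(c*x, c*w) = c * d(x, w) for c > 0
    d = np.linalg.norm(x[:, None, :] - w[None, :, :], axis=-1)
    # log kappa(d) with kappa(r) = r ** (-omega)
    log_k = -omega * np.log(d)
    # Normalize in log-space: log p_k = log kappa_k - logsumexp_j(log kappa_j)
    shift = log_k.max(axis=1, keepdims=True)
    log_norm = shift + np.log(np.exp(log_k - shift).sum(axis=1, keepdims=True))
    log_p = log_k - log_norm
    return -log_p[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))          # 8 inputs in 4 dimensions
w = rng.normal(size=(3, 4))          # 3 class prototypes
labels = rng.integers(0, 3, size=8)  # random class labels

c = 10.0
base = harmonic_loss(x, w, labels)
scaled = harmonic_loss(c * x, c * w, labels)
# Joint rescaling multiplies every distance by c, so the factor c ** (-omega)
# cancels between numerator and denominator and the loss is unchanged.
difference = abs(base - scaled)
```

Under these assumptions `difference` is zero up to floating-point rounding, since the common factor $c^{-\omega}$ cancels in the normalized probabilities.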

Figures (16)

Figure 1: Vision: Radar plots: 1) Model Performance (F1, Accuracy); 2) Interpretability (PC2 EV, PCA 90%), and 3) Sustainability (Duration/Epoch/GFLOPs, Emissions). Plots feature Baseline (Cross-Entropy), Euclidean harmonic, and the four top-performing non-Euclidean harmonic losses.
Figure 2: Language: Radar plots: 1) Model Performance (Perplexity, Effective Rank, Gradient Stability); 2) Interpretability (PCA5 EV), and 3) Sustainability (Emissions). Plots feature Baseline (CE), Euclidean harmonic, and the top-performing non-Euclidean harmonic losses.
Figure 3: Vision: Accuracy curves with Confidence Intervals. Shaded regions show 95% confidence intervals (n = 3 seeds).
Figure 4: Vision: Radar plots: 1) Model Performance (F1, Accuracy); 2) Interpretability (PC2 EV, PCA 90%), and 3) Sustainability (Duration/Epoch/GFLOPs, Emissions). Plots feature Baseline (Cross-Entropy), Euclidean harmonic, and the four top-performing non-Euclidean harmonic losses.
Figure 5: Vision: Radar plots -- MNIST, CIFAR10, CIFAR100: 1) Model Performance (F1, Accuracy); 2) Interpretability (PC2 EV, PCA 90%), and 3) Sustainability (Duration/Epoch, Emissions). Plots feature Baseline (Cross-Entropy), Euclidean harmonic, and the four top-performing losses.
...and 11 more figures

Theorems & Definitions (4)

definition 1: Metric separability and homogeneity
theorem 1: Finite minimizer and scale invariance for harmonic link
definition 2: Distance margin
theorem 2: Generalization with metric margin
