MCEL: Margin-Based Cross-Entropy Loss for Error-Tolerant Quantized Neural Networks

Mikail Yayla, Akash Kumar

TL;DR

A novel loss function is proposed, the Margin Cross-Entropy Loss (MCEL), which explicitly promotes logit-level margin separation while preserving the favorable optimization properties of the standard cross-entropy loss, and introduces an interpretable margin parameter that allows robustness to be tuned in a principled manner.

Abstract

Robustness to bit errors is a key requirement for the reliable use of neural networks (NNs) on emerging approximate computing platforms and error-prone memory technologies. A common approach to achieve bit error tolerance in NNs is injecting bit flips during training according to a predefined error model. While effective in certain scenarios, training-time bit flip injection introduces substantial computational overhead, often degrades inference accuracy at high error rates, and scales poorly for larger NN architectures. These limitations make error injection an increasingly impractical solution for ensuring robustness on future approximate computing platforms and error-prone memory technologies. In this work, we investigate the mechanisms that enable NNs to tolerate bit errors without relying on error-aware training. We establish a direct connection between bit error tolerance and classification margins at the output layer. Building on this insight, we propose a novel loss function, the Margin Cross-Entropy Loss (MCEL), which explicitly promotes logit-level margin separation while preserving the favorable optimization properties of the standard cross-entropy loss. Furthermore, MCEL introduces an interpretable margin parameter that allows robustness to be tuned in a principled manner. Extensive experimental evaluations across multiple datasets of varying complexity, diverse NN architectures, and a range of quantization schemes demonstrate that MCEL substantially improves bit error tolerance, up to 15 % in accuracy for an error rate of 1 %. Our proposed MCEL method is simple to implement, efficient, and can be integrated as a drop-in replacement for standard CEL. It provides a scalable and principled alternative to training-time bit flip injection, offering new insights into the origins of NN robustness and enabling more efficient deployment on approximate computing and memory systems.
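The abstract describes MCEL only at a high level, so the following is a minimal sketch of one plausible margin-based cross-entropy, not the paper's exact formula. It assumes the scaled tanh described in the Figure 2 caption takes the form $L \tanh(z / L)$ and that the margin $m$ is subtracted from the target class score before the softmax. The function name and the parameters `margin` and `limit` are hypothetical.

```python
import numpy as np


def margin_cross_entropy(logits, targets, margin=1.0, limit=4.0):
    """Hypothetical margin-based cross-entropy (an assumption, not the paper's formula).

    logits:  float array of shape (batch, classes)
    targets: int array of shape (batch,) with the true class indices
    margin:  assumed margin m enforced between target and competing scores
    limit:   assumed saturation limit L, so scores lie in [-L, L]
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    rows = np.arange(logits.shape[0])

    # Bound every class score to [-limit, limit] with a scaled tanh (assumed form).
    scores = limit * np.tanh(logits / limit)

    # Subtract the margin from the target score only, so the loss becomes small
    # only once the target leads the competing classes by roughly `margin`.
    shifted = scores.copy()
    shifted[rows, targets] -= margin

    # Numerically stable log-softmax, then the negative log-likelihood.
    shifted -= shifted.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, targets].mean())
```

With `margin=0` and a very large `limit`, this sketch reduces to the standard cross-entropy loss, which matches the drop-in-replacement framing of the abstract. With `limit=4.0` and `margin=2.0`, the margin corresponds to $\frac{m}{2L} = 0.25$ of the available dynamic range.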

Paper Structure

This paper contains 22 sections, 26 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Illustration of output-layer logits with a margin $m$. Five classes are shown as an example. Class 0 is the prediction and the extent of the margin to the second highest logit determines the error tolerance of the NN.
  • Figure 2: Intuition behind the tanh-clipped Margin Cross-Entropy Loss (MCEL). Raw logits are first passed through a scaled tanh function, which bounds class scores to a finite interval $[-L, L]$ and prevents unbounded growth of logit magnitudes. Dotted lines: saturation limits imposed by tanh. Solid curve: tanh function. Horizontal violet line: competing (non-target) class score. The enforced margin $m$ requires the target class score to exceed competing class scores by a fixed fraction of the available dynamic range, i.e. $\frac{m}{2L}$.
  • Figure 3: Accuracy over bit error rate for 2-, 4-, and 8-bit QNNs, as well as BNNs. The x-axes show the bit flip injection rate. CEL is the state of the art (SOTA); MCEL is our proposed method. No bit flips are injected during training; errors are injected only during inference to assess error tolerance.
  • Figure 4: Evolution of the margin between the highest and the second-highest logit returned by the NN (the Mean Logit Margin, MLM) during training. Left column: CEL. Right column: MCEL. QNNs with 4-bit quantization and BNNs are shown as examples. Y-axis: the top-2 margin averaged over all training samples of one epoch (see Equation \ref{eq:top2} for a single margin and Figure 1 for an illustration).
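The two quantities plotted in Figures 3 and 4 can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' evaluation code: it assumes quantized weights stored as two's-complement `int8` values, and an error model in which every stored bit flips independently with the same probability. The function names `flip_bits` and `mean_logit_margin` are hypothetical.

```python
import numpy as np


def flip_bits(weights_q, error_rate, bits=8, rng=None):
    """Flip each of the `bits` low-order bits of every weight independently.

    Assumed error model: independent, uniform bit flips with probability
    `error_rate`, applied at inference time only.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Reinterpret the int8 storage as raw bytes so XOR acts on the bit pattern.
    raw = np.ascontiguousarray(weights_q, dtype=np.int8).view(np.uint8)
    # One Bernoulli draw per stored bit: shape (..., bits).
    flips = rng.random(raw.shape + (bits,)) < error_rate
    # Pack the per-bit decisions into a single XOR mask per weight.
    place_values = (1 << np.arange(bits)).astype(np.uint8)
    mask = (flips * place_values).sum(axis=-1).astype(np.uint8)
    return (raw ^ mask).view(np.int8)


def mean_logit_margin(logits):
    """Average gap between the highest and second-highest logit per sample."""
    logits = np.asarray(logits, dtype=np.float64)
    # Sort each row in ascending order and keep the two largest entries.
    top2 = np.sort(logits, axis=1)[:, -2:]
    return float((top2[:, 1] - top2[:, 0]).mean())
```

An accuracy-versus-error-rate curve in the style of Figure 3 would then come from calling `flip_bits` on the weights of a trained network for each error rate and re-evaluating test accuracy. Calling `mean_logit_margin` on the training logits of each epoch gives a curve in the style of Figure 4.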