Margin and Consistency Supervision for Calibrated and Robust Vision Models

Salim Khazem

TL;DR

MaCS is a simple, architecture-agnostic regularization framework that jointly enforces logit-space separation and local prediction stability. It consistently improves calibration and robustness to common corruptions while preserving or improving top-1 accuracy.

Abstract

Deep vision classifiers often achieve high accuracy while remaining poorly calibrated and fragile under small distribution shifts. We present Margin and Consistency Supervision (MaCS), a simple, architecture-agnostic regularization framework that jointly enforces logit-space separation and local prediction stability. MaCS augments cross-entropy with (i) a hinge-squared margin penalty that enforces a target logit gap between the correct class and the strongest competitor, and (ii) a consistency regularizer that minimizes the KL divergence between predictions on clean inputs and mildly perturbed views. We provide a unifying theoretical analysis showing that increasing the classification margin while reducing local sensitivity, formalized via a Lipschitz-type stability proxy, yields improved generalization guarantees and a provable robustness radius bound that scales with the margin-to-sensitivity ratio. Across several image classification benchmarks and backbones spanning CNNs and Vision Transformers, MaCS consistently improves calibration (lower ECE and NLL) and robustness to common corruptions while preserving or improving top-1 accuracy. Our approach requires no additional data, no architectural changes, and negligible inference overhead, making it an effective drop-in replacement for standard training objectives.
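
To make the objective described in the abstract concrete, here is a minimal per-example sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name `macs_loss`, the parameter names `target_margin`, `lambda_margin`, and `lambda_cons`, their default values, and the KL direction (clean predictions as the reference distribution) are all hypothetical choices made for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def macs_loss(clean_logits, perturbed_logits, label,
              target_margin=1.0, lambda_margin=0.5, lambda_cons=1.0):
    """Cross-entropy + hinge-squared margin penalty + KL consistency term.

    All weights and the target margin are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    p_clean = softmax(clean_logits)
    p_pert = softmax(perturbed_logits)

    # Cross-entropy on the clean view.
    ce = -math.log(p_clean[label])

    # Logit gap between the correct class and its strongest competitor.
    strongest_other = max(z for k, z in enumerate(clean_logits) if k != label)
    gap = clean_logits[label] - strongest_other

    # Hinge-squared penalty: zero once the gap reaches the target margin.
    margin_penalty = max(0.0, target_margin - gap) ** 2

    # KL(p_clean || p_pert); the direction is an assumption of this sketch.
    # q is clamped away from zero to avoid division by zero on underflow.
    kl = sum(p * math.log(p / max(q, 1e-12))
             for p, q in zip(p_clean, p_pert) if p > 0.0)

    total = ce + lambda_margin * margin_penalty + lambda_cons * kl
    return {"ce": ce, "margin": margin_penalty, "consistency": kl, "total": total}

# Usage: a confident prediction whose perturbed view shifts the logits slightly.
example = macs_loss([3.0, 0.5, -1.0], [2.6, 0.9, -0.8], label=0)
```

In a real training loop the same three terms would be computed batch-wise in an autodiff framework, with the perturbed logits coming from a second forward pass on the mildly perturbed view.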

Paper Structure (22 sections, 3 theorems, 16 equations, 6 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Method
Problem Setup
Margin Loss
Consistency Loss
Combined MaCS Objective
Theoretical Motivation
Margin and Generalization
Consistency as a Sensitivity Proxy
Robustness via Margin-to-Sensitivity Ratio
Connecting MaCS to the Robustness Radius
Experiments
Experimental Setup
Main Results
...and 7 more sections

Key Result

Theorem 4.2

Let $f: \mathbb{R}^d \to \mathbb{R}^K$ be a neural network with spectral complexity $R_f$, and let $\mathcal{D}$ be a distribution over $\mathbb{R}^d \times [K]$. For any margin $\gamma > 0$, with probability at least $1 - \delta$ over an i.i.d. training set $S$ of size $n$ drawn from $\mathcal{D}$: where $\hat{L}_\gamma(S)$ is the fraction of training samples with margin less than $\gamma$, $B$ i

Figures (6)

Figure 1: Per-dataset model curves across methods. Each line corresponds to a model and traces accuracy across training objectives, highlighting method-specific gains.
Figure 2: Overview of MaCS training. The model processes both clean input $x$ and perturbed input $\tilde{x} = T(x)$. The total loss combines cross-entropy, a margin penalty encouraging $\gamma(x) \geq \Delta$, and a KL-based consistency term enforcing prediction stability.
Figure 3: Accuracy improvement of MaCS over baseline (cross-entropy) across all dataset-model configurations. Each line connects the baseline (left) to MaCS (right) accuracy. MaCS improves over baseline in the large majority of settings, with the largest gains on CIFAR and Food-101.
Figure 4: Negative log-likelihood comparison for ResNet-50 on CIFAR-10/100. MaCS improves NLL relative to baseline but does not always outperform the strongest calibration baselines.
Figure 5: Corruption robustness on CIFAR-10-C and CIFAR-100-C (ResNet-50). MaCS consistently outperforms all baselines including Mixup.
...and 1 more figures

Theorems & Definitions (9)

Definition 4.1: Spectral Complexity
Theorem 4.2: Margin-Based Generalization (Bartlett et al., 2017)
Definition 4.3: Local Sensitivity
Remark 4.4: Consistency Controls Sensitivity
Theorem 4.5: Margin-Stability Robustness Radius
proof
Corollary 4.6: Radius Under Lipschitz Logits
proof
Remark 4.7: On the Theory-Practice Gap
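
The margin-to-sensitivity scaling named in Theorem 4.5 and Corollary 4.6 can be illustrated with a standard argument. The sketch below assumes that every logit $f_k$ is $L$-Lipschitz in some norm and that the clean margin $\gamma(x)$ is positive; the symbols $L$, $\eta$, and $r(x)$ are introduced here for illustration only, and the constants in the paper's own statements may differ.

```latex
% Assumption (for illustration): each logit f_k is L-Lipschitz w.r.t. \|\cdot\|.
% Margin at x with true label y:
\gamma(x) = f_y(x) - \max_{k \neq y} f_k(x) > 0.

% For any perturbation \eta and any competing class k \neq y:
f_y(x+\eta) - f_k(x+\eta)
  \;\geq\; \bigl(f_y(x) - L\|\eta\|\bigr) - \bigl(f_k(x) + L\|\eta\|\bigr)
  \;\geq\; \gamma(x) - 2L\|\eta\|.

% Hence the predicted class is unchanged whenever
\|\eta\| \;<\; r(x) := \frac{\gamma(x)}{2L}.
```

Read this way, a larger margin or a smaller sensitivity constant both enlarge the radius, which matches the qualitative claim in the abstract that the bound scales with the margin-to-sensitivity ratio.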
