Hard Labels In! Rethinking the Role of Hard Labels in Mitigating Local Semantic Drift

Jiacheng Cui, Bingkui Tong, Xinyue Bi, Xiaohan Zhao, Jiacheng Liu, Zhiqiang Shen

TL;DR

This work investigates local-view semantic drift arising when only a small number of image crops are labeled with soft targets in dataset distillation. It introduces HALD, a Soft–Hard–Soft calibration framework that uses hard labels as content-agnostic anchors to recalibrate the semantic space, mitigating drift while preserving soft-label advantages. The authors provide theoretical guarantees on drift mitigation and gradient alignment, and validate HALD through extensive experiments on Tiny-ImageNet and ImageNet-1K, achieving state-of-the-art results under constrained soft-label storage. The findings demonstrate that hard labels can complement soft supervision to improve generalization and reduce storage costs in large-scale distillation pipelines. Overall, HALD rethinks the role of hard labels, offering a practical mechanism to enhance performance with limited soft-label coverage.

Abstract

Soft labels generated by teacher models have become a dominant paradigm for knowledge transfer and for recent large-scale dataset distillation methods such as SRe2L, RDED, and LPLD, offering richer supervision than conventional hard labels. However, we observe that when only a limited number of crops per image are used, soft labels are prone to local semantic drift: a crop may visually resemble another class, causing its soft embedding to deviate from the ground-truth semantics of the original image. This mismatch between local visual content and global semantic meaning introduces systematic errors and distribution misalignment between training and testing. In this work, we revisit the overlooked role of hard labels and show that, when appropriately integrated, they provide a powerful content-agnostic anchor to calibrate semantic drift. We theoretically characterize the emergence of drift under limited soft-label supervision and demonstrate that hybridizing soft and hard labels restores alignment between visual content and semantic supervision. Building on this insight, we propose a new training paradigm, Hard Label for Alleviating Local Semantic Drift (HALD), which leverages hard labels as intermediate corrective signals while retaining the fine-grained advantages of soft labels. Extensive experiments on dataset distillation and large-scale conventional classification benchmarks validate our approach, showing consistent improvements in generalization. On ImageNet-1K, we achieve 42.7% with only 285M of storage for soft labels, outperforming the prior state-of-the-art LPLD by 9.0%. Our findings re-establish the importance of hard labels as a complementary tool and call for a rethinking of their role in soft-label-dominated training.
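
The idea of using hard labels as an intermediate corrective signal between soft-label phases can be illustrated with a small sketch. This is not the authors' implementation: the function names `phase_for_epoch` and `shs_loss`, the phase boundaries `hard_start` and `hard_end`, and the use of plain cross-entropy for both label types are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def phase_for_epoch(epoch, total_epochs, hard_start=0.4, hard_end=0.6):
    """Return "soft" or "hard" for a soft -> hard -> soft schedule.

    The boundaries 0.4 and 0.6 are hypothetical placeholders,
    not values taken from the paper.
    """
    frac = epoch / total_epochs
    return "hard" if hard_start <= frac < hard_end else "soft"


def shs_loss(logits, hard_labels, soft_labels, phase):
    """Cross-entropy against one-hot targets (hard) or teacher targets (soft)."""
    logp = log_softmax(logits)
    if phase == "hard":
        # Hard labels ignore crop content: the target is the image's class.
        targets = np.eye(logits.shape[-1])[hard_labels]
    else:
        # Soft labels come from a teacher evaluated on the crop itself.
        targets = soft_labels
    return float(-(targets * logp).sum(axis=-1).mean())


schedule = [phase_for_epoch(e, 10) for e in range(10)]
```

With these placeholder boundaries, a 10-epoch run trains on soft labels for epochs 0 to 3, switches to hard labels for epochs 4 and 5, and returns to soft labels for the remainder.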

Paper Structure

This paper contains 45 sections, 6 theorems, 76 equations, 3 figures, 23 tables.

Key Result

Lemma 1

For $s$ i.i.d. crops, define $\hat{p}_s := \frac{1}{s}\sum_{i=1}^{s} \tilde{p}(x^{(\mathrm{crop})}_i)$. Then [...]. In particular, under LVSD, the deviation is strictly positive for any finite $s$ and decays as $\mathcal{O}(1/s)$.
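
A quick simulation can make the $\mathcal{O}(1/s)$ behaviour concrete. The lemma statement above does not show which deviation measure is bounded, so this sketch assumes the expected squared $\ell_2$ distance between $\hat{p}_s$ and the population mean of the crop-level soft labels. The Dirichlet model for $\tilde{p}$, its parameters, and the function name `mean_sq_deviation` are hypothetical choices, not taken from the paper.

```python
import numpy as np


def mean_sq_deviation(s, num_classes=10, trials=4000, seed=0):
    """Monte Carlo estimate of E||p_hat_s - E[p_tilde]||^2 for s crops."""
    rng = np.random.default_rng(seed)
    # Hypothetical crop-level soft-label model: Dirichlet draws that
    # favour class 0 but can put substantial mass on other classes.
    alpha = np.ones(num_classes)
    alpha[0] = 5.0
    population_mean = alpha / alpha.sum()
    # Shape (trials, s, num_classes): s i.i.d. crop soft labels per trial.
    crops = rng.dirichlet(alpha, size=(trials, s))
    p_hat = crops.mean(axis=1)  # averaged soft label over the s crops
    sq_dist = np.sum((p_hat - population_mean) ** 2, axis=1)
    return float(sq_dist.mean())


deviations = {s: mean_sq_deviation(s) for s in (1, 4, 16)}
```

Under this assumed measure, the deviation is the variance of a sample mean, so quadrupling $s$ should cut it by roughly a factor of four while never reaching zero for finite $s$.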

Figures (3)

  • Figure 1: Illustration of local-view semantic drift: partial crops may change object–label relations, yielding semantics that deviate from the full image.
  • Figure 2: Train and test loss landscapes on an IPC=10 distilled dataset with SLC=50, comparing (i) finite soft-label coverage and (ii) our method.
  • Figure 3: Gradient similarity between hard- and soft-label losses over training, evaluated on real-image crops and optimization-based distilled data, showing a clear upward trend indicative of strengthened alignment.
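
The quantity tracked in Figure 3 can be sketched at the logit level. The page does not specify how the similarity is computed, so this example assumes cross-entropy for both label types, gradients taken with respect to the logits of a single sample, and cosine similarity as the measure; the name `grad_cosine` is hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def grad_cosine(logits, hard_label, soft_label):
    """Cosine similarity between hard- and soft-label loss gradients.

    For cross-entropy with target t, the gradient with respect to the
    logits is softmax(logits) - t, for one-hot and soft targets alike.
    """
    probs = softmax(logits)
    one_hot = np.zeros_like(probs)
    one_hot[hard_label] = 1.0
    g_hard = probs - one_hot
    g_soft = probs - soft_label
    denom = np.linalg.norm(g_hard) * np.linalg.norm(g_soft)
    return float(g_hard @ g_soft / denom) if denom > 0 else 0.0


logits = np.zeros(3)
aligned = grad_cosine(logits, 0, np.array([1.0, 0.0, 0.0]))  # soft agrees with hard
drifted = grad_cosine(logits, 0, np.array([0.0, 1.0, 0.0]))  # soft points to class 1
```

When the soft label agrees with the hard label the two gradients coincide, and when a crop's soft label drifts to another class they point in partly opposing directions, which is the kind of misalignment the similarity curve is meant to expose.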

Theorems & Definitions (14)

  • Definition 1: Local-View Semantic Drift (LVSD)
  • Lemma 1
  • Definition 2: Soft Label per Image (SLI)
  • Definition 3: Soft Label per Class (SLC)
  • Theorem 1: Proof in Appendix \ref{proof_lower_bound_limited_crop}
  • Theorem 2: Proof in Appendix \ref{proof_diminished_gen}
  • Theorem 3: Soft–Hard Gradient Consistency; proof in Appendix \ref{proof:mixing_dual}
  • Corollary 1: Proof in Appendix \ref{proof:shs_improve_gen}
  • Proof: Proof of Theorem \ref{thrm_lower_bound_limited_crop}
  • Proof: Proof of Theorem \ref{thm:finite-s-lb-fixed}
  • ...and 4 more