Learning from Concealed Labels

Zhongnian Li, Meng Wei, Peng Ying, Tongfeng Sun, Xinzheng Xu

TL;DR

This paper introduces a novel privacy-preserving setting called Concealed Labels for multi-class classification, where sensitive labels are not disclosed during annotation. It derives an unbiased risk estimator that expresses the ordinary risk in terms of concealed-label data, proves consistency and a parametric-rate estimation bound, and strengthens training with a corrected risk variant to mitigate negative risk. The approach is validated on standard benchmarks and real-world concealed-label datasets, showing consistent improvements over existing privacy-label learning and PU/UL baselines, with practical guidance on loss functions and the impact of the labeled-set size. The work lays a foundation for privacy-safe labeling in sensitive domains and suggests directions for extending to multi-label tasks and deeper performance enhancements.

Abstract

Annotating data with sensitive labels (e.g., disease, smoking) poses a potential threat to individual privacy in many real-world scenarios. To cope with this problem, we propose a novel setting that protects the privacy of each instance, namely learning from concealed labels for multi-class classification. Concealed labels prevent sensitive labels from appearing in the label set during the label collection stage: the concealed label set used to annotate sensitive data consists of the none label and some randomly sampled insensitive labels. We show that an unbiased estimator can be established from concealed data under mild assumptions, and that the learned multi-class classifier can not only classify instances from insensitive labels accurately but also recognize instances from the sensitive labels. Moreover, we bound the estimation error and show that the multi-class classifier achieves the optimal parametric convergence rate. Experiments demonstrate the significance and effectiveness of the proposed method for concealed labels on synthetic and real-world datasets.
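
The annotation step described in the abstract can be illustrated with a minimal sketch. It is one reading of the abstract, not the authors' implementation: it assumes each instance is shown `k` insensitive labels sampled uniformly at random plus a "none" option, and that the annotator picks the true label if it appears among the candidates and "none" otherwise. The function name `concealed_label`, the parameter `k`, and the example label names are hypothetical.

```python
import random

NONE = "none"  # assumed special option shown alongside the sampled labels


def concealed_label(true_label, all_labels, sensitive, k, rng):
    """Simulate concealed-label annotation for one instance (illustrative only).

    Assumption: the candidate set holds k insensitive labels drawn uniformly
    at random, so a sensitive label can never be shown or selected.
    """
    insensitive = [y for y in all_labels if y not in sensitive]
    candidates = rng.sample(insensitive, k)
    # The annotator confirms the true label only if it is on display;
    # otherwise the instance receives the "none" label.
    annotation = true_label if true_label in candidates else NONE
    return candidates, annotation


if __name__ == "__main__":
    rng = random.Random(0)
    labels = ["smoking", "running", "reading", "cooking", "cycling"]
    candidates, annotation = concealed_label(
        "smoking", labels, sensitive={"smoking"}, k=2, rng=rng
    )
    # A sensitive instance is always annotated "none" under this sketch.
    print(candidates, annotation)
```

Under these assumptions, a "none" annotation is ambiguous: it covers both sensitive instances and insensitive instances whose true label was not sampled. This ambiguity is what allows the sensitive label to stay undisclosed while still leaving signal for the classifier.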

Paper Structure

This paper contains 21 sections, 44 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Illustration of an example of concealed labels during the annotation procedure in a real-world scenario. Smoking, being a sensitive label, is often difficult to collect data on, because people hesitate to admit their smoking habits in daily life. To protect privacy, it is crucial not to include the sensitive label. Concealed labels are employed to prevent the inclusion of the sensitive label that needs to be concealed. By using the none label, data privacy can be safeguarded, ensuring that the sensitive label remains undisclosed to an adversary.
  • Figure 2: Illustration of the negative risk of base models in experiments on two datasets.