DiffInf: Influence-Guided Diffusion for Supervision Alignment in Facial Attribute Learning

Basudha Pal, Rama Chellappa

TL;DR

DiffInf is a self-influence-guided diffusion framework for mitigating annotation inconsistencies in facial attribute learning. The authors show that repairing influential annotation inconsistencies at the image level improves downstream facial attribute classification without sacrificing distributional coverage.

Abstract

Facial attribute classification relies on large-scale annotated datasets in which many traits, such as age and expression, are inherently ambiguous and continuous but are discretized into categorical labels. Annotation inconsistencies arise from subjectivity and from visual confounders such as pose, illumination, expression, and demographic variation, creating mismatches between images and their assigned labels. These inconsistencies introduce supervision errors that impair representation learning and degrade downstream prediction. We introduce DiffInf, a self-influence-guided diffusion framework for mitigating annotation inconsistencies in facial attribute learning. We first train a baseline classifier and compute sample-wise self-influence scores using a practical first-order approximation to identify training instances that disproportionately destabilize optimization. Instead of discarding these influential samples, we apply targeted generative correction via a latent diffusion autoencoder to better align visual content with assigned labels while preserving identity and realism. To enable differentiable guidance during correction, we train a lightweight predictor of high-influence membership and use it as a surrogate influence regularizer. The edited samples replace the originals, yielding an influence-refined dataset of unchanged size. Across multi-class facial attribute classification tasks, DiffInf consistently improves generalization compared with standard noisy-label training, robust optimization baselines, and influence-based filtering. Our results demonstrate that repairing influential annotation inconsistencies at the image level improves downstream facial attribute classification without sacrificing distributional coverage.
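
The self-influence scoring step can be illustrated with a minimal pure-Python sketch. It assumes a TracIn-style first-order estimate, in which a sample's self-influence is the learning-rate-weighted sum of its squared loss-gradient norm over saved checkpoints. The abstract does not state the exact approximation, so this form is an assumption.

To keep the example small, the classifier is a hypothetical linear softmax head. For that head, the cross-entropy gradient (p - e_y) x^T has squared norm ||p - e_y||^2 * ||x||^2. The helper names and the `top_frac` cutoff are illustrative and do not come from the paper. The cutoff turns scores into binary high-influence labels, the kind of targets a membership predictor could be trained on.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def grad_sq_norm(W, x, y):
    """Squared norm of d CE / d W for a linear softmax head (no bias).

    Assumption for illustration: logits z = W x, so the gradient is
    (p - e_y) x^T and its squared Frobenius norm factorizes as
    ||p - e_y||^2 * ||x||^2.
    """
    logits = [sum(w_j * x_j for w_j, x_j in zip(row, x)) for row in W]
    p = softmax(logits)
    residual = sum((p_c - (1.0 if c == y else 0.0)) ** 2 for c, p_c in enumerate(p))
    return residual * sum(v * v for v in x)


def self_influence(checkpoints, lrs, x, y):
    """TracIn-style self-influence: sum_t eta_t * ||grad loss(z; theta_t)||^2."""
    return sum(lr * grad_sq_norm(W, x, y) for W, lr in zip(checkpoints, lrs))


def flag_high_influence(scores, top_frac=0.1):
    """Hypothetical labeling rule: 1 for the top `top_frac` fraction of scores."""
    k = max(1, int(round(top_frac * len(scores))))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    return [1 if i in top else 0 for i in range(len(scores))]


# Hypothetical toy usage: two checkpoints of a 3-class, 2-feature linear head.
ckpts = [
    [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]],
    [[3.0, 0.0], [0.0, 3.0], [-1.5, -1.5]],
]
lrs = [0.1, 0.05]
x = [1.0, 0.0]
clean = self_influence(ckpts, lrs, x, y=0)  # label agrees with the model
noisy = self_influence(ckpts, lrs, x, y=2)  # label disagrees with the model
# noisy > clean: the mislabeled copy accumulates larger self-influence
```

A mislabeled copy of the same input keeps a large residual ||p - e_y|| at every checkpoint, so its score grows much faster. This is the behavior the abstract relies on to single out samples that destabilize optimization.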

Paper Structure (16 sections, 18 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: The DiffInf framework for influence-guided generative supervision alignment. The process begins with a semantic encoder and conditional DDIM encoding to project the input image into a latent representation. During decoding, we perform iterative latent optimization guided by a composite loss function. This objective balances identity preservation (via a face recognition/verification network), influence suppression (via a dedicated predictor of high-influence membership), and semantic alignment while maintaining realism. By targeting high-influence samples for generative correction, DiffInf realigns training images with their assigned attribute labels while maintaining identity fidelity and structural realism.
  • Figure 2: Qualitative examples of influence-guided generative correction (DiffInf) under noisy supervision. Each pair shows a high self-influence training sample (left) and its generated correction (right). The label displayed above each pair corresponds to the dataset-provided class index, which may be noisy. In our age-group setting, class 0 denotes Young (0–18), class 1 denotes Middle (25–40), and class 2 denotes Old (50+). Several original samples exhibit visible semantic mismatch with their assigned class (e.g., an older-looking face labeled as Young), which leads to disproportionately large influence during training. DiffInf preserves identity-specific characteristics such as facial structure, pose, and illumination while gradually adjusting age-related features to align with the assigned class label. The transition from original to generated reflects controlled semantic refinement rather than drastic alteration, transforming disruptive samples into label-consistent instances while maintaining realism and identity fidelity.
  • Figure 3: Qualitative examples of influence-guided generative correction (DiffInf) for facial expression classification under noisy supervision. Each pair shows an original high self-influence training sample (left) and its generated correction (right). The label displayed above each pair corresponds to the dataset-provided expression class index, which may be corrupted. In our setup, class 0 denotes Happy, class 1 denotes Neutral, class 2 denotes Surprised, and class 3 denotes Sad. Several original samples exhibit semantic inconsistency with their assigned expression label (e.g., neutral or ambiguous faces labeled as expressive categories), leading to disproportionately large influence during training. DiffInf preserves identity-specific attributes such as facial structure, pose, and illumination while subtly adjusting expression-related features such as mouth curvature, eyebrow configuration, and eye openness to align with the assigned class. The transition from original to generated reflects controlled semantic refinement rather than identity alteration, converting disruptive samples into label-consistent training instances while maintaining realism and facial identity fidelity.
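
The iterative latent optimization described for Figure 1 can be sketched as gradient descent on a weighted sum of loss terms. This is a toy illustration under explicit assumptions, not the authors' implementation:

- The latent is a plain vector rather than a diffusion-autoencoder code with DDIM decoding.
- The identity term is a squared distance to the starting latent, standing in for a face-recognition embedding distance.
- The influence term is a hypothetical logistic high-influence predictor.
- Semantic alignment is the cross-entropy of a hypothetical linear attribute head for the assigned label.
- Realism is approximated by a small norm penalty.

All weights in `lam` are made up for illustration.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def composite_loss_and_grad(z, z0, y, W_attr, w_inf, b_inf, lam):
    """Toy composite objective and its analytic gradient w.r.t. latent z.

    All four terms are hypothetical stand-ins:
      identity : ||z - z0||^2 (proxy for a face-embedding distance)
      influence: sigmoid(w_inf . z + b_inf) (proxy high-influence predictor)
      alignment: cross-entropy of a linear attribute head for assigned label y
      realism  : ||z||^2 (proxy for staying near the latent prior)
    """
    d = len(z)
    # Identity preservation: pull z back toward the original latent z0.
    l_id = sum((zi - z0i) ** 2 for zi, z0i in zip(z, z0))
    g_id = [2.0 * (zi - z0i) for zi, z0i in zip(z, z0)]
    # Influence suppression: d sigmoid(u) / du = s (1 - s).
    s = sigmoid(sum(wi * zi for wi, zi in zip(w_inf, z)) + b_inf)
    l_inf = s
    g_inf = [s * (1.0 - s) * wi for wi in w_inf]
    # Semantic alignment: CE via log-sum-exp; gradient is W^T (p - e_y).
    logits = [sum(wj * zj for wj, zj in zip(row, z)) for row in W_attr]
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    l_align = lse - logits[y]
    resid = [math.exp(v - lse) - (1.0 if c == y else 0.0) for c, v in enumerate(logits)]
    g_align = [sum(resid[c] * W_attr[c][j] for c in range(len(W_attr))) for j in range(d)]
    # Realism proxy: mild norm penalty on the latent.
    l_real = sum(zi * zi for zi in z)
    g_real = [2.0 * zi for zi in z]

    loss = (lam["id"] * l_id + lam["inf"] * l_inf
            + lam["align"] * l_align + lam["real"] * l_real)
    grad = [lam["id"] * a + lam["inf"] * b + lam["align"] * c + lam["real"] * e
            for a, b, c, e in zip(g_id, g_inf, g_align, g_real)]
    return loss, grad


def optimize_latent(z0, y, W_attr, w_inf, b_inf, lam, steps=200, lr=0.05):
    """Plain gradient descent on the composite loss, starting from z0."""
    z = list(z0)
    for _ in range(steps):
        _, g = composite_loss_and_grad(z, z0, y, W_attr, w_inf, b_inf, lam)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z


# Hypothetical toy usage: the latent starts out predicting class 0,
# but the assigned label is class 1.
z0 = [2.0, 0.0]
W = [[1.0, 0.0], [0.0, 1.0]]
w_inf = [1.0, -1.0]
lam = {"id": 0.1, "inf": 1.0, "align": 1.0, "real": 0.01}
z_edit = optimize_latent(z0, 1, W, w_inf, 0.0, lam)
```

In this toy setting, raising `lam["id"]` keeps the edited latent closer to the original, which mirrors the trade-off the captions describe between identity fidelity and label alignment.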