Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective

Xuechen Zhang, Mingchen Li, Jiasi Chen, Christos Thrampoulidis, Samet Oymak

TL;DR

The paper tackles heterogeneity across classes in multi-class classification and fairness objectives by introducing Class-attribute Priors (CAP), a meta-learning framework that maps class attributes to class-specific optimization strategies (A2H). CAP reduces the hyperparameter search to a compact representation and applies to loss-function design and post-hoc logit adjustment, yielding improvements in balanced accuracy and tail fairness on long-tailed and noisy datasets. The authors provide theoretical intuition and empirical evidence—via Gaussian-mixture analysis and extensive long-tailed dataset experiments—showing how multiple attributes jointly improve per-class optimization and robustness. CAP's flexible, attribute-driven approach offers practical gains for fairness objectives beyond standard metrics and can extend to other personalization tasks like data augmentation and regularization.

Abstract

Modern classification problems exhibit heterogeneities across individual classes: each class may have unique attributes, such as sample size, label quality, or predictability (easy vs. difficult), and variable importance at test time. Without care, these heterogeneities impede the learning process, most notably when optimizing fairness objectives. Confirming this, under a Gaussian mixture setting, we show that the optimal SVM classifier for balanced accuracy needs to be adaptive to the class attributes. This motivates us to propose CAP: an effective and general method that generates a class-specific learning strategy (e.g., a hyperparameter) based on the attributes of that class. This way, the optimization process better adapts to heterogeneities. CAP leads to substantial improvements over the naive approach of assigning separate hyperparameters to each class. We instantiate CAP for loss function design and post-hoc logit adjustment, with emphasis on label-imbalanced problems. We show that CAP is competitive with prior art and its flexibility unlocks clear benefits for fairness objectives beyond balanced accuracy. Finally, we evaluate CAP on problems with label noise as well as weighted test objectives to showcase how CAP can jointly adapt to different heterogeneities.

Paper Structure

This paper contains 19 sections, 2 theorems, 21 equations, 8 figures, 6 tables.

Key Result

Theorem 1

For any $\delta\in\mathbb{R}_{+}^K$, the weighted pairwise loss (eqn:weighted) is Fisher consistent with weights and margins

Figures (8)

  • Figure 1: Left: CAP views the global dataset as a composition of heterogeneous sub-datasets induced by classes. We extract high-level attributes from these classes and use them to generate class-specific optimization strategies (which correspond to hyperparameters). Our proposal is to generate these hyperparameters efficiently from the class attributes through a meta-strategy. Right: We demonstrate that CAP leads to state-of-the-art strategies for loss function design and post-hoc optimization. CAP can leverage multiple attributes to flexibly optimize a variety of test objectives under heterogeneities.
  • Figure 2: The optimal hyperparameter $\delta_*$ depends on both attributes: frequency ($\pi$) and difficulty ($\sigma_+/\sigma_-$).
  • Figure 3: Benefit of $\texttt{CAP}$ for optimizing different fairness objectives. We compare plain post-hoc, LA post-hoc, and $\texttt{CAP}_{\text{post-hoc}}$. (a): Results of optimizing quantile class performance ${\text{Quant}_{a}} = \mathbb{P}\left[y \neq \hat{y}_{f}(\boldsymbol{x})\mid y=\mathrm{K}_{a}\right]$, where $\mathrm{K}_{a}$ denotes the class index with the worst $\lceil\mathrm{K}\times{a}\rceil$-th error. (b): Results of optimizing tail performance ${\text{CVaR}_{a}}$. (c): Results of optimizing ${\mathcal{R}({\text{Err}})}= \lambda\cdot{{\text{Err}}}_{\text{plain}}+(1-\lambda)\cdot{\text{Err}_{\text{SDev}}}$. The plot shows the trade-off between the standard deviation of class-conditional errors ${\text{Err}_{\text{SDev}}}$ and the standard misclassification error ${{\text{Err}}}_{\text{plain}}$ as $\lambda$ varies. See Sec. \ref{sec:objective} for detailed definitions and discussion.
  • Figure 4: Overview of the $\texttt{CAP}$ approach. CAP is the overall framework proposed in our paper, with $\textbf{A2H}$ being the core algorithm. $\textbf{A2H}$ is a meta-strategy that transforms class-attribute prior knowledge into hyperparameters $\bm{\mathcal{S}}$ for each class through a trainable matrix $\mathbf{W}$, forming a training strategy that satisfies the desired fairness objective. The left half of the figure illustrates how our algorithm calculates and trains the weights. In the first stage, we collect class-related information and construct an attribute table of dimension $n \times K$. This is a general prior, which is related to the distribution of the training data, the training difficulty of each class, and other factors. Then, the first step of $\textbf{A2H}$ is to compute a $K\times M$ Feature Dictionary $\bm{{\cal{D}}}=\bm{\mathcal{F}}(\bm{\mathcal{A}})$ by applying a set of functions $\bm{\mathcal{F}}$. We remark that $M \ll K$ and that $M$ depends only on the number of attributes $n$ and on $|\bm{\mathcal{F}}|$, making it a constant. Therefore, the search space is $\mathcal{O}(1)$. In the second step, the weight matrix $\mathbf{W}$ is trained through bi-level or post-hoc methods to construct the hyperparameters $\bm{\mathcal{S}}$.
  • Figure 5: Detailed implementation of the CAP framework. This figure illustrates how CAP is implemented under bi-level optimization and post-hoc optimization. Throughout the entire figure, the only trainable parameters are $\mathbf{W}$ and the network (in the green box). In the search phase of bi-level optimization, we first conduct an 80-20% train-val split. Then, for the inner optimization, we train the network with a parametric loss function on the 80% training split, and for the outer optimization, we train $\mathbf{W}$ to achieve the fairness objective on the 20% validation split. In the post-hoc implementation, we first train the network without hyperparameters on the training dataset and then perform the post-hoc optimization on the validation set. Both bi-level and post-hoc optimization yield the optimal fairness weights $\mathbf{W}^*$. For bi-level and post-hoc transfer, we use the optimal $\mathbf{W}^*$ to retrain a fairness-focused model on the entire training dataset. If only post-hoc adjustment is performed, we directly modify the pre-trained model's logits with a post-hoc function.
  • ...and 3 more figures
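
The quantities described in the Figure 3 and Figure 4 captions can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The linear form $\bm{\mathcal{S}} = \bm{\mathcal{D}}\mathbf{W}$, the $K \times n$ layout of the attribute table (the transpose of the $n \times K$ table in the caption), the function names, and the toy attribute values are all assumptions made for illustration. "Worst" is read as largest error, and $\text{CVaR}_a$ is left out because the captions do not define it.

```python
import math
import numpy as np

def feature_dictionary(attributes, funcs):
    """Build D = F(A): apply each function to the (K, n) attribute table
    and stack the results column-wise, giving shape (K, n * len(funcs))."""
    attributes = np.asarray(attributes, dtype=float)
    return np.concatenate([f(attributes) for f in funcs], axis=1)

def a2h(D, W):
    """Assumed linear A2H map: (K, M) dictionary times (M, H) weights
    gives H hyperparameters for each of the K classes."""
    return D @ W

def per_class_error(y_true, y_pred, num_classes):
    """Class-conditional error P[y_hat != y | y = k] for each class k.
    Assumes every class appears at least once in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.array([np.mean(y_pred[y_true == k] != k)
                     for k in range(num_classes)])

def quantile_error(class_errors, a):
    """Quant_a: the ceil(K * a)-th worst class-conditional error."""
    if not 0 < a <= 1:
        raise ValueError("a must lie in (0, 1]")
    rank = math.ceil(len(class_errors) * a)
    worst_first = np.sort(class_errors)[::-1]  # largest error first
    return worst_first[rank - 1]

def tradeoff_objective(y_true, y_pred, num_classes, lam):
    """R(Err) = lam * Err_plain + (1 - lam) * Err_SDev."""
    err_plain = np.mean(np.asarray(y_true) != np.asarray(y_pred))
    err_sdev = np.std(per_class_error(y_true, y_pred, num_classes))
    return lam * err_plain + (1 - lam) * err_sdev

# Toy example with K = 3 classes and n = 2 hypothetical attributes:
# column 0 is class frequency, column 1 is a difficulty score.
attributes = np.array([[0.50, 1.0],
                       [0.25, 2.0],
                       [0.25, 1.0]])
funcs = [lambda a: a, np.log]             # |F| = 2, so M = n * |F| = 4
D = feature_dictionary(attributes, funcs)

W = np.zeros((4, 1))                      # one hyperparameter per class
W[2, 0] = 1.0                             # select the log-frequency column
S = a2h(D, W)                             # shape (3, 1)

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 2])
errors = per_class_error(y_true, y_pred, num_classes=3)
```

In this sketch the number of trainable entries in `W` depends only on $M$ and the number of hyperparameters per class, not on $K$, which is the sense in which the caption calls the search space constant.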

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Corollary 1.1
  • proof