Enforcing Consistency and Fairness in Multi-level Hierarchical Classification with a Mask-based Output Layer

Shijing Chen, Shoaib Jameel, Mohamed Reda Bouadjenek, Feilong Tang, Usman Naseem, Basem Suleiman, Hakim Hacid, Flora D. Salim, Imran Razzak

TL;DR

The paper tackles inconsistent and biased predictions in multi-level hierarchical classification by introducing a model-agnostic mask-based Debiased Taxonomy-based Transitional Classifier (D-TTC). It builds a Taxonomy-based Transitional Classifier (TTC) layer that uses a transition matrix $M^{[\ell_i,\ell_{i+1}]}$ to propagate higher-level predictions to lower levels and applies a top-down attention-like mechanism, complemented by dynamic reweighting to achieve Equalized Odds. Evaluations on Amazon Product Review and DBPedia across multiple LLM backbones show improvements in fairness ($EO$), consistency, and exact-match accuracy, with some trade-offs in $HF1$; Pareto analyses highlight strong balance candidates such as Llama-2-7B(HD). The proposed masking layer offers a modular, taxonomy-integrated approach that enhances reliability in sensitive domains (e.g., e-commerce, healthcare, education) and suggests directions for incorporating bottom-up signals and parallel processing to enable real-time deployment.
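
To make the top-down masking idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes the transition matrix is binary (entry `[i][j]` is 1 when child class `j` belongs to parent class `i`), that the mask is the parent distribution multiplied by that matrix, and that the masked child scores are renormalized. The function names and the toy taxonomy are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_mask(parent_probs, transition):
    """mask[j] = sum_i parent_probs[i] * transition[i][j] (assumed form)."""
    n_children = len(transition[0])
    return [
        sum(p * row[j] for p, row in zip(parent_probs, transition))
        for j in range(n_children)
    ]


def masked_child_probs(child_logits, parent_probs, transition):
    """Reweight the child distribution by the propagated parent mask."""
    mask = propagate_mask(parent_probs, transition)
    raw = [q * m for q, m in zip(softmax(child_logits), mask)]
    total = sum(raw)
    if total == 0:
        raise ValueError("mask removed every child class")
    return [r / total for r in raw]


# Toy taxonomy (hypothetical): children 0 and 1 sit under parent 0,
# child 2 sits under parent 1.
M = [[1, 1, 0],
     [0, 0, 1]]
parent_probs = [1.0, 0.0]       # the level above chose parent 0
child_logits = [0.1, 0.2, 5.0]  # the unmasked head would pick child 2

probs = masked_child_probs(child_logits, parent_probs, M)
best_child = max(range(len(probs)), key=probs.__getitem__)
```

In this toy case the unmasked child head prefers child 2, which would contradict the parent prediction; after masking, child 2 receives zero probability and the prediction falls back to the best child under parent 0.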

Abstract

Traditional Multi-level Hierarchical Classification (MLHC) classifiers often rely on backbone models with $n$ independent output layers. This structure tends to overlook the hierarchical relationships between classes, leading to inconsistent predictions that violate the underlying taxonomy. Additionally, once a backbone architecture for an MLHC classifier is selected, adapting the model to accommodate new tasks can be challenging. For example, incorporating fairness to protect sensitive attributes within a hierarchical classifier necessitates complex adjustments to maintain the class hierarchy while enforcing fairness constraints. In this paper, we extend this concept to hierarchical classification by introducing a fair, model-agnostic layer designed to enforce taxonomy and optimize specific objectives, including consistency, fairness, and exact match. Our evaluations demonstrate that the proposed layer not only improves the fairness of predictions but also enforces the taxonomy, resulting in consistent predictions and superior performance. Compared to Large Language Models (LLMs) employing in-processing de-biasing techniques and models without any bias correction, our approach achieves better outcomes in both fairness and accuracy, making it particularly valuable in sectors like e-commerce, healthcare, and education, where predictive reliability is crucial.
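
The three objectives named above (consistency, exact match, and fairness) can be illustrated with a small sketch. The definitions here are assumptions chosen for illustration and may differ from the paper's exact formulas: consistency is the share of predicted paths that are valid in the taxonomy, exact match is the share of paths correct at every level, and the Equalized Odds gap is the larger of the true-positive-rate and false-positive-rate differences between groups for a single binary label. All names and data are hypothetical.

```python
def is_consistent(path, parent_of):
    """True if every predicted label's taxonomy parent is the label one level up."""
    return all(parent_of[child] == parent for parent, child in zip(path, path[1:]))


def consistency(pred_paths, parent_of):
    """Fraction of predicted paths that respect the taxonomy."""
    return sum(is_consistent(p, parent_of) for p in pred_paths) / len(pred_paths)


def exact_match(pred_paths, gold_paths):
    """Fraction of examples whose labels are correct at every level."""
    return sum(p == g for p, g in zip(pred_paths, gold_paths)) / len(gold_paths)


def positive_rate(preds, golds, gold_value):
    """Share of examples with the given gold value that were predicted positive."""
    hits = [p for p, g in zip(preds, golds) if g == gold_value]
    return sum(hits) / len(hits) if hits else 0.0


def equalized_odds_gap(preds, golds, groups):
    """Largest between-group gap in TPR (gold 1) or FPR (gold 0)."""
    gaps = []
    for gold_value in (1, 0):
        rates = []
        for group in sorted(set(groups)):
            idx = [i for i, g in enumerate(groups) if g == group]
            rates.append(
                positive_rate([preds[i] for i in idx], [golds[i] for i in idx], gold_value)
            )
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Toy two-level taxonomy and predictions (hypothetical data).
parent_of = {"Dog food": "Pets", "Cat toys": "Pets", "Novels": "Books"}
pred_paths = [("Pets", "Dog food"), ("Books", "Cat toys"),
              ("Books", "Novels"), ("Pets", "Cat toys")]
gold_paths = [("Pets", "Dog food"), ("Pets", "Cat toys"),
              ("Books", "Novels"), ("Pets", "Dog food")]

cons = consistency(pred_paths, parent_of)
em = exact_match(pred_paths, gold_paths)

# One binary label, two groups of three examples each.
preds = [1, 0, 1, 1, 1, 0]
golds = [1, 1, 0, 1, 1, 0]
groups = ["F", "F", "F", "M", "M", "M"]
eo = equalized_odds_gap(preds, golds, groups)
```

Here the second predicted path places "Cat toys" under "Books", so it counts against consistency even though each level could be scored separately; a masking layer of the kind described above is meant to rule out such paths by construction.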

Paper Structure

This paper contains 13 sections, 6 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: (a) Example of an Amazon product review classified across multiple levels of the taxonomy. (b) Proportion of correctly classified product reviews for each level of our taxonomy of the Amazon product review dataset, and the proportion of reviews incorrectly classified but for which other levels in the taxonomy were correctly identified. (c) Performance difference between male and female predictions using the BERT + Flat classifier model on the Amazon product review dataset. The percentages highlighted are actual accuracy differences between different genders.
  • Figure 2: Architecture for Debiased-TTC model layers.
  • Figure 3: Performance metrics comparison for various models and variants across different evaluation measures. The plots on the top row show metrics where higher values indicate better performance (HF1, Consistency, and Exact Match), whereas the plots on the bottom row (EO@$\ell_1$, EO@$\ell_2$, EO@$\ell_3$) display metrics where lower values are desirable for indicating fairness. The bars for each metric are grouped by model variant, with colors indicating different configurations (Base, D, H, HD). Note the distinct y-axis scales for fairness metrics (EO), highlighting differences in the fairness evaluation across models.
  • Figure 4: Trade-off analysis between the HF1 score and average EO for the DBPedia dataset.
  • Figure 5: Gender distribution of the two datasets.
  • ...and 5 more figures