Leveraging Taxonomy and LLMs for Improved Multimodal Hierarchical Classification

Shijing Chen; Mohamed Reda Bouadjenek; Shoaib Jameel; Usman Naseem; Basem Suleiman; Flora D. Salim; Hakim Hacid; Imran Razzak

TL;DR

The paper tackles multi-level hierarchical classification (MLHC) with deep taxonomies by addressing inconsistencies that arise when using flat or locally trained classifiers. It introduces a taxonomy-based transitional classifier (TTC) that is backbone-agnostic and uses a transition matrix $M^{[\ell_i,\ell_{i+1}]}$ to enforce hierarchical consistency across levels, with per-level logits $z^{[\ell_i]} = W^{[\ell_i]} a + b^{[\ell_i]}$ and $m^{[\ell_{i+1}]} = \hat{y}^{[\ell_i]} \times M^{[\ell_i,\ell_{i+1}]}$. Empirically, TTC improves consistency, exact match, and $\ell_3$ accuracy across diverse multimodal LLM backbones on the MEP-3M dataset, though HF1-Score can decline slightly in some cases. These results demonstrate the practicality of a model-agnostic, taxonomy-aware classifier for multimodal MLHC and suggest broader applicability to hierarchical and standard classification tasks.
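The two formulas above can be illustrated with a short sketch in plain Python (no deep-learning framework). It assumes that $\hat{y}^{[\ell_i]}$ is the softmax of $z^{[\ell_i]}$ and that $M^{[\ell_i,\ell_{i+1}]}$ is a binary parent-to-child matrix built from the taxonomy, with rows indexed by level-$\ell_i$ classes and columns by level-$\ell_{i+1}$ classes. The toy taxonomy, weights, and helper names (`build_transition_matrix`, `transition_score`, etc.) are hypothetical and not taken from the paper.

```python
import math

# Hypothetical toy taxonomy: level l_i classes (rows of M) and level l_{i+1} classes (columns).
PARENTS = ["Food", "Accessories"]
CHILDREN = ["Fruit", "Snack", "Jewelry"]
CHILD_TO_PARENT = {"Fruit": "Food", "Snack": "Food", "Jewelry": "Accessories"}


def build_transition_matrix(parents, children, child_to_parent):
    """Assumed construction: M[p][c] = 1.0 if child c sits under parent p, else 0.0."""
    row = {p: i for i, p in enumerate(parents)}
    col = {c: j for j, c in enumerate(children)}
    M = [[0.0] * len(children) for _ in parents]
    for child, parent in child_to_parent.items():
        M[row[parent]][col[child]] = 1.0
    return M


def level_logits(W, a, b):
    """Per-level logits z = W a + b; W has one row per class at this level."""
    return [sum(w * x for w, x in zip(w_row, a)) + b_j for w_row, b_j in zip(W, b)]


def softmax(z):
    """Numerically stable softmax, used here as the assumed form of y_hat."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def transition_score(y_hat, M):
    """Row vector times matrix: m^{[l_{i+1}]} = y_hat^{[l_i]} x M^{[l_i, l_{i+1}]}."""
    return [sum(y_hat[i] * M[i][j] for i in range(len(y_hat))) for j in range(len(M[0]))]


if __name__ == "__main__":
    M = build_transition_matrix(PARENTS, CHILDREN, CHILD_TO_PARENT)
    a = [1.0, 0.5]                    # hypothetical backbone feature vector
    W1 = [[2.0, 0.0], [0.0, 0.0]]     # hypothetical level-1 weights
    b1 = [0.0, 0.0]
    y1 = softmax(level_logits(W1, a, b1))
    m2 = transition_score(y1, M)
    print([round(v, 3) for v in m2])  # → [0.881, 0.881, 0.119]
```

Each child class inherits the probability mass of its predicted parent, so "Fruit" and "Snack" both receive the score assigned to "Food". How $m^{[\ell_{i+1}]}$ is then combined with the next level's output is only described at the diagram level in this summary (Figure 2).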

Abstract

Multi-level Hierarchical Classification (MLHC) tackles the challenge of categorizing items within a complex, multi-layered class structure. However, traditional MLHC classifiers often rely on a backbone model with independent output layers, which tend to ignore the hierarchical relationships between classes. This oversight can lead to inconsistent predictions that violate the underlying taxonomy. Leveraging Large Language Models (LLMs), we propose a novel taxonomy-embedded, transitional, LLM-agnostic framework for multimodal classification. The cornerstone of this approach is its ability to enforce consistency across hierarchical levels. Our evaluations on the MEP-3M dataset, a multimodal e-commerce product dataset with multiple hierarchical levels, demonstrate a significant performance improvement compared to conventional LLM structures.
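To make "consistency" concrete: a prediction is inconsistent when a predicted label is not a child of the label predicted one level up, as with "Pearl" predicted under "Fruit" in Figure 1(a). The sketch below computes three of the metrics named in the TL;DR (consistency, exact match, and HF1-Score) on a hypothetical two-item example. The taxonomy (including placing "Pearl" under "Jewelry") and the function names are illustrative assumptions, and the HF1 shown is one common set-based, ancestor-augmented formulation; the paper's exact metric definitions may differ.

```python
# Hypothetical three-level toy taxonomy (l1 -> l2 -> l3); top-level labels have no parent.
PARENT_OF = {
    "Fruit": "Food", "Apple": "Fruit",
    "Jewelry": "Accessories", "Pearl": "Jewelry",
}


def is_consistent(path, parent_of):
    """A predicted path is consistent if each label's parent is the label predicted one level up."""
    return all(parent_of.get(child) == parent for parent, child in zip(path, path[1:]))


def consistency_rate(preds, parent_of):
    """Fraction of predicted paths that respect the taxonomy."""
    return sum(is_consistent(p, parent_of) for p in preds) / len(preds)


def exact_match(preds, golds):
    """Fraction of items whose predicted labels are correct at every level."""
    return sum(tuple(p) == tuple(g) for p, g in zip(preds, golds)) / len(preds)


def with_ancestors(path, parent_of):
    """Set of the given labels plus all of their ancestors in the taxonomy."""
    labels = set()
    for label in path:
        while label is not None:
            labels.add(label)
            label = parent_of.get(label)
    return labels


def hierarchical_f1(preds, golds, parent_of):
    """Set-based hierarchical F1 (one common formulation, assumed here)."""
    overlap = n_pred = n_gold = 0
    for p, g in zip(preds, golds):
        P, G = with_ancestors(p, parent_of), with_ancestors(g, parent_of)
        overlap += len(P & G)
        n_pred += len(P)
        n_gold += len(G)
    hp, hr = overlap / n_pred, overlap / n_gold
    return 2 * hp * hr / (hp + hr) if hp + hr else 0.0


if __name__ == "__main__":
    golds = [("Food", "Fruit", "Apple"), ("Food", "Fruit", "Apple")]
    preds = [("Food", "Fruit", "Pearl"), ("Food", "Fruit", "Apple")]  # first row mirrors Figure 1(a)
    print(consistency_rate(preds, PARENT_OF))                 # → 0.5
    print(exact_match(preds, golds))                          # → 0.5
    print(round(hierarchical_f1(preds, golds, PARENT_OF), 4))  # → 0.7143
```

Under this formulation the inconsistent "Pearl" prediction drags in the unrelated ancestors "Jewelry" and "Accessories", lowering hierarchical precision even though two of the three levels are correct.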

Paper Structure

This paper contains 10 sections, 5 equations, 7 figures, and 2 tables.

Introduction
Related Work
Taxonomy-based Transitional Classifier
Notation and problem definition
TTC Model Description
Experiments and Results
Experimental Details
Experimental Results
Conclusion
The supplementary results for experiments

Figures (7)

Figure 1: (a) A data point of an "Apple" classified by three independent classifiers as "Food" and "Fruit" (correct for levels $\ell_1$ and $\ell_2$), but incorrectly classified as "Pearl" at level $\ell_3$. The correct classifications at levels $\ell_1$ and $\ell_2$ could have helped identify the correct class at $\ell_3$. (b) The proportion of correctly classified data entries at each level of the taxonomy for the sampled MEP-3M dataset (dark color), along with the proportion of data entries misclassified at one level but correctly identified at other levels (light color). This highlights the potential advantage of using a multi-level hierarchical classifier.
Figure 2: Architecture diagram of the taxonomy-based transitional classifier. The transition matrix $M^{[\ell_n, \ell_{n+1}]}$ multiplies the output of the level-$\ell_n$ classifier to obtain an attention score, which is then applied to the output of the level-$\ell_{n+1}$ classifier. This integrates information from the upper-level prediction into the subclass predictions, increasing the likelihood that predictions remain consistent across levels.
Figure 3: The distribution of data across all classes at levels $\ell_2$ and $\ell_3$.
Figure 4: Spider diagram of all LLM backbones across five metrics.
Figure 5: Grouped bar chart of detailed results for all LLM backbones on all five metrics.
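Figure 2 describes the transition step only at the diagram level. The following sketch assumes that "applying" the attention score means an elementwise product between $m^{[\ell_{n+1}]}$ and the next-level classifier's softmax probabilities, followed by renormalization; the paper may combine them differently. Class names, scores, and logits are hypothetical, and the score of 0.9 for "Apple" and 0.1 for "Pearl" is assumed to come from an upper-level belief of 0.9 in "Fruit" mapped through a binary $M^{[\ell_2, \ell_3]}$.

```python
import math


def softmax(z):
    """Numerically stable softmax over one level's logits."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def apply_transition_attention(m_next, z_next):
    """Reweight next-level probabilities by the attention score (assumed: elementwise product + renormalization)."""
    p_next = softmax(z_next)
    weighted = [score * p for score, p in zip(m_next, p_next)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    classes = ["Apple", "Pearl"]   # hypothetical level-l3 classes under "Fruit" and "Jewelry"
    m3 = [0.9, 0.1]                # hypothetical attention score passed down from level l2
    z3 = [0.0, 0.5]                # standalone level-l3 logits favour "Pearl"
    before = softmax(z3)
    after = apply_transition_attention(m3, z3)
    print(classes[before.index(max(before))], "->", classes[after.index(max(after))])  # → Pearl -> Apple
```

In this sketch the product only rescales probabilities, so a sufficiently confident standalone classifier can still override the upper level; the attention score shifts probability toward taxonomy-consistent children rather than forcing them.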
