Exploring Fusion Techniques in Multimodal AI-Based Recruitment: Insights from FairCVdb

Swati Swati; Arjun Roy; Eirini Ntoutsi

TL;DR

This study investigates fairness and bias in multimodal AI-based recruitment by comparing early- and late-fusion strategies on the FairCVdb dataset, which contains text, visual, and tabular modalities with synthetic gender and ethnicity biases. It evaluates accuracy with mean absolute error (MAE) and demographic bias with KL-divergence under neutral and biased conditions. Early-fusion mirrors the ground-truth distributions more accurately and achieves lower MAE across demographics. In contrast, late-fusion tends to produce over-generalized predictions and higher MAE, with the visual modality showing the strongest biases in biased scenarios. The findings suggest prioritizing early-fusion for fair and accurate multimodal hiring systems and highlight mid-fusion as a promising future direction, while calling for broader validation across datasets and attention to ethical considerations.

Abstract

Despite the large body of work on fairness-aware learning for individual modalities like tabular data, images, and text, less work has been done on multimodal data, which fuses various modalities for a comprehensive analysis. In this work, we investigate the fairness and bias implications of multimodal fusion techniques in the context of multimodal AI-based recruitment systems using the FairCVdb dataset. Our results show that early-fusion closely matches the ground truth for both demographics, achieving the lowest MAEs by integrating each modality's unique characteristics. In contrast, late-fusion leads to highly generalized mean scores and higher MAEs. Our findings emphasise the significant potential of early-fusion for accurate and fair applications, even in the presence of demographic biases, compared to late-fusion. Future research could explore alternative fusion strategies and incorporate modality-related fairness constraints to improve fairness. For code and additional insights, visit: https://github.com/Swati17293/Multimodal-AI-Based-Recruitment-FairCVdb
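The difference between the two fusion strategies compared in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): the linear scorer, the weights, and the averaging rule for late-fusion are all assumptions chosen only to make the contrast concrete.

```python
def linear_score(features, weights, bias=0.0):
    """Hypothetical stand-in for a trained model: a plain weighted sum."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def early_fusion(modalities, joint_weights, bias=0.0):
    """Concatenate all modality features first, then score once with a joint model."""
    joint = [f for feats in modalities for f in feats]
    return linear_score(joint, joint_weights, bias)


def late_fusion(modalities, per_modality_weights):
    """Score each modality separately, then average the per-modality scores.

    Averaging is an assumed combination rule, used here for illustration.
    """
    scores = [linear_score(f, w) for f, w in zip(modalities, per_modality_weights)]
    return sum(scores) / len(scores)


# Toy features for one candidate (made-up values, not FairCVdb data).
text, image, tabular = [0.2, 0.4], [0.6], [0.5, 0.1, 0.3]
candidate = [text, image, tabular]

early = early_fusion(candidate, [0.1, 0.2, 0.3, 0.1, 0.2, 0.1])
late = late_fusion(candidate, [[0.5, 0.5], [1.0], [0.3, 0.3, 0.3]])
```

In the early-fusion path a single model sees every feature jointly, so it can weigh modalities against each other. In the late-fusion path each modality is reduced to a score before combination, and averaging those scores pulls predictions toward the mean, which is one plausible reading of the "highly generalized mean scores" reported above.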

Paper Structure (4 sections, 1 figure)

Introduction
Experimental Setup
Evaluation Results
Conclusions

Figures (1)

Figure 1: KL-divergence between score distributions across Gender and Ethnicity demographics for different modalities and bias setups. Lower KL and MAE scores are better.
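The two metrics named in the caption can be computed as in the following sketch. It assumes candidate scores lie in [0, 1] and that the score distributions are compared as smoothed histograms; the bin count, the smoothing constant `eps`, and the function names are illustrative choices, not details taken from the paper.

```python
import math


def histogram(scores, bins=10, lo=0.0, hi=1.0, eps=1e-9):
    """Normalized histogram of scores in [lo, hi], smoothed by eps to avoid log(0)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for s in scores:
        idx = int((s - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp out-of-range scores and s == hi
        counts[idx] += 1
    total = len(scores) + eps * bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def mae(predicted, target):
    """Mean absolute error between predicted and ground-truth scores."""
    return sum(abs(a - b) for a, b in zip(predicted, target)) / len(predicted)


# Toy predicted scores for two demographic groups (made-up values).
group_a = [0.62, 0.58, 0.71, 0.66, 0.54]
group_b = [0.41, 0.47, 0.39, 0.52, 0.44]

bias_gap = kl_divergence(histogram(group_a), histogram(group_b))
error = mae(group_a, [0.60, 0.60, 0.70, 0.65, 0.55])
```

A KL value near zero means the two groups receive similarly distributed scores. KL-divergence is asymmetric, so the order of the two groups matters when reporting it.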
