Face-voice Association in Multilingual Environments (FAME) 2026 Challenge Evaluation Plan

Marta Moscati; Ahmed Abdullah; Muhammad Saad Saeed; Shah Nawaz; Rohan Kumar Das; Muhammad Zaigham Zaheer; Junaid Mir; Muhammad Haroon Yousaf; Khalid Malik; Markus Schedl

TL;DR

The paper addresses face-voice association in multilingual environments by introducing the FAME 2026 Challenge and the MAV-Celeb dataset, enabling cross-language verification of face-voice pairs. It adopts a baseline two-branch multimodal model with face and voice encoders and a gated fusion layer, optimized with $L_{CE}$ and $L_{OC}$ losses, evaluated via equal error rate (EER) across heard and unheard languages. Key contributions include the expanded multilingual dataset, the progress/evaluation protocol with V1-EU and V3-EG splits, and the emphasis on language transfer effects for cross-modal matching. The practical impact lies in guiding the development of robust, language-agnostic face-voice systems for real-world multilingual settings; the overall scoring aggregates EERs as $\text{Overall Score} = \frac{\sum \text{EERs}}{4}$ to benchmark participants.
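
The scoring described above can be illustrated with a minimal sketch. It assumes that each verification trial yields a similarity score (higher means the face and voice are more likely the same person) and a binary label, and that the four EERs being averaged come from the heard and unheard conditions on the two splits. That grouping, the function names `equal_error_rate` and `overall_score`, and the threshold sweep are illustrative assumptions, not the challenge's official scoring script.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a threshold over the observed scores.

    scores: similarity scores, higher = more likely a matching face-voice pair.
    labels: 1 for a matching pair, 0 for a non-matching pair.
    Returns a fraction in [0, 1] (multiply by 100 for a percentage).
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one matching and one non-matching trial")

    best_gap, eer = None, None
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    for threshold in sorted(set(scores)) + [float("inf")]:
        # A trial is accepted when its score is at or above the threshold.
        false_accepts = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        false_rejects = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        far = false_accepts / negatives
        frr = false_rejects / positives
        gap = abs(far - frr)
        # Keep the operating point where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def overall_score(eers):
    """Average of exactly four EERs, following Overall Score = (sum of EERs) / 4."""
    if len(eers) != 4:
        raise ValueError("the overall score is defined over four EERs")
    return sum(eers) / 4


if __name__ == "__main__":
    # Toy trials (made-up numbers, for illustration only).
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(equal_error_rate(toy_scores, toy_labels))
    # Hypothetical per-condition EERs, e.g. heard/unheard on each split.
    print(overall_score([0.10, 0.20, 0.30, 0.40]))
```

Because the sweep only visits observed scores, the result is an approximation on small trial lists; an interpolated ROC-based estimate would be a reasonable alternative for large evaluations.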

Abstract

Advances in technology have led to the use of multimodal systems in a variety of real-world applications, and audio-visual systems are among the most widely used. In recent years, associating the face and voice of a person has gained attention because of the unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) 2026 Challenge explores face-voice association under the specific condition of a multilingual scenario. This condition is motivated by the fact that half of the world's population is bilingual and that people often communicate in multilingual settings. The challenge uses the Multilingual Audio-Visual (MAV-Celeb) dataset to explore face-voice association in multilingual environments. This report describes the challenge, the dataset, the baseline models, and the task.
