RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources Applications

Amit Kumar Gupta, Farhan Sheth, Hammad Shaikh, Dheeraj Kumar, Angkul Puniya, Deepak Panwar, Sandeep Chaurasia, Priya Mathur

TL;DR

RecruitView provides a large, in-the-wild multimodal interview dataset with psychometrically grounded, continuous labels for 12 personality and interview-performance targets. The authors introduce CRMF, a geometry-aware fusion framework that processes visual, audio, and text signals across hyperbolic, spherical, and Euclidean manifolds with adaptive routing and tangent-space fusion. Empirical results show CRMF consistently outperforms strong large multimodal baselines while using substantially fewer trainable parameters, highlighting the value of manifold-aware representations for behavioral prediction. The work advances multimodal behavioral analysis by integrating multi-geometry inductive biases, enabling more reliable personality and interview-performance assessment in HR contexts, and provides publicly available data and code for reproducible research.

Abstract

Automated personality and soft skill assessment from multimodal behavioral data remains challenging due to limited datasets and methods that fail to capture the geometric structure inherent in human traits. We introduce RecruitView, a dataset of 2,011 naturalistic video interview clips from 300+ participants with 27,000 pairwise comparative judgments across 12 dimensions: the Big Five personality traits, an overall personality score, and six interview performance metrics. To leverage this data, we propose Cross-Modal Regression with Manifold Fusion (CRMF), a geometric deep learning framework that explicitly models behavioral representations across hyperbolic, spherical, and Euclidean manifolds. CRMF employs geometry-specific expert networks to simultaneously capture hierarchical trait structures, directional behavioral patterns, and continuous performance variations. An adaptive routing mechanism dynamically weights expert contributions based on input characteristics. Through principled tangent space fusion, CRMF achieves superior performance while training 40-50% fewer parameters than large multimodal models. Extensive experiments demonstrate that CRMF substantially outperforms the selected baselines, achieving up to 11.4% improvement in Spearman correlation and 6.0% in concordance index. Our RecruitView dataset is publicly available at https://huggingface.co/datasets/AI4A-lab/RecruitView
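
The abstract reports gains in Spearman correlation and concordance index. For readers unfamiliar with these two ranking metrics, the following is a minimal NumPy sketch of their textbook definitions for continuous targets. It is not the authors' evaluation code; the function names and the tie-handling choices (average ranks, half credit for tied predictions) are assumptions made for illustration.

```python
import numpy as np

def rankdata(x):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def spearman(y_true, y_pred):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rankdata(y_true), rankdata(y_pred))[0, 1])

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs that y_pred orders the same way as y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs tied in the target are not comparable
            comparable += 1
            agreement = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
            if agreement > 0:
                concordant += 1.0
            elif agreement == 0:
                concordant += 0.5  # assumption: tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs: all targets are tied")
    return concordant / comparable
```

Both metrics depend only on the ordering of the predictions, not on their scale, which suits labels derived from pairwise comparative judgments.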

Paper Structure

This paper contains 81 sections, 33 equations, 10 figures, 10 tables.

Figures (10)

  • Figure 1: Data distribution by duration and video quality.
  • Figure 2: Data distribution by duration and transcript word count.
  • Figure 3: Spearman correlation between various metrics.
  • Figure 4: Overview of the CRMF architecture. Multimodal encoders extract features from video, audio, and text. Pre-fusion integrates modalities through cross-modal attention. The manifold projection layer maps features to hyperbolic, spherical, and Euclidean spaces. Geometry-specific experts process each manifold representation with intra-manifold attention. A learned router dynamically weights expert outputs. Finally, geometric fusion combines representations in a shared tangent space for multi-target prediction.
  • Figure 5: The participant-facing QAVideoShare data collection platform. (Left) The secure login and consent portal. (Right) The primary video recording interface where participants view the prompt and record their response.
  • ...and 5 more figures
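
The Figure 4 caption describes mapping features to hyperbolic, spherical, and Euclidean spaces and then combining them in a shared tangent space with router weights. The sketch below illustrates that general idea with standard maps: the exponential and logarithmic maps at the origin of the Poincaré ball and at the north pole of the unit sphere. It is a hedged illustration, not the CRMF implementation. The expert networks are omitted, the curvature `c`, the choice of base points, and every function name are assumptions, and the router logits are passed in rather than learned.

```python
import numpy as np

EPS = 1e-9

def expmap0_poincare(v, c=1.0):
    """Exponential map at the origin of the Poincare ball (curvature -c)."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), EPS)
    return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

def logmap0_poincare(x, c=1.0):
    """Logarithmic map at the origin: inverse of expmap0_poincare."""
    norm = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), EPS)
    scaled = np.clip(np.sqrt(c) * norm, 0.0, 1.0 - 1e-7)
    return np.arctanh(scaled) * x / (np.sqrt(c) * norm)

def expmap_north_pole(v):
    """Map a tangent vector v in R^d to the unit sphere in R^(d+1)."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), EPS)
    return np.concatenate([np.sin(norm) * v / norm, np.cos(norm)], axis=-1)

def logmap_north_pole(x):
    """Inverse of expmap_north_pole for points away from the south pole."""
    theta = np.arccos(np.clip(x[..., -1:], -1.0, 1.0))
    horiz = x[..., :-1]
    norm = np.maximum(np.linalg.norm(horiz, axis=-1, keepdims=True), EPS)
    return theta * horiz / norm

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_in_tangent_space(h_hyp, h_sph, h_euc, router_logits):
    """Project per-geometry features onto their manifolds, pull them back to
    a shared tangent space, and average them with softmax router weights."""
    on_ball = expmap0_poincare(h_hyp)      # hyperbolic representation
    on_sphere = expmap_north_pole(h_sph)   # spherical representation
    # (Geometry-specific expert processing would happen here; omitted.)
    tangents = np.stack(
        [logmap0_poincare(on_ball), logmap_north_pole(on_sphere), h_euc],
        axis=-2,
    )                                      # shape (..., 3, d)
    weights = softmax(router_logits)       # shape (..., 3)
    fused = (weights[..., None] * tangents).sum(axis=-2)
    return fused, weights
```

Because the experts are left out, the manifold round trips here are identities and the result reduces to a weighted average of the inputs. In a full model the on-manifold processing between the two maps is what would make each branch contribute something different.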