Human Behavior Atlas: Benchmarking Unified Psychological and Social Behavior Understanding

Keane Ong, Wei Dai, Carol Li, Dewei Feng, Hengzhi Li, Jingyao Wu, Jiaee Cheong, Rui Mao, Gianmarco Mengaldo, Erik Cambria, Paul Pu Liang

TL;DR

The paper addresses fragmentation in psychological and social behavior understanding by creating a unified, large-scale multimodal benchmark. It introduces Human Behavior Atlas, which standardizes the taxonomy, data formats, and evaluation, and augments the data with behavioral descriptors to enable unified learning across tasks. Training three OmniSapiens-7B variants (SFT, BAM, RL) on the atlas yields better performance than existing multimodal LLMs across diverse tasks, with RL particularly strong on open-ended generation; pretraining also improves transfer to held-out and novel tasks. The work demonstrates that structured benchmarks combined with descriptor-based adaptations can meaningfully advance unified models of complex human behaviors, and it provides methodological guidance for future atlas construction.

Abstract

Using intelligent systems to perceive psychological and social behaviors, that is, the underlying affective, cognitive, and pathological states that are manifested through observable behaviors and social interactions, remains a challenge due to their complex, multifaceted, and personalized nature. Existing work that tackles these dimensions through specialized datasets and single-task systems often misses opportunities for scalability, cross-task transfer, and broader generalization. To address this gap, we curate Human Behavior Atlas, a unified benchmark of diverse behavioral tasks designed to support the development of unified models for understanding psychological and social behaviors. Human Behavior Atlas comprises over 100,000 samples spanning text, audio, and visual modalities, covering tasks on affective states, cognitive states, pathologies, and social processes. Our unification efforts can reduce redundancy and cost, enable training to scale efficiently across tasks, and enhance generalization of behavioral features across domains. On Human Behavior Atlas, we train three models: OmniSapiens-7B SFT, OmniSapiens-7B BAM, and OmniSapiens-7B RL. We show that training on Human Behavior Atlas enables models to consistently outperform existing multimodal LLMs across diverse behavioral tasks. Pretraining on Human Behavior Atlas also improves transfer to novel behavioral datasets, with the targeted use of behavioral descriptors yielding meaningful performance gains.

Paper Structure

This paper contains 27 sections, 22 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Overview of Human Behavior Atlas. (a) Selection criteria and preprocessing pipeline for the datasets. (b) Dataset distribution across 10 behavior-related tasks. The inner circle indicates the modality combination of the input data, where T=Text, A=Audio, and V=Video. The middle ring shows the task of each dataset, as defined in Sec. \ref{sec:tax_datasets}. The outer ring and bars list the datasets and their sample sizes, respectively. (c) Distribution of data modalities. The benchmark emphasizes video understanding, which involves both vision and audio modalities; 83.6% of samples contain video data. (d) Distribution of sample durations. Both short and long video/audio clips are covered, with 29.2% of clips lasting more than 20 seconds. (e) Source of datasets. Datasets are sourced from diverse geographic regions across North America, Europe, and Asia.
  • Figure 2: Multitask results across tasks for each model. Each result reports the average score across all datasets for that task. Color scale from best to worst: dark green $\rightarrow$ yellow $\rightarrow$ dark red. After training on Human Behavior Atlas, OmniSapiens-7B SFT and OmniSapiens-7B RL outperform existing pretrained models across most behavioral tasks.
  • Figure 3: Example from MUStARD in which the speaker (Chandler, from Friends) sarcastically suggests putting up balcony lights. While Qwen2.5-Omni-7B predicts no sarcasm, OmniSapiens-7B SFT correctly identifies the instance as sarcasm.
  • Figure 4: Example from the CH-SIMSv2 dataset in which the speaker displays a split-second smile, signaling positive sentiment. While OmniSapiens-7B SFT misses the subtle cue and predicts negative sentiment, OmniSapiens-7B BAM correctly predicts positive sentiment.