
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future

Minzhi Li, Weiyan Shi, Caleb Ziems, Diyi Yang

TL;DR

We build a Social AI Data Infrastructure, consisting of a comprehensive social AI taxonomy and a data library of 480 NLP datasets, and demonstrate its utility both for understanding the current data landscape and for offering a holistic perspective on potential directions for future dataset development.

Abstract

As Natural Language Processing (NLP) systems become increasingly integrated into human social life, these technologies will increasingly need to rely on social intelligence. Although many valuable datasets benchmark isolated dimensions of social intelligence, no body of work yet joins these threads into a cohesive subfield in which researchers can quickly identify research gaps and future directions. Towards this goal, we build a Social AI Data Infrastructure, which consists of a comprehensive social AI taxonomy and a data library of 480 NLP datasets. Our infrastructure allows us to analyze existing dataset efforts and to evaluate language models' performance on different aspects of social intelligence. Our analyses demonstrate its utility in enabling a thorough understanding of the current data landscape and in providing a holistic perspective on potential directions for future dataset development. We show that future social intelligence data efforts need multifaceted datasets, greater diversity in language and culture, more long-tailed social situations, and more interactive data.
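To make the taxonomy-plus-library idea concrete, here is a minimal sketch of how a library entry might be tagged with the three taxonomy pillars (cognitive, situational, behavioral) and how simple landscape statistics could be computed from it. The record schema (`DatasetEntry`, `nlp_task`, `pillars`, `interactive`) and the helper functions are hypothetical illustrations, not the paper's actual data format, and "multifaceted" is assumed here to mean "tagged with more than one pillar." The three entries are invented toy examples, not datasets from the library.

```python
from collections import Counter
from dataclasses import dataclass, field

# The three pillars of the social AI taxonomy.
PILLARS = ("cognitive", "situational", "behavioral")


@dataclass
class DatasetEntry:
    # Hypothetical record schema; field names are illustrative only.
    name: str
    nlp_task: str
    pillars: set[str] = field(default_factory=set)
    interactive: bool = False

    def __post_init__(self) -> None:
        # Reject labels outside the three-pillar taxonomy.
        unknown = self.pillars - set(PILLARS)
        if unknown:
            raise ValueError(f"unknown pillar(s): {sorted(unknown)}")


def pillar_counts(library: list[DatasetEntry]) -> Counter:
    """Number of datasets tagged with each pillar (one dataset may count toward several)."""
    counts = Counter({p: 0 for p in PILLARS})
    for entry in library:
        counts.update(entry.pillars)
    return counts


def multifaceted_share(library: list[DatasetEntry]) -> float:
    """Fraction of datasets tagged with more than one pillar (assumed definition)."""
    if not library:
        return 0.0
    return sum(len(e.pillars) > 1 for e in library) / len(library)


def interaction_counts(library: list[DatasetEntry]) -> dict[str, int]:
    """Split the library into interactive and static datasets."""
    n_interactive = sum(e.interactive for e in library)
    return {"interactive": n_interactive, "static": len(library) - n_interactive}


# Toy entries, not taken from the actual data library.
library = [
    DatasetEntry("toy-a", "emotion recognition", {"cognitive"}),
    DatasetEntry("toy-b", "social norm reasoning", {"situational", "cognitive"}),
    DatasetEntry("toy-c", "persuasion dialogue", {"behavioral"}, interactive=True),
]
print(pillar_counts(library))
print(multifaceted_share(library))
print(interaction_counts(library))
```

Validating pillar labels at construction time keeps the library consistent with the taxonomy, so every downstream count is guaranteed to fall into one of the three pillars.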

Paper Structure

This paper contains 43 sections, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Our Social Intelligence Data Infrastructure gives a comprehensive overview and synthesis of social intelligence in NLP, with a theoretically grounded taxonomy and an NLP data library. Researchers can use our infrastructure to build and organize tasks, evaluate language models and derive future insights.
  • Figure 2: Social AI taxonomy with three pillars: cognitive, situational and behavioral intelligence. We illustrate their respective roles in social interactions (left), and visualize their definitions and example NLP tasks (right).
  • Figure 3: Distribution of three intelligence types (left) and frequency of different subcategories within cognitive, situational and behavioral intelligence (right).
  • Figure 4: NLP tasks related to social intelligence over time. For each three-year period, we show newly emerged topics based on the NLP Task field in our constructed data library. This visualization is non-exhaustive: if a period has more than three distinct new topics, we show only three.
  • Figure 5: Number of papers with interactive or static data. We also visualize a breakdown of interactive data into dyadic and multi-party interactions.
Figures 6 to 10 are not listed here.
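The Figure 4 caption describes its procedure only in outline: topics come from the NLP Task field, are grouped into three-year periods, and are capped at three per period. A minimal sketch of that binning, under stated assumptions: a topic belongs to the period containing its earliest dataset year, periods are aligned to a hypothetical `start_year` parameter, and, because the caption does not say which topics survive the cap, this sketch arbitrarily keeps the first three alphabetically. The function name `new_topics_by_period` and the toy records are invented for illustration.

```python
from collections import defaultdict


def new_topics_by_period(records, start_year, period=3, cap=3):
    """Group topics by the period of their first appearance, keeping at most `cap` per period.

    records: iterable of (year, nlp_task) pairs, one per dataset.
    """
    # Earliest year each NLP Task value appears (sorting puts earliest first).
    first_seen = {}
    for year, task in sorted(records):
        first_seen.setdefault(task, year)

    # Assign each topic to the fixed-width period containing its first year.
    periods = defaultdict(list)
    for task, year in first_seen.items():
        bin_start = start_year + ((year - start_year) // period) * period
        periods[bin_start].append(task)

    # Cap each period; alphabetical order is an arbitrary stand-in for the paper's choice.
    return {
        (bin_start, bin_start + period - 1): sorted(tasks)[:cap]
        for bin_start, tasks in sorted(periods.items())
    }


# Toy records, invented for illustration.
records = [
    (2015, "sarcasm detection"),
    (2016, "empathetic dialogue"),
    (2017, "sarcasm detection"),
    (2018, "social norm reasoning"),
    (2018, "theory of mind qa"),
    (2019, "persuasion"),
    (2019, "humor detection"),
    (2020, "negotiation"),
]
for span, topics in new_topics_by_period(records, start_year=2015).items():
    print(span, topics)
```

Here "sarcasm detection" is counted only in 2015 to 2017 even though it reappears in 2017, because only a topic's first appearance makes it "newly emerged," and the 2018 to 2020 period is truncated from five new topics to three.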