The Computational Anatomy of Humility: Modeling Intellectual Humility in Online Public Discourse

Xiaobo Guo, Neil Potnis, Melody Yu, Nabeel Gillani, Soroush Vosoughi

TL;DR

This study explores computational methods for measuring intellectual humility (IH), the acknowledgment of potential limitations in one's own beliefs. The authors manually curate and validate an IH codebook on 350 posts about religion drawn from subreddits and develop LLM-based models to automate this measurement.

Abstract

The ability of individuals to constructively engage with one another across lines of difference is a critical feature of a healthy pluralistic society. This is also true in online discussion spaces like social media platforms. To date, much social media research has focused on preventing ills -- like political polarization and the spread of misinformation. While this is important, enhancing the quality of online public discourse requires not just reducing ills but also promoting foundational human virtues. In this study, we focus on one particular virtue: "intellectual humility" (IH), or acknowledging the potential limitations in one's own beliefs. Specifically, we explore the development of computational methods for measuring IH at scale. We manually curate and validate an IH codebook on 350 posts about religion drawn from subreddits and use them to develop LLM-based models for automating this measurement. Our best model achieves a Macro-F1 score of 0.64 across labels (and 0.70 when predicting IH/IA/Neutral at the coarse level), higher than an expected naive baseline of 0.51 (0.32 for IH/IA/Neutral) but lower than a human annotator-informed upper bound of 0.85 (0.83 for IH/IA/Neutral). Our results highlight the challenging nature of detecting IH online -- opening the door to new directions in NLP research -- and lay a foundation for computational social science researchers interested in analyzing and fostering more IH in online public discourse.
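
The Macro-F1 figures quoted in the abstract are unweighted averages of per-label F1 scores, so rare labels count as much as common ones. A minimal sketch of that computation for the coarse IH/IA/Neutral setting follows. The toy labels and the majority-class baseline are hypothetical illustrations, not the paper's data, and the baseline here is not necessarily the "expected naive baseline" the authors report.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy annotations (assumption: not taken from the paper).
LABELS = ["IH", "IA", "Neutral"]
y_true = ["IH", "IH", "IA", "Neutral", "Neutral", "Neutral"]
y_pred = ["IH", "Neutral", "IA", "Neutral", "Neutral", "IA"]

model_score = macro_f1(y_true, y_pred, LABELS)

# Hypothetical baseline that always predicts the most frequent label.
baseline_score = macro_f1(y_true, ["Neutral"] * len(y_true), LABELS)

print(f"model macro-F1:    {model_score:.3f}")
print(f"baseline macro-F1: {baseline_score:.3f}")
```

Because every label contributes equally, a classifier that ignores the minority labels scores poorly on this metric even when its raw accuracy looks reasonable.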

Paper Structure

This paper contains 43 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: The flow chart for developing our IH Codebook
  • Figure 2: Manually identifying and eliminating similar labels for broader terms. The terms highlighted in green were then added to the first iteration of the codebook. The terms highlighted in red were the ones eliminated.
  • Figure 3: Comparison between different boosting methods and the human annotator upper bound; negative values indicate performance below the upper bound. "Original" refers to the results without any boosting.
  • Figure C1: The original and optimized system prompts for BQ settings of the code "Recognizes limitations in one’s knowledge or beliefs"
  • Figure E2: Two samples generated by GPT-4-turbo-2024-04-09 in the chain-of-thought setting. The first is correct; the second is incorrect.