The Value of Nothing: Multimodal Extraction of Human Values Expressed by TikTok Influencers

Alina Starovolsky-Shitrit, Alon Neduva, Naama Appel Doron, Ella Daniel, Oren Tsur

TL;DR

The paper examines how Schwartz's Personal Values are expressed and transmitted by TikTok influencers targeting youth. It compares a direct multimodal extraction approach with a 2-step video-to-script pipeline and finds that a trainable MLM used in the second step yields the strongest results, surpassing few-shot LLM performance. An 890-video, manually annotated dataset is introduced to support evaluation of multimodal value detection in short-form video. The work demonstrates the feasibility of scalable value analysis on social media content, discusses implications for education, policy, and potential misuse, and provides a foundation for broader investigations into value transmission online.

Abstract

Societal and personal values are transmitted to younger generations through interaction and exposure. Traditionally, children and adolescents learned values from parents, educators, or peers. Nowadays, social platforms serve as a significant channel through which youth (and adults) consume information, as the main medium of entertainment, and possibly as the medium through which they learn different values. In this paper we extract implicit values from TikTok movies uploaded by online influencers targeting children and adolescents. We curated a dataset of hundreds of TikTok movies and annotated them according to the Schwartz Theory of Personal Values. We then experimented with an array of Masked and Large Language Models, exploring how values can be detected. Specifically, we considered two pipelines -- direct extraction of values from video and a 2-step approach in which videos are first converted to elaborated scripts and then values are extracted. Achieving state-of-the-art results, we find that the 2-step approach performs significantly better than the direct approach and that using a trainable Masked Language Model as a second step significantly outperforms a few-shot application of a number of Large Language Models. We further discuss the impact of fine-tuning and compare the performance of the different models on identifying values that are present or contradicted in the TikTok movies. Finally, we share the first values-annotated dataset of TikTok videos. Our results pave the way for further research on influence and value transmission in video-based social platforms.
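
The two pipelines described in the abstract can be illustrated structurally. The following is a minimal sketch, not the authors' implementation: the function names, the label encoding, and the toy stand-in models are all hypothetical, and the real models (a multimodal model, a video-to-script converter, a trained MLM or a few-shot LLM) would be plugged in as the callables.

```python
from typing import Callable, Dict

# Hypothetical label encoding: the abstract says a value can be present
# or contradicted in a video; "absent" is an assumed third state.
PRESENT, CONTRADICTED, ABSENT = "present", "contradicted", "absent"

Labels = Dict[str, str]  # value name -> one of the three states above


def direct_pipeline(video_path: str, video_model: Callable[[str], Labels]) -> Labels:
    """Direct extraction: one model maps the video straight to value labels."""
    return video_model(video_path)


def two_step_pipeline(
    video_path: str,
    video_to_script: Callable[[str], str],
    script_classifier: Callable[[str], Labels],
) -> Labels:
    """2-step extraction: video -> elaborated script -> value labels."""
    script = video_to_script(video_path)
    return script_classifier(script)


# Toy stand-ins so the sketch runs; they do not model any real system.
def toy_video_to_script(video_path: str) -> str:
    return f"Script for {video_path}: the influencer celebrates winning a contest."


def toy_script_classifier(script: str) -> Labels:
    state = PRESENT if "winning" in script else ABSENT
    return {"achievement": state}


labels = two_step_pipeline("clip_001.mp4", toy_video_to_script, toy_script_classifier)
print(labels)
```

Separating the script step from the classification step is what allows the second stage to be swapped between a supervised text model and a few-shot LLM without touching the video processing.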

Paper Structure (25 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Values per movie in the annotated data.
  • Figure 2: Occurrences of each value in the annotated dataset. Blue bars indicate the appearance of a value (e.g., achievement) and red bars indicate the conflicted value (e.g., apathy and lack of ambition, conflicting with achievement).
  • Figure 3: Direct and indirect extraction of values from TikTok movies. Fine-tuning the textual models on scripts is optional. LLMs are used in a zero/few-shot manner. The MLM must be trained on a subsample in a supervised manner.
  • Figure 4: F-scores for each model and value. Values appearing fewer than 30 times were not included in the BERT-based MLM models and are thus excluded from the analysis (appearing with score 0 in the figure).
  • Figure 5: Annotation guidelines and prompt for direct extraction.
  • ...and 2 more figures