The Value of Nothing: Multimodal Extraction of Human Values Expressed by TikTok Influencers
Alina Starovolsky-Shitrit, Alon Neduva, Naama Appel Doron, Ella Daniel, Oren Tsur
TL;DR
The paper examines how values from Schwartz's Theory of Personal Values are expressed and transmitted by TikTok influencers targeting youth. It compares a direct multimodal extraction approach to a 2-step video-to-script pipeline, finding that a trainable Masked Language Model (MLM) used in the second step yields the strongest results, surpassing few-shot LLM performance. An 890-video, manually annotated dataset is introduced, enabling robust evaluation and advancing multimodal value detection in short-form video. The work demonstrates the feasibility of scalable value analysis on social media content and discusses implications for education, policy, and potential misuse, while providing a foundation for broader future investigations into value transmission online.
Abstract
Societal and personal values are transmitted to younger generations through interaction and exposure. Traditionally, children and adolescents learned values from parents, educators, or peers. Nowadays, social platforms serve as a significant channel through which youth (and adults) consume information, as the main medium of entertainment, and possibly as the medium through which they learn different values. In this paper we extract implicit values from TikTok videos uploaded by online influencers targeting children and adolescents. We curated a dataset of hundreds of TikTok videos and annotated them according to the Schwartz Theory of Personal Values. We then experimented with an array of Masked and Large Language Models, exploring how values can be detected. Specifically, we considered two pipelines: direct extraction of values from video, and a 2-step approach in which videos are first converted to elaborated scripts and values are then extracted from those scripts. Achieving state-of-the-art results, we find that the 2-step approach performs significantly better than the direct approach and that using a trainable Masked Language Model as the second step significantly outperforms a few-shot application of a number of Large Language Models. We further discuss the impact of fine-tuning and compare the performance of the different models on identifying values that are present or contradicted in a TikTok video. Finally, we share the first values-annotated dataset of TikTok videos. Our results pave the way for further research on influence and value transmission in video-based social platforms.
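To make the 2-step approach concrete, the following is a minimal Python sketch of its control flow, not the authors' implementation. All names here (`Video`, `video_to_script`, `toy_classifier`, `two_step_pipeline`) are hypothetical. The label set assumes the ten basic Schwartz values, and the "absent" label is an assumed default for values that are neither present nor contradicted. The keyword-based classifier is only a placeholder for the second step, which in the paper is a trainable Masked Language Model or a few-shot Large Language Model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# The ten basic values of Schwartz's theory (assumed label set for this sketch).
SCHWARTZ_VALUES = (
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
)

# "present" and "contradicted" follow the abstract; "absent" is an assumption.
LABELS = ("present", "contradicted", "absent")


@dataclass
class Video:
    video_id: str
    transcript: str          # hypothetical: speech already transcribed
    visual_description: str  # hypothetical: textual description of the visuals


def video_to_script(video: Video) -> str:
    """Step 1 (stand-in): merge the modalities into one elaborated text script."""
    return f"[SCENE] {video.visual_description}\n[SPEECH] {video.transcript}"


def toy_classifier(script: str) -> Dict[str, str]:
    """Step 2 placeholder: keyword cues instead of a fine-tuned MLM or an LLM."""
    cues = {
        "win": ("achievement", "present"),
        "rules are boring": ("conformity", "contradicted"),
    }
    text = script.lower()
    return {value: label for cue, (value, label) in cues.items() if cue in text}


def two_step_pipeline(
    video: Video, classify: Callable[[str], Dict[str, str]]
) -> Dict[str, str]:
    """Convert the video to a script, then label every value from the script."""
    predictions = classify(video_to_script(video))
    result = {}
    for value in SCHWARTZ_VALUES:
        label = predictions.get(value, "absent")  # unmentioned values default to absent
        if label not in LABELS:
            raise ValueError(f"unknown label {label!r} for value {value!r}")
        result[value] = label
    return result


if __name__ == "__main__":
    clip = Video(
        video_id="demo",
        transcript="I trained all year to win this, rules are boring anyway",
        visual_description="a teenager holding a trophy",
    )
    for value, label in two_step_pipeline(clip, toy_classifier).items():
        print(f"{value}: {label}")
```

Passing the classifier in as a function keeps the first step fixed while the second step is swapped, which mirrors the comparison between a trainable model and few-shot prompting described in the abstract. The direct approach would instead hand the raw video to a multimodal model and skip `video_to_script` entirely.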
