Automatic Detection of Research Values from Scientific Abstracts Across Computer Science Subfields
Hang Jiang, Tal August, Luca Soldaini, Kyle Lo, Maria Antoniak
TL;DR
The paper tackles the automatic detection of research values expressed in CS abstracts. It introduces a ten-value annotation scheme and builds large-scale annotated data from 226,600 abstracts across 32 subfields over a decade. It compares lexicon-based value classifiers with LLM prompting, finds that lexicon-based methods generally perform better on several values, and uses these classifiers to characterize subfield differences and temporal trends in value emphasis. Key findings include distinct value profiles per subfield (e.g., AI vs. traditional CS domains), increasing emphasis on several values over time in AI-related areas, and notable co-occurrence patterns among values. The work provides a scalable methodology, a public dataset, and insights that can inform scholarly writing, evaluation practices, and cross-disciplinary meta-science in CS.
Abstract
The field of computer science (CS) has evolved rapidly over the past few decades, providing computational tools and methodologies to various fields and forming new interdisciplinary communities. This growth has significantly affected institutional practices and the relevant research communities. It is therefore crucial to explore which research values, that is, the basic and fundamental beliefs that guide or motivate research attitudes or actions, CS-related research communities promote. Prior research has manually analyzed research values in a small sample of machine learning papers, but no prior work has studied the automatic detection of research values in CS from large-scale scientific texts across different research subfields. This paper introduces a detailed annotation scheme featuring ten research values that guide CS-related research. Based on the scheme, we build value classifiers to scale up the analysis and present a systematic study of 226,600 paper abstracts from 32 CS-related subfields and 86 popular publishing venues over ten years.
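
To make the idea of a lexicon-based value classifier concrete, here is a minimal sketch of how such a classifier could flag values in an abstract. It is not the authors' implementation. The value names, cue words, the `min_hits` threshold, and the function names `tokenize` and `detect_values` are all hypothetical, chosen only for illustration; they are not the paper's ten-value scheme or its lexicons.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicon: these value names and cue words are illustrative
# assumptions, NOT the paper's annotation scheme or its actual lexicons.
LEXICON: Dict[str, Set[str]] = {
    "performance": {"accuracy", "state-of-the-art", "outperforms"},
    "efficiency": {"efficient", "scalable", "fast"},
    "novelty": {"novel", "first", "new"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words, keeping hyphenated terms whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def detect_values(
    abstract: str,
    lexicon: Dict[str, Set[str]] = LEXICON,
    min_hits: int = 1,
) -> Dict[str, bool]:
    """Mark a value as present when at least `min_hits` tokens match its cue words."""
    tokens = tokenize(abstract)
    return {
        value: sum(token in cues for token in tokens) >= min_hits
        for value, cues in lexicon.items()
    }


if __name__ == "__main__":
    example = "We propose a novel and efficient method."
    print(detect_values(example))
```

Each value is treated as an independent binary label, so one abstract can express several values at once, which is what makes co-occurrence analysis possible. A real lexicon would need validation against human annotations before being applied at scale.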
