Systematic Framework of Application Methods for Large Language Models in Language Sciences
Kun Sun, Rong Wang
TL;DR
The paper tackles methodological fragmentation in applying large language models to language sciences by proposing two integrated frameworks: a method-selection framework that maps research goals to LLM paradigms, and a constructed configurations framework that structures multi-stage, reproducible research pipelines. It delineates three complementary approaches—prompt-based exploration, fine-tuning of open-source models, and embedding-based analysis—and provides detailed implementation guidance, including agent-role workflows, data/documentation practices, and evaluation standards. The authors validate the frameworks through retrospective analyses, prospective replications, and expert surveys, showing improvements in transparency, reproducibility, and methodological accountability. Collectively, the work offers a theory-grounded blueprint to move LLM-based language science from ad-hoc utility toward verifiable, robust, and cumulative scientific practice.
Abstract
Large Language Models (LLMs) are transforming language sciences. However, their widespread deployment currently suffers from methodological fragmentation and a lack of systematic soundness. This study proposes two comprehensive methodological frameworks designed to guide the strategic and responsible application of LLMs in language sciences. The first, a method-selection framework, defines and systematizes three distinct, complementary approaches, each linked to a specific research goal: (1) prompt-based interaction with general-use models for exploratory analysis and hypothesis generation; (2) fine-tuning of open-source models for confirmatory, theory-driven investigation and high-quality data generation; and (3) extraction of contextualized embeddings for further quantitative analysis and probing of the model's internal mechanisms. We detail the technical implementation and inherent trade-offs of each method, supported by empirical case studies. Building on the method-selection framework, the second framework provides constructed configurations that guide the practical implementation of multi-stage research pipelines based on these approaches. We then conducted a series of empirical experiments to validate the proposed frameworks, employing retrospective analysis, prospective application, and an expert evaluation survey. By enforcing the strategic alignment of research questions with the appropriate LLM methodology, the frameworks enable a critical paradigm shift in language science research. We believe that this system is fundamental for ensuring reproducibility, facilitating the critical evaluation of LLM mechanisms, and providing the structure necessary to move traditional linguistics from ad-hoc utility to verifiable, robust science.
