
Instruction Tuning Vs. In-Context Learning: Revisiting Large Language Models in Few-Shot Computational Social Science

Taihang Wang, Xiaoman Xu, Yimin Wang, Ye Jiang

TL;DR

The paper investigates how to adapt large language models to computational social science tasks in few-shot settings by directly comparing instruction tuning (IT) and in-context learning (ICL). It implements IT with LoRA fine-tuning and evaluates ICL prompts across six open-source LLMs on five CSS datasets, also comparing zero-shot and CoT prompting strategies. The results show ICL generally outperforms IT and that prompt design and sample quality strongly influence performance, while merely increasing sample counts does not guarantee gains. The work provides practical guidance for deploying LLMs in CSS, highlighting the importance of diverse, high-quality few-shot samples and careful prompting choices, with code to be released. Overall, the study clarifies when and how to use ICL over IT in few-shot CSS tasks and outlines caveats related to CoT and data quality.

Abstract

Real-world applications of large language models (LLMs) in computational social science (CSS) tasks primarily depend on the effectiveness of instruction tuning (IT) or in-context learning (ICL). While IT has been shown to be highly effective at fine-tuning LLMs for various tasks, ICL offers a rapid alternative for task adaptation by learning from examples without explicit gradient updates. In this paper, we evaluate the classification performance of LLMs using IT versus ICL in few-shot CSS tasks. The experimental results indicate that ICL outperforms IT in most CSS tasks. Additionally, we investigate the relationship between the number of training samples and LLM performance. Our findings show that simply increasing the number of samples without considering their quality does not consistently enhance the performance of LLMs with either ICL or IT and can sometimes even result in a performance decline. Finally, we compare three prompting strategies, demonstrating that ICL is more effective than zero-shot and Chain-of-Thought (CoT) prompting. Our research highlights the significant advantages of ICL in handling CSS tasks in few-shot settings and emphasizes the importance of optimizing sample quality and prompting strategies to improve LLM classification performance. The code will be made available.

Paper Structure (22 sections, 3 figures, 14 tables)

This paper contains 22 sections, 3 figures, 14 tables.

Figures (3)

  • Figure 1: Illustration of the overall workflow in this paper. (a) The instruction prompts include the context of the tasks (Instruction), the constraints for generating the responses from LLMs (Constraints), and the input text of each task (Text). (b) The ICL prompts include a set of input-label pairs (Samples) to guide the LLMs in focusing on task-specific content. (c) A comparison between different prompting strategies in CSS tasks.
  • Figure 2: Illustration of how different sample sizes affect the performance of LLMs with ICL and IT, respectively.
  • Figure A1: Performance comparison between LLMs on CSS tasks.
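
The prompt components named in the Figure 1 caption (Instruction, Constraints, Samples, Text) can be sketched as a small prompt builder. This is a minimal illustration, not the authors' released code: the exact template layout, the `Sample` class, the `build_prompt` function, and the example task and labels are all hypothetical and chosen only to show how a zero-shot prompt differs from an ICL prompt.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One input-label pair used as an in-context demonstration (hypothetical type)."""
    text: str
    label: str


def build_prompt(
    instruction: str,
    constraints: str,
    text: str,
    samples: Sequence[Sample] = (),
) -> str:
    """Assemble a classification prompt.

    With no samples this yields a zero-shot instruction prompt; with samples it
    yields an ICL prompt. The layout is an assumption, not the paper's template.
    """
    parts = [f"Instruction: {instruction}", f"Constraints: {constraints}"]
    if samples:
        parts.append("Samples:")
        # Each demonstration shows the model an input together with its gold label.
        parts.extend(f"Text: {s.text}\nLabel: {s.label}" for s in samples)
    # The query text comes last, with the label left open for the model to complete.
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)


# Hypothetical task and data, for illustration only.
instruction = "Decide whether the post expresses a positive or negative opinion."
constraints = "Answer with exactly one word: positive or negative."
demos = [
    Sample("I loved the new policy update.", "positive"),
    Sample("This decision is a disaster.", "negative"),
]
query = "The announcement left me disappointed."

zero_shot_prompt = build_prompt(instruction, constraints, query)
icl_prompt = build_prompt(instruction, constraints, query, demos)
print(icl_prompt)
```

Because the demonstrations are passed in as a plain sequence, the same builder can be used to vary the number or choice of samples, which is the variable the paper studies when it reports that more samples do not guarantee better performance.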