Adaptive Prompting: Ad-hoc Prompt Composition for Social Bias Detection
Maximilian Spliethöver, Tim Knebler, Fabian Fumagalli, Maximilian Muschalik, Barbara Hammer, Eyke Hüllermeier, Henning Wachsmuth
TL;DR
This work tackles the instability and trial-and-error nature of prompting large language models for social bias detection by proposing an adaptive prompting framework that predicts an input-specific prompt composition. It constructs a pool of discrete prompting techniques, collects per-composition labels, and trains an encoder to select the best composition for each input, supplemented by a Shapley-value analysis of technique interactions. Empirical results across three instruction-tuned LLMs and three bias datasets show that adaptive prompting can outperform fixed compositions and, in many settings, surpass baselines including fine-tuned models, with insights into which techniques contribute most consistently. First experiments on other tasks support the approach's generalizability, and the analysis highlights the importance of considering second-order interactions among prompting techniques for reliable bias detection and more efficient LLM use.
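To make the Shapley-value analysis mentioned above more concrete, the following is a minimal sketch of how a per-technique contribution could be computed from per-composition scores. It is not the authors' implementation: the technique names, the score table, and the helper `pairwise_synergy` are all illustrative assumptions, and the numbers are invented. The pairwise measure is a simple second-order difference relative to the empty prompt, not a full Shapley interaction index.

```python
from itertools import combinations
from math import factorial

# Illustrative technique names (assumption, not taken from the source).
TECHNIQUES = ["definition", "demonstrations", "reasoning"]

# Invented scores (e.g. macro-F1) for every composition of the techniques.
SCORES = {
    frozenset(): 0.50,
    frozenset({"definition"}): 0.60,
    frozenset({"demonstrations"}): 0.58,
    frozenset({"reasoning"}): 0.52,
    frozenset({"definition", "demonstrations"}): 0.70,
    frozenset({"definition", "reasoning"}): 0.61,
    frozenset({"demonstrations", "reasoning"}): 0.62,
    frozenset({"definition", "demonstrations", "reasoning"}): 0.68,
}


def shapley_value(technique, players, value):
    """Exact Shapley value of one technique under the score table `value`."""
    n = len(players)
    others = [p for p in players if p != technique]
    total = 0.0
    for k in range(n):  # size of the coalition that excludes `technique`
        weight = factorial(k) * factorial(n - k - 1) / factorial(n)
        for subset in combinations(others, k):
            s = frozenset(subset)
            # Marginal gain from adding `technique` to coalition `s`.
            total += weight * (value[s | {technique}] - value[s])
    return total


def pairwise_synergy(a, b, value):
    """Second-order difference of two techniques relative to the empty prompt."""
    empty = frozenset()
    return (value[frozenset({a, b})] - value[frozenset({a})]
            - value[frozenset({b})] + value[empty])


shapley = {t: shapley_value(t, TECHNIQUES, SCORES) for t in TECHNIQUES}
synergy = pairwise_synergy("definition", "demonstrations", SCORES)
```

By the efficiency property of Shapley values, the entries of `shapley` sum to the gap between the full composition and the empty prompt. A positive `synergy` indicates that the two techniques help more together than their individual gains would suggest.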
Abstract
Recent advances in instruction fine-tuning have led to the development of various prompting techniques for large language models, such as explicit reasoning steps. However, the success of techniques depends on various parameters, such as the task, language model, and context provided. Finding an effective prompt is, therefore, often a trial-and-error process. Most existing approaches to automatic prompting aim to optimize individual techniques instead of compositions of techniques and their dependence on the input. To fill this gap, we propose an adaptive prompting approach that predicts the optimal prompt composition ad-hoc for a given input. We apply our approach to social bias detection, a highly context-dependent task that requires semantic understanding. We evaluate it with three large language models on three datasets, comparing compositions to individual techniques and other baselines. The results underline the importance of finding an effective prompt composition. Our approach robustly ensures high detection performance, and is best in several settings. Moreover, first experiments on other tasks support its generalizability.
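The idea of predicting a prompt composition ad-hoc for a given input can be illustrated with a small sketch. It makes strong simplifying assumptions: the technique names, prompt snippets, and training records are invented, and a word-overlap similarity over those records stands in for the trained predictor, whose architecture and training are not described in this section.

```python
from itertools import combinations

# Illustrative technique names and prompt snippets (assumptions).
TECHNIQUES = ("definition", "demonstrations", "reasoning")
SNIPPETS = {
    "definition": "Social bias is prejudice against a social group.",
    "demonstrations": "Example: 'Old people cannot learn.' -> yes",
    "reasoning": "Think step by step before answering.",
}


def all_compositions(techniques):
    """Enumerate every subset of techniques, including the empty one."""
    return [frozenset(c) for k in range(len(techniques) + 1)
            for c in combinations(techniques, k)]


# Invented records: input text -> compositions under which the LLM was correct.
TRAIN = [
    ("women are bad at driving",
     {frozenset({"definition"}), frozenset({"definition", "reasoning"})}),
    ("the bus leaves at noon",
     {frozenset(), frozenset({"demonstrations"})}),
]


def jaccard(a, b):
    """Word-level Jaccard similarity between two texts."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def predict_scores(text):
    """Stand-in predictor: similarity-weighted success count per composition."""
    scores = {c: 0.0 for c in all_compositions(TECHNIQUES)}
    for train_text, correct in TRAIN:
        sim = jaccard(text, train_text)
        for c in correct:
            scores[c] += sim
    return scores


def select_composition(text):
    """Pick the highest-scoring composition; ties favor fewer techniques."""
    scores = predict_scores(text)
    return min(scores, key=lambda c: (-scores[c], len(c), sorted(c)))


def build_prompt(text, composition):
    """Assemble the prompt from the selected techniques in a fixed order."""
    parts = [SNIPPETS[t] for t in TECHNIQUES if t in composition]
    parts.append(f"Text: {text}\nIs this text socially biased? Answer yes or no.")
    return "\n".join(parts)


chosen = select_composition("women are bad at cooking")
prompt = build_prompt("women are bad at cooking", chosen)
```

In this toy setup the selection step reduces to an argmax over predicted per-composition scores, so the same input always yields the same composition, while different inputs may be routed to different prompts.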
