S2LPP: Small-to-Large Prompt Prediction across LLMs
Liang Cheng, Tianyi Li, Zhaowei Wang, Mark Steedman
TL;DR
This work addresses the high cost and sensitivity of prompt engineering for large language models by showing that optimal prompts are consistent across model sizes within the same model family and, to some extent, across families. It introduces Small-to-Large Prompt Prediction (S2LPP), a three-step framework that uses small LLMs to generate and select high-performing prompts for a larger target model. S2LPP substantially reduces computation while achieving near-oracle performance on open-domain QA and natural language inference across 14 LLMs. Extensions to retrieval-augmented generation and arithmetic reasoning further illustrate the method's robustness and generalizability to broader NLP tasks. The results show that exploiting prompt consistency can dramatically cut prompt-engineering costs while maintaining high performance, a practical benefit when deploying diverse LLMs in real-world settings.
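The consistency claim can be made concrete as a rank-agreement check. The sketch below is an illustration, not the paper's evaluation protocol: it assumes each candidate template has already been scored (for example, accuracy on a dev set) with both a small and a large model, and it measures agreement with Spearman's rank correlation plus a check of whether the small model's top template is also optimal for the large model. The function names and the choice of Spearman's ρ are assumptions; this section does not specify how consistency is quantified.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values in ascending order (1-based); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("ranks are constant; correlation is undefined")
    return cov / (vx * vy) ** 0.5


def small_pick_is_optimal(small_acc: Sequence[float], large_acc: Sequence[float]) -> bool:
    """True if the template the small model scores highest is also a best template for the large model."""
    if len(small_acc) != len(large_acc) or not small_acc:
        raise ValueError("need two equal-length, non-empty score lists")
    best_small = max(range(len(small_acc)), key=lambda i: small_acc[i])
    return large_acc[best_small] == max(large_acc)


# Toy usage: made-up per-template accuracies, not results from the paper.
small_acc = [0.41, 0.35, 0.52, 0.30]
large_acc = [0.63, 0.58, 0.71, 0.49]
print(spearman(small_acc, large_acc), small_pick_is_optimal(small_acc, large_acc))  # → 1.0 True
```

A correlation near 1 means the two models order the templates the same way; the top-pick check is the weaker property that selection actually relies on, since only the small model's best template is passed on.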
Abstract
The performance of pre-trained Large Language Models (LLMs) is often sensitive to nuances in prompt templates, requiring careful prompt engineering that adds both computational and human costs. In this study, we present experiments on multiple LLM variants of varying sizes, aimed at probing their preferences among different prompts. Through experiments on Question Answering, we show that prompt preferences are consistent across LLMs of different sizes. We also show that this consistency extends to other tasks, such as Natural Language Inference. Exploiting this consistency, we propose a method that uses a smaller model to select effective prompt templates for a larger model. We show that our method substantially reduces the cost of prompt engineering while consistently matching the performance of the optimal prompt among the candidates. More importantly, our experiments show the efficacy of our strategy across fourteen LLMs and its applicability to a broad range of NLP tasks, highlighting its robustness.
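The selection idea in the abstract can be sketched as follows, assuming a hypothetical scorer that returns a template's dev-set accuracy under the small model. This covers only the selection step; the generation step and the other details of the three-step S2LPP framework are not described in this section, so the sketch should not be read as the authors' implementation. The names `select_prompt_with_small_model`, `Scorer`, and the toy templates are all illustrative.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: given a prompt template, return the accuracy a
# particular model achieves with it on a held-out dev set.
Scorer = Callable[[str], float]


def select_prompt_with_small_model(templates: Sequence[str], small_model_score: Scorer) -> str:
    """Return the candidate template that the small model scores highest.

    Assumption (hypothetical sketch, not the authors' code): the template
    ranking from the small model transfers to the larger target model, so the
    large model is not run on every candidate.
    """
    if not templates:
        raise ValueError("need at least one candidate template")
    # Score each candidate once with the cheap model.
    scores = {t: small_model_score(t) for t in templates}
    # Pick the highest-scoring template; ties go to the earliest candidate.
    return max(templates, key=lambda t: scores[t])


# Toy usage: the scores below are made up for illustration, not paper results.
candidates = [
    "Q: {question}\nA:",
    "Answer the question.\nQuestion: {question}\nAnswer:",
    "{question}",
]
toy_small_scores = {candidates[0]: 0.40, candidates[1]: 0.47, candidates[2]: 0.31}
best = select_prompt_with_small_model(candidates, toy_small_scores.__getitem__)
# The chosen template would then be filled in and sent to the large model.
large_model_prompt = best.format(question="Who wrote Hamlet?")
```

With N candidate templates and D dev examples, this selection costs N × D small-model calls and no large-model calls, whereas choosing by brute force on the large model would cost N × D large-model calls.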
