Prompt2Model: Generating Deployable Models from Natural Language Instructions
Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, Tongshuang Wu, Graham Neubig
TL;DR
Prompt2Model addresses the gap between prompt-based rapid prototyping with LLMs and practical deployment by automatically constructing small, task-specific models from natural-language prompts. It combines dataset retrieval and LLM-driven dataset generation to build a training set, and model retrieval to select a suitable pretrained student model, which is then fine-tuned and evaluated. The approach demonstrates that, for several tasks, the resulting compact models can outperform gpt-3.5-turbo given the same prompt while being up to 700 times smaller, and that synthetic evaluation data can reliably estimate real-world performance. The framework is modular and open-source, offering a platform for exploring data-centric and distillation techniques in an end-to-end, prompt-governed pipeline, with potential for broader accessibility and reproducibility in NLP deployment.
Abstract
Large language models (LLMs) enable system builders today to create competent NLP systems through prompting, where they only need to describe the task in natural language and provide a few examples. However, in other ways, LLMs are a step backward from traditional special-purpose NLP models: they require extensive computational resources for deployment and can be gated behind APIs. In this paper, we propose Prompt2Model, a general-purpose method that takes a natural language task description, like the prompts provided to LLMs, and uses it to train a special-purpose model that is conducive to deployment. This is done through a multi-step process: retrieval of existing datasets and pretrained models, dataset generation using LLMs, and supervised fine-tuning on these retrieved and generated datasets. Across three tasks, we demonstrate that given the same few-shot prompt as input, Prompt2Model trains models that outperform the results of a strong LLM, gpt-3.5-turbo, by an average of 20% while being up to 700 times smaller. We also show that the generated data can be used to obtain reliable estimates of model performance, enabling model developers to assess model reliability before deployment. Prompt2Model is available open-source at https://github.com/neulab/prompt2model.
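The multi-step process described above can be pictured as a chain of pluggable stages. The following is a minimal, hypothetical sketch and not the Prompt2Model API: `run_pipeline`, the stage callables, `num_generated`, and `eval_fraction` are invented for illustration, and the toy stubs stand in for real dataset retrieval, LLM-based generation, model retrieval, and fine-tuning. It also assumes one simple way of holding out part of the generated data as a synthetic evaluation set.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical example type: (input text, target output).
Example = Tuple[str, str]


@dataclass
class PipelineResult:
    model_name: str
    train_size: int
    synthetic_eval_score: float


def run_pipeline(
    prompt: str,
    retrieve_dataset: Callable[[str], List[Example]],
    generate_dataset: Callable[[str, int], List[Example]],
    retrieve_model: Callable[[str], str],
    finetune: Callable[[str, List[Example]], Callable[[str], str]],
    num_generated: int = 100,
    eval_fraction: float = 0.2,
) -> PipelineResult:
    """Hypothetical prompt-to-model pipeline built from pluggable stages."""
    # 1. Retrieve an existing dataset that matches the task description.
    retrieved = retrieve_dataset(prompt)
    # 2. Generate synthetic examples with an LLM (stubbed here).
    generated = generate_dataset(prompt, num_generated)
    # Hold out part of the generated data as a synthetic evaluation set.
    n_eval = int(len(generated) * eval_fraction)
    synthetic_eval, synthetic_train = generated[:n_eval], generated[n_eval:]
    train_set = retrieved + synthetic_train
    # 3. Pick a pretrained student model for the task.
    model_name = retrieve_model(prompt)
    # 4. Fine-tune the student on retrieved + generated training data.
    predict = finetune(model_name, train_set)
    # 5. Estimate performance via exact-match accuracy on the held-out synthetic set.
    correct = sum(predict(x) == y for x, y in synthetic_eval)
    score = correct / len(synthetic_eval) if synthetic_eval else 0.0
    return PipelineResult(model_name, len(train_set), score)


# Toy stubs (hypothetical) so the sketch runs without any external services.
def toy_retrieve_dataset(prompt: str) -> List[Example]:
    return [("2+2", "4"), ("3+3", "6")]


def toy_generate_dataset(prompt: str, n: int) -> List[Example]:
    return [(f"{i}+{i}", str(2 * i)) for i in range(n)]


def toy_retrieve_model(prompt: str) -> str:
    return "toy-student-model"


def toy_finetune(model_name: str, train_set: List[Example]) -> Callable[[str], str]:
    # Stand-in for supervised fine-tuning: a fixed rule that fits the toy task.
    def predict(text: str) -> str:
        a, b = text.split("+")
        return str(int(a) + int(b))

    return predict


result = run_pipeline(
    "Add the two numbers.",
    toy_retrieve_dataset,
    toy_generate_dataset,
    toy_retrieve_model,
    toy_finetune,
    num_generated=10,
)
print(result)
```

Passing each stage in as a callable keeps the sketch modular in the spirit of the framework: any stage (for example, the generator or the model selector) can be swapped out without touching the others.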
