Prompt2Model: Generating Deployable Models from Natural Language Instructions

Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, Tongshuang Wu, Graham Neubig

TL;DR

Prompt2Model addresses the gap between rapid, prompt-based prototyping with LLMs and practical deployment by automatically constructing small, task-specific models from natural-language prompts. It combines dataset retrieval, LLM-driven dataset generation, and model retrieval to build a training set and select a suitable pretrained student model, which is then fine-tuned and evaluated. On several tasks, the resulting compact models outperform a gpt-3.5-turbo baseline given the same prompt while being orders of magnitude smaller, and synthetic evaluation data can reliably estimate real-world performance. The framework is modular and open-source, offering a platform for exploring data-centric and distillation techniques in an end-to-end, prompt-driven pipeline, with potential for broader accessibility and reproducibility in NLP deployment.

Abstract

Large language models (LLMs) enable system builders today to create competent NLP systems through prompting, where they only need to describe the task in natural language and provide a few examples. However, in other ways, LLMs are a step backward from traditional special-purpose NLP models; they require extensive computational resources for deployment and can be gated behind APIs. In this paper, we propose Prompt2Model, a general-purpose method that takes a natural language task description like the prompts provided to LLMs, and uses it to train a special-purpose model that is conducive to deployment. This is done through a multi-step process of retrieval of existing datasets and pretrained models, dataset generation using LLMs, and supervised fine-tuning on these retrieved and generated datasets. Over three tasks, we demonstrate that given the same few-shot prompt as input, Prompt2Model trains models that outperform the results of a strong LLM, gpt-3.5-turbo, by an average of 20% while being up to 700 times smaller. We also show that this data can be used to obtain reliable estimates of model performance, enabling model developers to assess model reliability before deployment. Prompt2Model is available open-source at https://github.com/neulab/prompt2model.
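The multi-step process described in the abstract can be pictured as a simple orchestration of stages. The following is a minimal sketch only: `run_pipeline`, `PipelineResult`, and the four stage callables are hypothetical names introduced for illustration and are not the API of the prompt2model library. It shows the order of the steps and how retrieved and generated examples are pooled before fine-tuning.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# An example is an (input text, expected output) pair. Hypothetical type alias.
Example = Tuple[str, str]


@dataclass
class PipelineResult:
    model_name: str      # identifier of the fine-tuned model
    train_size: int      # number of pooled training examples
    stages: List[str]    # stages executed, in order


def run_pipeline(
    prompt: str,
    retrieve_dataset: Callable[[str], List[Example]],
    generate_dataset: Callable[[str], List[Example]],
    retrieve_model: Callable[[str], str],
    finetune: Callable[[str, List[Example]], str],
) -> PipelineResult:
    """Run the stages in the order the abstract lists them (illustrative only)."""
    stages: List[str] = []

    # 1. Retrieve an existing dataset that matches the prompt.
    retrieved = retrieve_dataset(prompt)
    stages.append("retrieve_dataset")

    # 2. Generate additional examples with an LLM.
    generated = generate_dataset(prompt)
    stages.append("generate_dataset")

    # 3. Pick a pretrained model to act as the student.
    base_model = retrieve_model(prompt)
    stages.append("retrieve_model")

    # 4. Fine-tune the student on the pooled retrieved + generated data.
    train_set = retrieved + generated
    tuned_model = finetune(base_model, train_set)
    stages.append("finetune")

    return PipelineResult(tuned_model, len(train_set), stages)
```

In this sketch every stage is passed in as a callable, so a stub can stand in for any stage while the others are developed or swapped out, which mirrors the modular design the TL;DR mentions.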

Paper Structure (31 sections, 1 equation, 3 figures, 3 tables)

This paper contains 31 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Prompt2Model is a framework for generating a small yet accurate model from a prompt.
  • Figure 2: The Prompt2Model architecture seeks to automate the core machine learning development pipeline, allowing us to train a small yet accurate model from just a prompt.
  • Figure 3: For our model retriever, we first construct a hypothetical model description for a query, then compute similarity scores between that hypothetical model description and the descriptions of real models.
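
The ranking step in the Figure 3 caption, scoring real model descriptions against a hypothetical one, can be illustrated with a small sketch. It assumes a plain bag-of-words cosine similarity, which is a stand-in chosen for simplicity; the caption does not say which similarity function the retriever uses, and the hypothetical description would come from an earlier step rather than being typed by hand. The names `bag_of_words`, `cosine`, and `rank_models` are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercase, whitespace-tokenized term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_models(
    hypothetical_description: str,
    model_descriptions: Dict[str, str],
) -> List[Tuple[str, float]]:
    """Score each real model description against the hypothetical one, best first."""
    query = bag_of_words(hypothetical_description)
    scored = [
        (name, cosine(query, bag_of_words(description)))
        for name, description in model_descriptions.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with made-up model names and descriptions (not real model cards).
candidates = {
    "model-a": "encoder decoder model for translation and summarization",
    "model-b": "image classification network",
}
ranking = rank_models("a text model for translation from english to japanese", candidates)
```

Here `ranking` lists `model-a` first because its description shares terms with the hypothetical description, while `model-b` shares none. A production retriever would likely use a stronger scoring function than raw term overlap.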