ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation

Zihao Tang, Zheqi Lv, Shengyu Zhang, Fei Wu, Kun Kuang

TL;DR

ModelGPT presents a two-module framework that uses large language models to interpret user data or descriptions and generate tailored small-scale models via a Requirement Generator and a Model Customizer. By encoding requirements into a latent representation and synthesizing architecture and parameters (with LoRA adapters) in a single forward pass, it achieves significant speedups (up to 270x) over traditional pretrain-finetune pipelines while delivering competitive performance across NLP, CV, and tabular tasks. The approach demonstrates inter-task knowledge transfer, zero-shot capabilities, and improved weight initialization that accelerates subsequent fine-tuning. While promising, the work remains early-stage, with future work aimed at refining architecture generation granularity and improving parameter generation efficiency for broader model families.
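The TL;DR says requirements are encoded into a latent representation, from which the Model Customizer produces parameters, including LoRA adapters, in a single forward pass. The sketch below shows that hypernetwork idea for one linear layer. It rests on several assumptions. `LoRAHyperNetwork`, `adapted_forward`, the dimensions, and the single-linear-layer generator heads are all hypothetical and are not taken from the ModelGPT code. Architecture generation, which the real Customizer also performs, is omitted. The random arrays stand in for a pretrained weight and a requirement embedding.

```python
import numpy as np

# Hypothetical sizes for illustration; not the paper's actual dimensions.
LATENT_DIM = 16       # size of the requirement embedding z
D_OUT, D_IN = 64, 32  # shape of the frozen base weight W
RANK = 4              # LoRA rank r


class LoRAHyperNetwork:
    """Hypothetical stand-in for a "Model Customizer" head.

    Maps a requirement embedding z to LoRA factors (A, B) in one forward
    pass using one linear map per factor. This is an assumption for
    illustration, not the authors' architecture.
    """

    def __init__(self, latent_dim, d_out, d_in, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        self.head_a = rng.normal(0.0, 0.02, size=(latent_dim, rank * d_in))
        self.head_b = rng.normal(0.0, 0.02, size=(latent_dim, d_out * rank))

    def __call__(self, z):
        a = (z @ self.head_a).reshape(self.rank, self.d_in)   # A: r x d_in
        b = (z @ self.head_b).reshape(self.d_out, self.rank)  # B: d_out x r
        return a, b


def adapted_forward(x, w_base, a, b, alpha=1.0):
    """Compute y = x (W + (alpha / r) B A)^T, leaving the base weight W unchanged."""
    rank = a.shape[0]
    delta = (alpha / rank) * (b @ a)  # low-rank update, shape d_out x d_in
    return x @ (w_base + delta).T


rng = np.random.default_rng(1)
w_base = rng.normal(size=(D_OUT, D_IN))  # random stand-in for a pretrained weight
z = rng.normal(size=LATENT_DIM)          # random stand-in for a requirement embedding
hyper = LoRAHyperNetwork(LATENT_DIM, D_OUT, D_IN, RANK)
a, b = hyper(z)                          # one forward pass yields the adapter

x = rng.normal(size=(8, D_IN))           # a batch of 8 inputs
y = adapted_forward(x, w_base, a, b)
print(y.shape)  # (8, 64)

# Numbers the generator must emit per layer: full weight vs. LoRA factors.
full_params = D_OUT * D_IN            # 64 * 32
lora_params = RANK * (D_OUT + D_IN)   # 4 * (64 + 32)
print(full_params, lora_params)  # 2048 384
```

The sketch emits only r(d_out + d_in) values per layer instead of d_out × d_in. This is one plausible reason that generating LoRA adapters is cheaper than generating full weights. It is an illustration and is not a claim about how ModelGPT achieves its reported speedup.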

Abstract

The rapid advancement of Large Language Models (LLMs) has revolutionized various sectors by automating routine tasks, marking a step toward the realization of Artificial General Intelligence (AGI). However, LLMs still struggle to accommodate the diverse and specific needs of users and to simplify the use of AI models for the average user. In response, we propose ModelGPT, a novel framework that leverages the capabilities of LLMs to determine and generate AI models tailored to the data or task descriptions provided by the user. Given user requirements, ModelGPT provides tailored models up to 270x faster than previous paradigms (e.g., full-parameter or LoRA fine-tuning). Comprehensive experiments on NLP, CV, and tabular datasets attest to the effectiveness of our framework in making AI models more accessible and user-friendly. Our code is available at https://github.com/IshiKura-a/ModelGPT.

Paper Structure (19 sections, 5 equations, 6 figures, 5 tables, 1 algorithm)

Figures (6)

  • Figure 1: Overview of the framework of ModelGPT.
  • Figure 2: Details of the workflow of ModelGPT. Here, we also provide real examples taken from our main experiments.
  • Figure 3: Case study on the prompt design, using the prompt for the CV experiments as an example. The first row shows the base prompt, whose last two lines are filled with real data and task descriptions. The following rows show two pairs of examples, and within each pair the only difference is whether task descriptions are provided. The third column of each example shows the output of the LLM (GPT-4-vision-preview in this case). Green text reflects correct data-specific information, red text reflects WRONG information, and gray text is irrelevant information.
  • Figure 4: Detailed analysis of ModelGPT's capability for weight initialization. For clearer comparison, we increase the length of the starting epochs. The best checkpoint of each method is marked with a solid round point.
  • Figure 5: Case study on the prompt design, showing the prompt templates used in our main experiments on NLP, CV, and tabular data. Each example has 2-3 rows. In each example, the first row shows the base prompt, whose last two lines are filled with real data and task descriptions. For CV, the following rows show two pairs of examples, and within each pair the only difference is whether task descriptions are provided. The third column of these examples shows the output of the LLM (GPT-4-vision-preview in this case). Green text reflects correct data-specific information, red text reflects WRONG information, and gray text is irrelevant information.