
Improving Large Models with Small models: Lower Costs and Better Performance

Dong Chen, Shuo Zhang, Yueting Zhuang, Siliang Tang, Qidong Liu, Hua Wang, Mingliang Xu

TL;DR

This paper proposes Data Shunt$^+$ (DS$^+$), a general paradigm for collaboration between small and large models that substantially reduces the cost of querying large models while also improving their performance.

Abstract

Pretrained large models (PLMs), such as ChatGPT, have demonstrated remarkable performance across diverse tasks. However, their significant computational requirements have discouraged most product teams from running or fine-tuning them. To harness the performance of PLMs, such teams must rely on expensive APIs, which increases their economic burden. Although small models perform worse overall, on specific distributions they can achieve comparable or even superior results, so some inputs can be processed exclusively by small models. In addition, certain tasks can be broken down into multiple subtasks, some of which do not require powerful capabilities. In these cases, small models can handle the simple subtasks, allowing large models to focus on the challenging ones and thereby improving performance. We propose Data Shunt$^+$ (DS$^+$), a general paradigm for collaboration between small and large models. DS$^+$ not only substantially reduces the cost of querying large models but also effectively improves their performance. For instance, ChatGPT achieves an accuracy of $94.43\%$ on Amazon Product sentiment analysis, whereas DS$^+$ achieves an accuracy of $95.64\%$ while reducing the cost to only $31.18\%$. Experiments also show that the proposed collaboration-based paradigm injects task-specific knowledge into PLMs more effectively than fine-tuning.
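
The routing idea in the abstract, where some inputs are handled by the small model alone and the rest are sent to the large model, can be illustrated with a minimal sketch. The abstract does not specify the routing criterion, so the confidence threshold used here is an assumption, and `shunt`, `toy_small_model`, `toy_large_model`, and the value `0.9` are hypothetical names and settings chosen only for illustration. The toy models are keyword stand-ins, not real classifiers or API calls.

```python
from typing import Callable, List, Sequence, Tuple


def shunt(
    inputs: Sequence[str],
    small_model: Callable[[str], Tuple[str, float]],
    large_model: Callable[[str], str],
    threshold: float = 0.9,  # assumed routing criterion, not taken from the paper
) -> Tuple[List[str], float]:
    """Keep the small model's answer when it is confident, else query the large model.

    Returns the predictions and the fraction of inputs sent to the large model.
    """
    predictions: List[str] = []
    large_calls = 0
    for text in inputs:
        label, confidence = small_model(text)
        if confidence < threshold:
            # The small model is unsure: pay for a large-model query.
            label = large_model(text)
            large_calls += 1
        predictions.append(label)
    fraction = large_calls / len(inputs) if inputs else 0.0
    return predictions, fraction


def toy_small_model(text: str) -> Tuple[str, float]:
    """Hypothetical small model: confident only on obvious keywords."""
    lowered = text.lower()
    if "great" in lowered:
        return "positive", 0.97
    if "terrible" in lowered:
        return "negative", 0.95
    return "positive", 0.55


def toy_large_model(text: str) -> str:
    """Hypothetical stand-in for a paid large-model API call."""
    return "negative" if "not" in text.lower() else "positive"


reviews = ["Great phone", "Terrible battery", "Not what I expected", "It works"]
predictions, large_fraction = shunt(reviews, toy_small_model, toy_large_model)
print(predictions, large_fraction)
```

In this toy run, only the two reviews on which the small model is unsure reach the large model, so the large-model query count is half of what a large-model-only pipeline would need.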

Paper Structure

This paper contains 18 sections, 9 equations, 8 figures, 6 tables, 2 algorithms.

Figures (8)

  • Figure 1: Commercial Applications of Large Models. Upper: Product team 1 uses only large models to support its applications. Lower: Product team 2 reduces costs by combining large and small models, allowing it to offer more appealing prices to users.
  • Figure 2: Small Model for Large Model (S4L). There are two methods in S4L, including Prompt Pruning (PP) and Prompt Transferring (PT). PP refines the prediction space of large models, while PT refines the input space of large models.
  • Figure 3: Large Model for Small Model (L4S). By injecting the knowledge of large models into small models, more samples can be transformed into easy samples, thereby further reducing the frequency of querying large models.
  • Figure 4: The training process of the proposed method. Hard samples refer to data that poses challenges for small models, while easy samples represent data that small models can fit well.
  • Figure 5: Results of paid large models with or without PT.
  • ...and 3 more figures
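
The Figure 2 caption says that Prompt Pruning (PP) refines the prediction space of large models. One way to picture this, as a sketch under stated assumptions rather than the paper's actual procedure, is to let the small model's top-ranked labels define the candidate set shown to the large model. The helper names `prune_labels` and `build_pruned_prompt`, the top-`keep` rule, the prompt wording, and the probabilities are all hypothetical.

```python
from typing import Dict, List


def prune_labels(small_model_probs: Dict[str, float], keep: int = 2) -> List[str]:
    """Keep the `keep` labels the small model considers most likely (assumed rule)."""
    ranked = sorted(small_model_probs, key=small_model_probs.get, reverse=True)
    return ranked[:keep]


def build_pruned_prompt(text: str, candidates: List[str]) -> str:
    """Build a prompt that offers the large model only the pruned label set."""
    options = ", ".join(candidates)
    return f"Classify the text into one of: {options}.\nText: {text}\nLabel:"


# Hypothetical small-model output for one input.
probs = {"positive": 0.48, "neutral": 0.41, "negative": 0.11}
candidates = prune_labels(probs, keep=2)
prompt = build_pruned_prompt("It does the job, I guess.", candidates)
print(prompt)
```

Here the unlikely label is dropped before the large model is queried, so the large model chooses among fewer options.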