ComplexityNet: Increasing LLM Inference Efficiency by Learning Task Complexity

Henry Bae, Aghyad Deeb, Alex Fleury, Kehang Zhu

TL;DR

This study demonstrates that fine-tuning smaller models to categorize tasks based on their complexity can lead to a more balanced trade-off between accuracy and efficiency in the use of Large Language Models.

Abstract

We present ComplexityNet, a streamlined language model designed for assessing task complexity. This model predicts the likelihood of accurate output by various language models, each with different capabilities. Our initial application of ComplexityNet involves the Mostly Basic Python Problems (MBPP) dataset, for which we created the first set of labels to define task complexity. ComplexityNet achieved 79% accuracy in determining task complexity, a significant improvement over the 34% accuracy of the original, non-fine-tuned model. Furthermore, ComplexityNet reduces computational resource usage by 90% compared to using the highest-complexity model, while maintaining a high code generation accuracy of 86.7%. This study demonstrates that fine-tuning smaller models to categorize tasks based on their complexity can lead to a more balanced trade-off between accuracy and efficiency in the use of Large Language Models. Our findings suggest a promising direction for optimizing LLM applications, especially in resource-constrained environments.

Paper Structure

This paper contains 13 sections, 3 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Overview of the problem: The prompt is first fed through the complexity model and then routed to one of the three models. We want to train a complexity model that picks the lowest-cost model that successfully accomplishes the task.
  • Figure 2: Overview of our approach. Each row of the dataset is fed through the three language models, and we store the success rate of each model. These success rates are used to generate a single complexity value for each prompt.
  • Figure 3: One example of the ordering mapping based on the success rate of each model at the task.
  • Figure 4: Comparison of the prediction accuracy of the task complexity levels.
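
The labeling and routing ideas described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the tier names, the 0.5 success threshold, and the rule "pick the cheapest tier that clears the threshold" are all assumptions made for illustration, since the paper's exact mapping from success rates to a complexity value is not given here.

```python
from typing import Callable, Dict, Sequence

# Hypothetical model tiers, ordered from cheapest to most expensive.
MODEL_TIERS = ("small", "medium", "large")


def complexity_label(success_rates: Sequence[float], threshold: float = 0.5) -> int:
    """Map per-model success rates to a single complexity level.

    Assumed rule: the label is the index of the cheapest tier whose
    success rate reaches the threshold; if none does, fall back to the
    most expensive tier.
    """
    for level, rate in enumerate(success_rates):
        if rate >= threshold:
            return level
    return len(success_rates) - 1


def route(
    prompt: str,
    predict_complexity: Callable[[str], int],
    models: Dict[str, Callable[[str], str]],
) -> str:
    """Send the prompt to the tier chosen by the complexity predictor."""
    level = predict_complexity(prompt)
    return models[MODEL_TIERS[level]](prompt)


if __name__ == "__main__":
    # Stub models and a stub predictor, standing in for real LLM calls.
    stub_models = {name: (lambda p, n=name: f"{n}: {p}") for name in MODEL_TIERS}
    print(complexity_label([0.2, 0.6, 1.0]))
    print(route("reverse a string", lambda p: 0, stub_models))
```

In this sketch the complexity predictor is just a callable that returns a tier index; in the paper's setting that role is played by the fine-tuned ComplexityNet model.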