Octopus v4: Graph of language models

Wei Chen, Zhiyuan Li

TL;DR

The paper addresses the inefficiency and cost of relying on a single giant proprietary model by proposing a graph-based framework that coordinates multiple open-source vertical models via functional tokens. Octopus v4 acts as the coordinator, routing each query to a task-specialized worker and reformatting the input to maximize performance while keeping every activated model under 10B parameters. The key contributions include a detailed methodology for graph-based model coordination, two abstraction layers linking Octopus v2 and Octopus v4, and empirical results on the MMLU benchmark showing competitive performance with far smaller models. The work demonstrates practical, energy-efficient multi-model inference and provides open-source resources to reproduce and extend the framework.

Abstract

Language models have been effective in a wide range of applications, yet the most sophisticated models are often proprietary. For example, GPT-4 by OpenAI and various models by Anthropic are expensive and consume substantial energy. In contrast, the open-source community has produced competitive models, like Llama3. Furthermore, niche-specific smaller language models, such as those tailored for legal, medical or financial tasks, have outperformed their proprietary counterparts. This paper introduces a novel approach that employs functional tokens to integrate multiple open-source models, each optimized for particular tasks. Our newly developed Octopus v4 model leverages functional tokens to intelligently direct user queries to the most appropriate vertical model and reformat the query to achieve the best performance. Octopus v4, an evolution of the Octopus v1, v2, and v3 models, excels in selection, parameter understanding, and reformatting. Additionally, we explore the use of a graph as a versatile data structure that effectively coordinates multiple open-source models by harnessing the capabilities of the Octopus model and functional tokens. Use our open-sourced GitHub (https://www.nexa4ai.com/) to try Octopus v4 models (https://huggingface.co/NexaAIDev/Octopus-v4), and contribute to a larger graph of language models. By activating models with fewer than 10B parameters, we achieved a SOTA MMLU score of 74.8 among models of the same level.
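
The routing step described in the abstract can be illustrated with a minimal sketch. It assumes the coordinator emits one functional token followed by the reformatted query; the token names (`<tok_math>`, `<tok_law>`, `<tok_end>`), the output shape, and the worker functions are hypothetical placeholders, not the actual Octopus v4 vocabulary or API.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical functional tokens mapped to stand-in worker models.
# Real workers would be specialized language models, not lambdas.
WORKERS: Dict[str, Callable[[str], str]] = {
    "<tok_math>": lambda q: f"[math worker] {q}",
    "<tok_law>": lambda q: f"[law worker] {q}",
}

# Assumed coordinator output shape: <token>('reformatted query')<tok_end>
_PATTERN = re.compile(r"^(<tok_\w+>)\('(.*)'\)<tok_end>$", re.DOTALL)


def parse_coordinator_output(text: str) -> Tuple[str, str]:
    """Split a coordinator output into (functional token, reformatted query)."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinator output: {text!r}")
    return match.group(1), match.group(2)


def dispatch(text: str) -> str:
    """Send the reformatted query to the worker selected by the token."""
    token, query = parse_coordinator_output(text)
    if token not in WORKERS:
        raise KeyError(f"no worker registered for token {token}")
    return WORKERS[token](query)


if __name__ == "__main__":
    output = "<tok_math>('Find d/dx of x^3 at x = 2.')<tok_end>"
    print(dispatch(output))
```

Because the token alone identifies the target model, the dispatcher needs only a dictionary lookup; all of the selection work is assumed to happen inside the coordinator model.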

Paper Structure (14 sections, 5 equations, 5 figures)

Figures (5)

  • Figure 1: The shift from single-model inference, employing a trillion-parameter model, to multi-node collaboration coordinated by the Octopus model. This framework optimizes the inference process by selecting the most suitable specialized models based on the user's query, activating only two models, each with fewer than 10B parameters, for one-step inference. We only show a small graph here, but the framework can support a large graph. See the demonstration of the graph at https://graph.nexa4ai.com/.
  • Figure 2: The Octopus model is utilized to determine the optimal neighboring node and generate appropriate information for transmission. Consider a scenario where the Octopus model's neighbors are MathGPT [llama2mathgpt2024], LawGPT [cheng2024adapting], HealthCareGPT [abridge2024], CodeGPT [codegpt2024], and RoomGPT [roomsgpt2024]. The Octopus model can identify the most relevant GPT and transform the initial query into a format best suited for the selected GPT.
  • Figure 3: In our design, the architecture consists of two abstraction layers. The first layer utilizes functional tokens to represent the actions executable by the Octopus v2 model. This layer encompasses three distinct Octopus v2 models, each identified by different functional tokens, effectively differentiating them as separate AI agents. The second layer of abstraction pertains to the Octopus v4 model, where internal functional tokens are mapped to various v2 models. For simplicity, we only include three v2 models, but one can map to multiple v2 models in real use cases.
  • Figure 4: Our system design features a graph of language models with a master node deployed on a central device and worker nodes distributed across various devices. We employ Kubernetes (k8s) for serverless deployment of each individual worker language model. For efficient data sharing, we utilize a distributed cache mechanism supported by Redis. Note that each worker node has a small Octopus v4 LoRA attached to it to select the next neighbor node in multi-agent use cases.
  • Figure 5: The comparison of MMLU scores between Octopus v4 and other models. During Octopus v4's inference, only two small language models, each with fewer than 10B parameters, are activated. Octopus v4 achieves a significant improvement in MMLU scores while requiring only a small sacrifice of tokens due to the use of functional tokens.
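
The graph layout described in Figures 1, 2, and 4 (a master node that picks one neighboring worker per query, with a shared cache for results) can be sketched as a small data structure. This is an illustration under stated assumptions: the keyword-overlap scoring merely stands in for the Octopus model's neighbor selection, the in-memory dictionary stands in for the distributed cache, and all node names and keywords are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    keywords: List[str]          # hypothetical routing hints, not from the paper
    run: Callable[[str], str]    # stand-in for a specialized language model


@dataclass
class ModelGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)
    cache: Dict[str, str] = field(default_factory=dict)  # stand-in for a shared cache

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, [])

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def pick_neighbor(self, src: str, query: str) -> str:
        """Choose the neighbor whose keywords overlap the query most.

        Placeholder for the coordinator model's choice of neighbor.
        """
        neighbors = self.edges[src]
        if not neighbors:
            raise ValueError(f"node {src!r} has no neighbors")
        lowered = query.lower()
        return max(
            neighbors,
            key=lambda n: sum(k in lowered for k in self.nodes[n].keywords),
        )

    def infer(self, src: str, query: str) -> str:
        """One-step inference: route from src to one neighbor, caching the answer."""
        if query in self.cache:
            return self.cache[query]
        target = self.pick_neighbor(src, query)
        answer = self.nodes[target].run(query)
        self.cache[query] = answer
        return answer


def build_demo_graph() -> ModelGraph:
    graph = ModelGraph()
    graph.add_node(Node("master", [], lambda q: q))
    graph.add_node(Node("math", ["integral", "derivative"], lambda q: f"[math] {q}"))
    graph.add_node(Node("law", ["contract", "statute"], lambda q: f"[law] {q}"))
    graph.add_edge("master", "math")
    graph.add_edge("master", "law")
    return graph


if __name__ == "__main__":
    demo = build_demo_graph()
    print(demo.infer("master", "Compute the integral of x"))
```

Storing edges per node keeps the selection problem local: each node only ever scores its own neighbors, which is what lets the graph grow without every node needing to know about every model.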