Can LLMs Understand Computer Networks? Towards a Virtual System Administrator

Denis Donadel, Francesco Marchiori, Luca Pajola, Mauro Conti

TL;DR

This study addresses whether Large Language Models can understand and work with computer networks. It proposes a NetJSON-based evaluation framework and tests six LLMs (three proprietary, three open-source) across three network topologies to answer four key questions about topology reasoning, graphical representations, IP/subnet recognition, and path connectivity. Findings show that large proprietary models achieve substantial accuracy on simple to medium topologies (with a best average around 79.3%), but performance declines with network complexity, while open-source models struggle more consistently. The work highlights the role of prompt engineering and cautious deployment (privacy considerations) for practical use, and provides actionable guidance for deploying LLMs as virtual sysadmins with clear limitations and directions for future research.

Abstract

Recent advancements in Artificial Intelligence, and particularly Large Language Models (LLMs), offer promising prospects for aiding system administrators in managing the complexity of modern networks. However, despite this potential, a significant gap exists in the literature regarding the extent to which LLMs can understand computer networks. Without empirical evidence, system administrators might rely on these models without assurance of their efficacy in performing network-related tasks accurately. In this paper, we are the first to conduct an exhaustive study on LLMs' comprehension of computer networks. We formulate several research questions to determine whether LLMs can provide correct answers when supplied with a network topology and questions about it. To assess them, we developed a thorough framework for evaluating LLMs' capabilities in various network-related tasks. We evaluate our framework on multiple computer networks employing proprietary (e.g., GPT4) and open-source (e.g., Llama2) models. Our findings on general-purpose LLMs in a zero-shot scenario are promising, with the best model achieving an average accuracy of 79.3%. Proprietary LLMs achieve noteworthy results in small and medium networks, while challenges persist in comprehending complex network topologies, particularly for open-source models. Moreover, we provide insight into how prompt engineering can enhance the accuracy of some tasks.
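
To make the evaluation setting concrete, the following is a minimal sketch of how ground-truth answers for two of the question types mentioned above (path connectivity and IP/subnet membership) could be computed from a NetJSON NetworkGraph. It is not the authors' framework: the topology, node ids, addresses, and the helper names is_connected and nodes_in_subnet are all hypothetical, and links are assumed to be bidirectional.

```python
import ipaddress
from collections import deque

# Hypothetical topology in NetJSON NetworkGraph form.
# Node ids and addresses are invented for illustration only.
TOPOLOGY = {
    "type": "NetworkGraph",
    "protocol": "static",
    "version": None,
    "metric": None,
    "nodes": [
        {"id": "router1", "local_addresses": ["192.168.1.1", "10.0.0.1"]},
        {"id": "pc1", "local_addresses": ["192.168.1.10"]},
        {"id": "server1", "local_addresses": ["10.0.0.20"]},
        {"id": "pc2", "local_addresses": ["172.16.0.5"]},  # isolated node
    ],
    "links": [
        {"source": "router1", "target": "pc1", "cost": 1},
        {"source": "router1", "target": "server1", "cost": 1},
    ],
}


def neighbours(topology):
    """Build an adjacency map, treating every link as bidirectional (an assumption)."""
    adj = {node["id"]: set() for node in topology["nodes"]}
    for link in topology["links"]:
        adj[link["source"]].add(link["target"])
        adj[link["target"]].add(link["source"])
    return adj


def is_connected(topology, src, dst):
    """Breadth-first search: is there any path from src to dst?"""
    adj = neighbours(topology)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


def nodes_in_subnet(topology, cidr):
    """Return the sorted ids of nodes with at least one address inside cidr."""
    net = ipaddress.ip_network(cidr)
    return sorted(
        node["id"]
        for node in topology["nodes"]
        if any(ipaddress.ip_address(a) in net
               for a in node.get("local_addresses", []))
    )
```

An LLM's free-text answer to a question such as "Can pc1 reach server1?" could then be compared against is_connected(TOPOLOGY, "pc1", "server1"); how the paper actually scores answers is not specified in this summary.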

Paper Structure (15 sections, 5 figures, 7 tables)

This paper contains 15 sections, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Our evaluation framework. We test several LLMs on a combination of tasks specific to each network, which is represented in NetJSON format.
  • Figure 2: Basic prompt for network comprehension task.
  • Figure 3: Average accuracy on answering questions on each network.
  • Figure 4: DALL-E generated image of a network. As shown, it is an artistic interpretation and does not include any detail of the original network.
  • Figure 5: An example of a complete prompt for task T2 on the Routers network, shown with both the base prompt and step-by-step reasoning.