Glia: A Human-Inspired AI for Automated Systems Design and Optimization

Pouya Hamadanian, Pantea Karimi, Arash Nasr-Esfahany, Kimia Noorbakhsh, Joseph Chandler, Ali ParandehGheibi, Mohammad Alizadeh, Hari Balakrishnan

TL;DR

Glia uses a human-inspired, multi-agent workflow of LLMs to design and optimize complex networked systems. It combines a front end, reasoning agents, and an evaluation framework to ground abstract reasoning in empirical data, producing interpretable designs instead of opaque policies. In a case study on LLM serving in a distributed GPU cluster, Glia derives novel routing, scheduling, and auto-scaling algorithms that outperform baselines, transfer to real systems, and adapt to changing workloads. The work demonstrates that structured, reasoning-driven exploration with continuous evaluation can yield creative, robust, and interpretable solutions for challenging systems problems.

Abstract

Can an AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow. Each agent specializes in reasoning, experimentation, and analysis, collaborating through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning process. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and auto-scaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.

Paper Structure

This paper contains 27 sections, 1 equation, 14 figures, 1 table.

Figures (14)

  • Figure 1: Illustrative pipeline of request routing for LLM inference.
  • Figure 2: Distribution of mean request completion times for 100 programs generated by directly prompting the LLM.
  • Figure 3: Performance of SCG and MCG Glia against other algorithms and baselines.
  • Figure 4: Glia's GPU cost reductions as we progressively use it across the inference stack.
  • Figure 5: Glia's discovered routing algorithm (AIScheduler in the figure) outperforms baselines in cloud experiments. The trends observed in the cloud experiments are similar to those in simulation, though the numbers are not identical.
  • ...and 9 more figures