Glia: A Human-Inspired AI for Automated Systems Design and Optimization
Pouya Hamadanian, Pantea Karimi, Arash Nasr-Esfahany, Kimia Noorbakhsh, Joseph Chandler, Ali ParandehGheibi, Mohammad Alizadeh, Hari Balakrishnan
TL;DR
Glia uses a human-inspired, multi-agent workflow of LLMs to design and optimize complex networked systems. It combines a front-end, reasoning agents, and an evaluation framework to ground abstract reasoning in empirical data, producing interpretable designs instead of opaque policies. In a case study on LLM serving in a distributed GPU cluster, Glia derives novel routing, scheduling, and autoscaling algorithms that outperform baselines and transfer to real systems, while adapting to changing workloads. The work demonstrates that structured, reasoning-driven exploration with continuous evaluation can yield creative, robust, and interpretable solutions to challenging systems problems.
Abstract
Can an AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large language models (LLMs) in a human-inspired, multi-agent workflow. Its agents specialize in reasoning, experimentation, and analysis, and collaborate through an evaluation framework that grounds abstract reasoning in empirical feedback. Unlike prior ML-for-systems methods that optimize black-box policies, Glia generates interpretable designs and exposes its reasoning process. When applied to a distributed GPU cluster for LLM inference, it produces new algorithms for request routing, scheduling, and autoscaling that perform at human-expert levels in significantly less time, while yielding novel insights into workload behavior. Our results suggest that by combining reasoning LLMs with structured experimentation, an AI can produce creative and understandable designs for complex systems problems.
