Energy-Aware Data-Driven Model Selection in LLM-Orchestrated AI Systems
Daria Smirnova, Hamid Nasiri, Marta Adamska, Zhengxin Yu, Peter Garraghan
TL;DR
Problem: LLM-based orchestrators rely on qualitative model descriptions, which leads to suboptimal model selection, higher energy consumption, and higher latency.
Approach: GUIDE is an energy-aware, data-driven framework whose Energy Budget Tracker and Pareto-based Model Selector integrate quantitative performance-energy metrics into per-slot model-selection decisions.
Contributions: an empirical analysis of the limitations of current LLM-based orchestration; a concrete framework that improves accuracy by 0.90%-11.92%, achieves up to 54% energy efficiency gains, and reduces model selection latency from 4.51 seconds to 7.2 milliseconds; and a demonstration that model selections remain Pareto-consistent under real-time energy constraints.
Impact: enables scalable, cost-efficient orchestration in multi-model AI systems.
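The TL;DR names two components, an Energy Budget Tracker and a Pareto-based Model Selector, without spelling out how they might interact. The following is a minimal sketch under our own assumptions, not GUIDE's actual implementation: every name (`ModelProfile`, `EnergyBudgetTracker`, `pareto_front`, `select_model`) is hypothetical, each model is assumed to have a known accuracy and per-request energy cost, and the tracker is assumed to split a slot's energy budget evenly across the requests expected in that slot. The selector first discards Pareto-dominated models and then picks the most accurate remaining model that fits the current allowance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical profile of one candidate model (values are illustrative)."""
    name: str
    accuracy: float  # task accuracy in [0, 1]
    energy_j: float  # energy per request, in joules


def pareto_front(models: list[ModelProfile]) -> list[ModelProfile]:
    """Keep models that no other model dominates.

    Model o dominates m if o is at least as accurate AND uses at most as much
    energy, and is strictly better on at least one of the two.
    """
    def dominates(o: ModelProfile, m: ModelProfile) -> bool:
        return (o.accuracy >= m.accuracy and o.energy_j <= m.energy_j
                and (o.accuracy > m.accuracy or o.energy_j < m.energy_j))

    return [m for m in models if not any(dominates(o, m) for o in models)]


class EnergyBudgetTracker:
    """Hypothetical tracker that spreads a fixed energy budget over the
    requests expected in the current slot (an assumed policy)."""

    def __init__(self, budget_j: float, expected_requests: int) -> None:
        self.remaining_j = budget_j
        self.remaining_requests = expected_requests

    def allowance_j(self) -> float:
        """Energy each remaining request may use if the budget is split evenly."""
        if self.remaining_requests <= 0:
            return 0.0
        return max(self.remaining_j, 0.0) / self.remaining_requests

    def record(self, energy_j: float) -> None:
        """Charge one served request against the budget."""
        self.remaining_j -= energy_j
        self.remaining_requests -= 1


def select_model(models: list[ModelProfile],
                 tracker: EnergyBudgetTracker) -> ModelProfile:
    """Choose the most accurate Pareto-optimal model within the current
    allowance; if none fits, fall back to the cheapest Pareto-optimal model."""
    front = pareto_front(models)
    allowance = tracker.allowance_j()
    affordable = [m for m in front if m.energy_j <= allowance]
    if affordable:
        return max(affordable, key=lambda m: m.accuracy)
    return min(front, key=lambda m: m.energy_j)


# Illustrative usage with made-up numbers (not measurements from the paper).
models = [
    ModelProfile("small", accuracy=0.70, energy_j=1.0),
    ModelProfile("medium", accuracy=0.80, energy_j=3.0),
    ModelProfile("large", accuracy=0.85, energy_j=9.0),
    ModelProfile("wasteful", accuracy=0.75, energy_j=5.0),  # dominated by "medium"
]
tracker = EnergyBudgetTracker(budget_j=30.0, expected_requests=10)
choice = select_model(models, tracker)  # allowance is 3.0 J per request
tracker.record(choice.energy_j)
print(choice.name, tracker.remaining_j)  # medium 27.0
```

With these illustrative numbers, `wasteful` is pruned because `medium` is both more accurate and cheaper; the even-split allowance of 3.0 J then admits `small` and `medium`, and the selector returns `medium`. Splitting the budget evenly is only one possible policy; a tracker could instead weight the allowance by expected load or task priority.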
Abstract
As modern artificial intelligence (AI) systems become more advanced and capable, they can leverage a wide range of tools and models to perform complex tasks. Today, these models are often orchestrated by Large Language Models (LLMs) that rely on qualitative model descriptions for decision-making. However, the descriptions provided to these LLM-based orchestrators do not accurately reflect actual model capabilities and performance characteristics, which leads to suboptimal model selection, reduced accuracy, and increased energy costs. In this paper, we conduct an empirical analysis of the limitations of LLM-based orchestration and propose GUIDE, a new energy-aware model selection framework that accounts for performance-energy trade-offs by incorporating quantitative model performance characteristics into decision-making. Experimental results show that GUIDE increases accuracy by 0.90%-11.92% across the evaluated tasks, achieves up to 54% improvement in energy efficiency, and reduces orchestrator model selection latency from 4.51 s to 7.2 ms.
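For scale, the reported drop in orchestrator model selection latency corresponds to roughly a 626-fold speedup, which follows directly from the two figures given above:

```latex
\frac{4.51\ \text{s}}{7.2\ \text{ms}} = \frac{4.51}{0.0072} \approx 626,
\qquad
1 - \frac{0.0072}{4.51} \approx 0.9984 \quad (\text{a } 99.84\% \text{ reduction}).
```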
