Energy-Aware Data-Driven Model Selection in LLM-Orchestrated AI Systems

Daria Smirnova, Hamid Nasiri, Marta Adamska, Zhengxin Yu, Peter Garraghan

TL;DR

Problem: LLM-based orchestrators rely on qualitative model descriptions, which leads to suboptimal model selection, higher energy consumption, and higher latency. Approach: GUIDE is an energy-aware, data-driven framework that pairs an Energy Budget Tracker with a Pareto-based Model Selector to bring quantitative performance-energy metrics into per-slot decisions. Contributions: an empirical analysis of current limitations; a concrete framework that improves accuracy, achieves up to 54% energy efficiency gains, and cuts model selection latency from 4.51 seconds to 7.2 milliseconds; and a demonstration of Pareto-consistent model selection under real-time energy constraints. Impact: enables scalable, cost-efficient orchestration in multi-model AI systems.

Abstract

As modern artificial intelligence (AI) systems become more advanced and capable, they can leverage a wide range of tools and models to perform complex tasks. Today, the task of orchestrating these models is often performed by Large Language Models (LLMs) that rely on qualitative descriptions of models for decision-making. However, the descriptions provided to these LLM-based orchestrators do not reflect true model capabilities and performance characteristics, leading to suboptimal model selection, reduced accuracy, and increased energy costs. In this paper, we conduct an empirical analysis of LLM-based orchestration limitations and propose GUIDE, a new energy-aware model selection framework that accounts for performance-energy trade-offs by incorporating quantitative model performance characteristics in decision-making. Experimental results demonstrate that GUIDE increases accuracy by 0.90%-11.92% across various evaluated tasks, and achieves up to 54% energy efficiency improvement, while reducing orchestrator model selection latency from 4.51 s to 7.2 ms.

Paper Structure

This paper contains 16 sections, 5 figures, 7 tables, 2 algorithms.

Figures (5)

  • Figure 1: Overview of an LLM-orchestrated AI system.
  • Figure 2: Overview of the energy-aware LLM-orchestrator model selection framework. The Energy Budget Tracker (right) estimates the current per-slot energy budget based on GPU energy deltas using EMA and the user-defined energy cap. The Model Selector (middle) filters models by task and budget, applies a Pareto filter on accuracy–energy, and returns the most accurate model from the Pareto efficient subset. The System Orchestrator (left) executes the selected model and returns the response to the user.
  • Figure 3: Accuracy per Joule, calculated from weighted energy and accuracy results, on the ICapt, VQA, and OD task types for each method: GUIDE model selection (with two GUIDE level settings) and two comparable model selection methods, JARVIS and Name-Only. Error bars represent 95% CI.
  • Figure 4: Model selection performance on the VQA dataset.
  • Figure 5: Per-slot energy usage. The dotted red line indicates the user-defined energy target; the blue bars show the energy actually used in each time slot, including the GPU base draw of 40-50 J/s. The x-axis denotes time slots. Each prompt in a dataset of a given type can trigger the execution of many models, not limited to the dataset's primary type, because of JARVIS multi-step reasoning and task misclassifications, as discussed in the formulation section (sec:formulation).
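
The selection pipeline described in the Figure 2 caption (EMA-based budget tracking, filtering by task and budget, Pareto filtering on accuracy and energy, then picking the most accurate survivor) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `ModelProfile`, `EnergyBudgetTracker`, `pareto_front`, and `select_model`, the smoothing factor `alpha`, the budget rule (cap minus smoothed usage), and all numbers in the example are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical record of measured, quantitative model characteristics."""
    name: str
    task: str
    accuracy: float   # measured accuracy in [0, 1]
    energy_j: float   # measured mean energy per request, in joules


class EnergyBudgetTracker:
    """Smooths per-slot GPU energy deltas with an exponential moving average."""

    def __init__(self, cap_j: float, alpha: float = 0.5) -> None:
        self.cap_j = cap_j              # user-defined per-slot energy cap
        self.alpha = alpha              # EMA smoothing factor (assumed value)
        self.ema_j: Optional[float] = None

    def update(self, energy_delta_j: float) -> None:
        # The first observation seeds the average; later ones are blended in.
        if self.ema_j is None:
            self.ema_j = energy_delta_j
        else:
            self.ema_j = self.alpha * energy_delta_j + (1 - self.alpha) * self.ema_j

    def budget(self) -> float:
        # Assumed rule: remaining budget is the cap minus smoothed usage, floored at 0.
        used = self.ema_j if self.ema_j is not None else 0.0
        return max(self.cap_j - used, 0.0)


def pareto_front(models: List[ModelProfile]) -> List[ModelProfile]:
    """Keep models that no other model beats on both accuracy and energy."""
    front = []
    for m in models:
        dominated = any(
            o.accuracy >= m.accuracy
            and o.energy_j <= m.energy_j
            and (o.accuracy > m.accuracy or o.energy_j < m.energy_j)
            for o in models
        )
        if not dominated:
            front.append(m)
    return front


def select_model(
    models: List[ModelProfile], task: str, budget_j: float
) -> Optional[ModelProfile]:
    """Filter by task and budget, apply the Pareto filter, return the most accurate."""
    candidates = [m for m in models if m.task == task and m.energy_j <= budget_j]
    front = pareto_front(candidates)
    if not front:
        return None  # nothing fits the current budget for this task
    return max(front, key=lambda m: m.accuracy)


# Illustrative catalogue; every value here is made up for the example.
catalogue = [
    ModelProfile("vqa-large", "vqa", accuracy=0.80, energy_j=120.0),
    ModelProfile("vqa-small", "vqa", accuracy=0.70, energy_j=40.0),
    ModelProfile("vqa-mid", "vqa", accuracy=0.65, energy_j=60.0),
    ModelProfile("capt-base", "icapt", accuracy=0.90, energy_j=30.0),
]

tracker = EnergyBudgetTracker(cap_j=100.0, alpha=0.5)
tracker.update(20.0)
tracker.update(40.0)

tight_choice = select_model(catalogue, "vqa", tracker.budget())
loose_choice = select_model(catalogue, "vqa", 200.0)
```

Because the most accurate candidate is never strictly dominated, the Pareto step in this sketch mainly removes dominated options and breaks accuracy ties in favour of the lower-energy model. The whole selection is a few list passes over a small catalogue with no LLM call, which is the kind of lookup the paper contrasts with LLM-based selection latency.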