CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs

Abhas Kumar, Kapil Pathak, Rajesh Kavuru, Prabhakar Srinivasan

TL;DR

This work introduces the Carbon Efficient Gain Index (CEGI) to quantify the trade-off between accuracy gains and carbon emissions for small language models (SLMs) and vision-language models (VLMs) across four tasks: Image Captioning, Visual Question Answering, Dialogue Summarization, and Text-to-SQL. By applying LoRA-based fine-tuning and 4/8-bit quantization to Qwen and LLaMA variants, the study demonstrates that smaller, parameter-efficient models can closely match or even surpass the performance of larger models while dramatically reducing emissions. The Eco2AI framework is used to track lifecycle emissions, and CEGI provides a normalized, cross-model efficiency metric that correlates with human judgments of model utility. The findings suggest that environmental sustainability and strong task performance are not mutually exclusive, challenging the notion that larger models inherently offer better value when emissions are considered. The work offers practical guidance for sustainable AI through LoRA-based fine-tuning and quantization, and contributes a reusable metric for model selection under environmental constraints.

Abstract

This paper analyzes the performance of Small Language Models (SLMs) and Vision Language Models (VLMs) and evaluates the trade-off between model performance and carbon emissions across four essential tasks: Image Captioning, Visual Question Answering (VQA), Dialogue Summarization, and Text-to-SQL conversion. Several SLMs and VLMs from the Qwen and LLaMA architecture families are selected, and their variants are evaluated by model size (number of parameters), quantization level, and fine-tuning parameters. For each model variant, performance and carbon emissions are measured. To quantify the trade-off between model performance and carbon emissions, we introduce a novel metric called CEGI (Carbon Efficient Gain Index), which represents the carbon emission per unit percentage gain per million trainable parameters. This metric provides a normalized measure for comparing models' efficiency in terms of performance improvement relative to their environmental cost. The experimental results demonstrate that fine-tuned SLMs and VLMs can achieve performance comparable to Large Language Models (LLMs) while producing significantly less carbon emissions. Our findings suggest that the marginal gains in accuracy from larger models do not justify the substantial increase in carbon emissions. Lower-bit quantization further improves energy efficiency, as measured by the proposed metric, without compromising performance. This study highlights the importance of balancing high performance with environmental sustainability and offers a valuable metric for selecting models suited to environmentally friendly AI development.
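The abstract defines CEGI only in words. As a minimal sketch, assuming a literal reading of that definition (CO2-equivalent emissions E divided by the percentage gain ΔP of the fine-tuned model F_T over its base model B_M, divided again by the trainable parameter count N in millions, i.e. CEGI = E / (ΔP · N)), the index could be computed and used to rank variants as below. The relative-gain formula, the `RunResult` container, and the function names `percentage_gain`, `cegi`, and `rank_by_cegi` are hypothetical illustrations, not the paper's own equations or code; the paper's exact definition (for example, whether gain is relative or in absolute percentage points) may differ.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One evaluated model variant (hypothetical container for this sketch)."""
    name: str
    base_score: float       # task metric of the base model B_M (e.g. SPICE, BLEU, ROUGE-1, EA)
    finetuned_score: float  # same metric after fine-tuning (F_T)
    emissions_kg: float     # CO2-equivalent emissions in kg, e.g. as logged by an emissions tracker
    trainable_params: int   # parameters updated during fine-tuning (e.g. LoRA adapter weights)


def percentage_gain(base: float, tuned: float) -> float:
    """Relative improvement of the fine-tuned score over the base score, in percent (assumed definition)."""
    if base <= 0:
        raise ValueError("base score must be positive to compute a relative gain")
    return 100.0 * (tuned - base) / base


def cegi(run: RunResult) -> float:
    """Emissions per unit percentage gain per million trainable parameters (assumed literal reading).

    Lower values mean less carbon spent per point of improvement.
    """
    gain = percentage_gain(run.base_score, run.finetuned_score)
    if gain <= 0:
        # A non-positive gain would make the ratio meaningless or negative.
        raise ValueError(f"{run.name}: CEGI is undefined for non-positive gain in this sketch")
    params_millions = run.trainable_params / 1e6
    return run.emissions_kg / (gain * params_millions)


def rank_by_cegi(runs: list[RunResult]) -> list[tuple[str, float]]:
    """Sort variants from most to least carbon-efficient (ascending CEGI)."""
    return sorted(((r.name, cegi(r)) for r in runs), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Illustrative numbers only; these are NOT results from the paper.
    runs = [
        RunResult("small-4bit-lora", base_score=0.5, finetuned_score=0.75,
                  emissions_kg=1.0, trainable_params=4_000_000),
        RunResult("large-8bit-lora", base_score=0.5, finetuned_score=0.8,
                  emissions_kg=6.0, trainable_params=16_000_000),
    ]
    for name, score in rank_by_cegi(runs):
        print(f"{name}: CEGI = {score:.6f}")
```

With these made-up numbers the smaller variant ranks first despite its lower raw gain (50% versus 60%), because its emissions are far lower. Under this literal reading a larger trainable-parameter count also lowers the index, so the paper's exact normalization should be checked before relying on the sketch for model selection.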

Paper Structure

This paper contains 32 sections, 7 equations, 9 figures, 20 tables.

Figures (9)

  • Figure 1: Performance comparison for image captioning tasks using SPICE score. The plot illustrates the performance of the base model ($B_M$), fine-tuned model ($F_T$), and GPT-4o as the baseline. Fine-tuned models demonstrate superior SPICE scores across configurations.
  • Figure 2: Performance comparison (on an $e^{3x}$ scale) for Visual QA tasks using BLEU score. The radar chart highlights the improvements achieved by the fine-tuned model ($F_T$) compared to the base model ($B_M$) and the GPT-4o baseline, demonstrating substantial performance gains in fine-tuned configurations.
  • Figure 3: Comparison of ROUGE-1 scores for $B_M$, $F_T$, and GPT-4o in Dialogue Summarization. Fine-tuned models demonstrate substantial improvements over the base models and the GPT-4o baseline, showcasing their efficacy in generating summaries with higher semantic relevance.
  • Figure 4: Comparison of Execution Accuracy (EA) scores for $B_M$, $F_T$, and GPT-4o in Text-to-SQL tasks. Fine-tuned models achieve significant accuracy gains over the corresponding $B_M$.
  • Figure 5: Comparison of SPICE scores and carbon emissions for $B_M$ and $F_T$ in Image Captioning.
  • ...and 4 more figures