
ExioML: Eco-economic dataset for Machine Learning in Global Sectoral Sustainability

Yanming Guo, Charles Guan, Jin Ma

TL;DR

ExioML addresses the lack of open, ML-ready benchmarks for environmentally extended multi-regional input-output (EE-MRIO) analysis by introducing a high-resolution benchmark built on ExioBase 3.8.2. It features two data modalities, PxP (200 products) and IxI (163 industries), across 49 regions from 1995 to 2022. It supports graph- and tabular-based ML, with GPU-accelerated footprint calculations and an open toolkit for flexible factor selection, and it validates usability through a sectoral GHG emission regression task that achieves low mean squared error (MSE). Deep models (e.g., GANDALF) generally outperform shallow baselines, while random forest (RF) and gradient-boosted decision tree (GBDT) models offer competitive performance at lower computational cost, establishing a robust baseline for future EE-ML research. By reducing data access barriers and providing reproducible, scalable MRIO contexts, ExioML aims to foster climate action insights and sustainable investment decisions through interdisciplinary ML applications.
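Footprint calculations in EE-MRIO analysis are conventionally built on the Leontief inverse. The sketch below uses the standard textbook formulation on a hypothetical 3-sector toy system (not ExioML data, and not necessarily how the ExioML toolkit implements it). Total output is x = Z·1 + y, technical coefficients are A = Z·diag(x)⁻¹, and the Leontief inverse is L = (I − A)⁻¹. A footprint network with edges diag(s)·L·diag(y) then follows, where s = f / x is the direct factor intensity. The function name `footprint_network` is illustrative. For the real PxP table the system is 9,800 × 9,800 (49 regions × 200 products), which is where GPU acceleration matters. Because the code uses only array operations, it could in principle run on a GPU by swapping NumPy for a NumPy-compatible library such as CuPy; treat that as an assumption.

```python
import numpy as np

def footprint_network(Z, y, f):
    """Leontief-based footprint network for a toy EE-MRIO system (illustrative).

    Z : (n, n) intermediate transactions, Z[i, j] = sales from sector i to sector j
    y : (n,)   final demand for each sector's output
    f : (n,)   direct environmental factor per sector (e.g. GHG emissions)
    Returns N with N[i, j] = factor released in sector i to satisfy final demand
    for sector j. Assumes every sector has positive total output (x > 0).
    """
    x = Z.sum(axis=1) + y                    # total output = intermediate sales + final demand
    A = Z / x                                # technical coefficients, A[i, j] = Z[i, j] / x[j]
    L = np.linalg.inv(np.eye(len(x)) - A)    # Leontief inverse (I - A)^-1
    s = f / x                                # direct factor intensity per unit of output
    return s[:, None] * L * y[None, :]       # equivalent to diag(s) @ L @ diag(y)

# Hypothetical toy data: 3 sectors, monetary flows and emissions in arbitrary units.
Z = np.array([[10.0, 20.0, 5.0],
              [15.0, 5.0, 10.0],
              [5.0, 10.0, 20.0]])
y = np.array([65.0, 70.0, 65.0])
f = np.array([50.0, 20.0, 30.0])

N = footprint_network(Z, y, f)
production_based = N.sum(axis=1)    # row sums recover f: emissions where they occur
consumption_based = N.sum(axis=0)   # column sums: footprint attributed to final demand
print(np.round(consumption_based, 2))
```

Row sums of the network reproduce the direct emissions because L·y = x. Column sums reallocate the same total to the sectors whose final demand drives it, which is the consumption-based view that footprint networks visualize.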

Abstract

Environmentally Extended Multi-Regional Input-Output (EE-MRIO) analysis is the predominant framework in Ecological Economics for assessing the environmental impact of economic activities. This paper introduces ExioML, the first Machine Learning benchmark dataset designed for sustainability analysis, aimed at lowering entry barriers and fostering collaboration between Machine Learning and Ecological Economics research. A greenhouse gas (GHG) emission regression task was conducted to evaluate sectoral sustainability and demonstrate the usability of the dataset. We compared traditional shallow models with deep learning models, using a diverse Factor Accounting table that incorporates various categorical and numerical features. Our findings show that ExioML's high usability enables deep and ensemble models to achieve low mean squared errors, establishing a baseline for future Machine Learning research. Through ExioML, we aim to build a foundational dataset that supports various Machine Learning applications and promotes climate action and sustainable investment decisions.
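The regression task is scored by mean squared error, MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)². The sketch below is a minimal version that assumes synthetic data in place of the ExioML Factor Accounting table. It one-hot encodes a categorical feature (hypothetical region codes) and stacks it with numeric features standing in for log-scaled value added, employment, and energy use. It then fits an ordinary least-squares baseline and reports held-out MSE. This linear model is a deliberately simple stand-in, not one of the paper's models (GANDALF, RF, GBDT). The helper names `mse` and `design_matrix` are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: (1/n) * sum((y_i - y_hat_i)^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def design_matrix(X_num, labels, categories):
    """Stack numeric features with a one-hot block for the categorical labels.
    Assumes every label appears in the sorted `categories` array (no unseen labels)."""
    idx = np.searchsorted(categories, labels)
    return np.hstack([X_num, np.eye(len(categories))[idx]])

rng = np.random.default_rng(0)
n = 500
regions = rng.choice(["CN", "DE", "US"], size=n)    # hypothetical region codes
X_num = rng.normal(size=(n, 3))                     # stand-ins for log-scaled factors
region_effect = {"CN": 1.0, "DE": -0.5, "US": 0.2}  # synthetic per-region offsets
y = (X_num @ np.array([0.8, 0.3, 1.2])
     + np.array([region_effect[r] for r in regions])
     + rng.normal(scale=0.1, size=n))               # synthetic "log GHG" target

train, test = slice(0, 400), slice(400, n)
categories = np.unique(regions[train])              # sorted, as searchsorted requires
X_train = design_matrix(X_num[train], regions[train], categories)
X_test = design_matrix(X_num[test], regions[test], categories)

# The one-hot block acts as a per-region intercept, so no separate bias column is added.
coef, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
test_mse = mse(y[test], X_test @ coef)
print(f"test MSE: {test_mse:.4f}")
```

Scoring on a held-out slice rather than the training rows keeps the MSE an estimate of generalization, which is what a benchmark baseline needs to report.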

Paper Structure

This paper contains 7 sections, 6 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Environmentally Extended Multi-Regional Input-Output (EE-MRIO) data are represented in a high-dimensional matrix format, tracking monetary resource transfers between sectors across countries through the input-output and final demand matrices. The Factor Accounting table additionally records the environmental impacts of these economic activities.
  • Figure 2: Architecture of the ExioML system, derived from the open-source EE-MRIO database ExioBase 3.8.2. Each colour indicates an eco-economic factor: value added, employment, energy consumption and GHG emission. The system contains Factor Accounting data describing heterogeneous sector features, and Footprint Networks that model the global trade network by tracking resource transfers between sectors. The data are provided as two datasets: PxP, covering 200 products, and IxI, covering 163 industries, each for 49 regions from 1995 to 2022.
  • Figure 3: Dynamic Global Trade Footprint Networks from 1995 to 2022. This visualization uses a circular layout to depict how trade patterns evolve over time. Nodes represent sectors, distinguished by colour; edges indicate resource transfers, coloured by source region. The diagram highlights significant shifts in the primary trade sources and target sectors, illustrating structural changes in the global trade network.
  • Figure 4: Boxplots of the top 10 sectors with the largest value added, employment, GHG emission and energy usage in the PxP Factor Accounting table. The x-axis shows each indicator on a logarithmic scale, and the y-axis lists sector names. The boxplots reveal highly skewed sector distributions across regions.
  • Figure 5: Pair plot of four selected factors. The diagonal panels show the sectoral distribution of each factor; the off-diagonal panels show pairwise scatter plots, where each point represents a sector and is colour-coded by region. The plot reveals relatively high pairwise correlations among these factors.
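Figures 4 and 5 rely on two simple statistics: log scaling to tame heavily skewed sector values, and pairwise correlation between factors. The sketch below illustrates both on synthetic log-normal data driven by a shared latent "sector size" variable. This is an assumption chosen to mimic the pattern the captions describe; the data are not ExioML values. It compares sample skewness before and after a base-10 log transform and computes Pearson correlations with `np.corrcoef`. Using Pearson correlation on log-scaled values is our choice for illustration; the captions do not specify the measure.

```python
import numpy as np

def skewness(v):
    """Sample (biased) skewness: third central moment / second central moment ** 1.5."""
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

rng = np.random.default_rng(1)
n = 1000
latent = rng.normal(size=n)  # shared synthetic "sector size" driver
factors = {
    "value_added": np.exp(2.0 * latent + 0.5 * rng.normal(size=n)),
    "employment":  np.exp(1.5 * latent + 0.5 * rng.normal(size=n)),
    "energy":      np.exp(1.8 * latent + 0.7 * rng.normal(size=n)),
    "ghg":         np.exp(1.9 * latent + 0.7 * rng.normal(size=n)),
}

# Raw values are strongly right-skewed; their log10 is roughly symmetric.
for name, values in factors.items():
    raw, logged = skewness(values), skewness(np.log10(values))
    print(f"{name:12s} skew raw={raw:8.2f}  log10={logged:5.2f}")

# Pearson correlations between log-scaled factors (each row of log_matrix is one factor).
log_matrix = np.log10(np.vstack(list(factors.values())))
corr = np.corrcoef(log_matrix)
print(np.round(corr, 2))
```

Log scaling before plotting or modelling keeps a handful of very large sectors from dominating the axes, and it is also why correlations are easier to read on the log scale of the pair plot.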