Sustainable Visions: Unsupervised Machine Learning Insights on Global Development Goals

Alberto García-Rodríguez, Matias Núñez, Miguel Robles Pérez, Tzipe Govezensky, Rafael A. Barrio, Carlos Gershenson, Kimmo K. Kaski, Julia Tagüeña

TL;DR

This study tackles the slow global progress toward the UN Sustainable Development Goals (SDGs) by applying a three-stage unsupervised learning pipeline (PCA for global structure, t-SNE for local structure, and DBSCAN for clustering) to SDG indicators from 2000–2022 across 107 countries. It reveals strong inter-goal correlations (notably between Goals 12 and 13, alongside negative associations with many other goals) and region-specific SDG patterns that cluster countries geographically, showing that a uniform global path toward 2030 is insufficient. The results show a persistent gap to the ideal SDG state, accelerated distortions due to COVID-19, and Gaussian-like intra-cluster distance distributions, suggesting that regionally tailored, data-informed policy is necessary. Overall, the work provides a robust, data-driven framework for diagnosing interdependencies, mapping trajectories, and guiding cooperative, targeted strategies for sustainable progress.
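The pipeline named above (standardize, PCA, t-SNE, DBSCAN) can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the data are synthetic, and the function name `run_pipeline` and every parameter value (`n_pca`, `perplexity`, `eps`, `min_samples`) are assumptions chosen only so the example runs.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN


def run_pipeline(scores, n_pca=5, perplexity=15.0, eps=3.0, min_samples=5, seed=0):
    """Standardize -> PCA -> t-SNE -> DBSCAN (all settings are assumed values)."""
    # Stage 1: put every goal score on a comparable scale.
    scaled = StandardScaler().fit_transform(scores)
    # Stage 2a: PCA keeps the directions that capture global structure.
    reduced = PCA(n_components=n_pca, random_state=seed).fit_transform(scaled)
    # Stage 2b: t-SNE embeds the PCA output in 2-D, preserving local neighborhoods.
    embedded = TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=seed
    ).fit_transform(reduced)
    # Stage 3: density-based clustering; the label -1 marks noise points.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embedded)
    return reduced, embedded, labels


# Synthetic stand-in for a (country-year) x (17 goals) score matrix.
rng = np.random.default_rng(0)
centers = rng.uniform(20, 80, size=(3, 17))
scores = np.vstack([c + rng.normal(0, 3, size=(20, 17)) for c in centers])
scores = np.clip(scores, 0, 100)

reduced, embedded, labels = run_pipeline(scores)
n_clusters = len(set(labels.tolist()) - {-1})
print("clusters found:", n_clusters, "| noise points:", int((labels == -1).sum()))
```

The number of clusters DBSCAN reports depends strongly on `eps` and on the t-SNE perplexity, so in practice these settings would need tuning against the real data.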

Abstract

The 2030 Agenda for Sustainable Development of the United Nations outlines 17 goals for the countries of the world to address global challenges in their development. However, progress towards these goals has been slower than expected, so the reasons behind this shortfall need to be investigated. In this study, we use a novel data-driven methodology to analyze more than 20 years of time-series data (2000–2022) from 107 countries using unsupervised machine learning (ML) techniques. Our analysis reveals strong positive and negative correlations between certain Sustainable Development Goals (SDGs). Our findings show that progress toward the SDGs is heavily influenced by geographical, cultural and socioeconomic factors, with no country on track to achieve all the goals by 2030. This highlights the need for a region-specific, systemic approach to sustainable development that acknowledges the complex interdependencies between the goals and the variable capacities of countries to reach them. To this end, our machine-learning-based approach provides a robust framework for developing efficient, data-informed strategies that promote cooperative and targeted initiatives for sustainable progress.
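One simple way to look for positive and negative correlations between goals is a Pearson correlation matrix over the goal columns. The sketch below assumes that approach; the source does not state which correlation measure was used. The data are synthetic, with columns 0, 1 and 2 deliberately constructed to be correlated, and the helper name `strongest_pairs` is hypothetical.

```python
import numpy as np


def strongest_pairs(scores):
    """Return the correlation matrix plus the most positive and most negative pair."""
    # rowvar=False: each column (goal) is a variable, each row an observation.
    corr = np.corrcoef(scores, rowvar=False)
    # Look only above the diagonal so each pair is counted once.
    rows, cols = np.triu_indices_from(corr, k=1)
    values = corr[rows, cols]
    i_pos, i_neg = int(np.argmax(values)), int(np.argmin(values))
    most_pos = (int(rows[i_pos]), int(cols[i_pos]), float(values[i_pos]))
    most_neg = (int(rows[i_neg]), int(cols[i_neg]), float(values[i_neg]))
    return corr, most_pos, most_neg


# Synthetic stand-in: 200 observations of 17 goal scores (illustrative only).
rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 17))
scores[:, 1] = scores[:, 0] + 0.1 * rng.normal(size=200)   # moves with column 0
scores[:, 2] = -scores[:, 0] + 0.1 * rng.normal(size=200)  # moves against column 0

corr, most_pos, most_neg = strongest_pairs(scores)
print("most positive pair (col, col, r):", most_pos)
print("most negative pair (col, col, r):", most_neg)
```

A strongly negative entry indicates goals whose scores tend to move in opposite directions, the kind of trade-off a uniform policy path would struggle with.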

Paper Structure (21 sections, 5 equations, 11 figures, 2 tables)

Figures (11)

  • Figure 1: A three-stage unsupervised learning pipeline for data analysis. The first stage involves data preprocessing, where the dataset undergoes standardization, normalization, and cleaning procedures to address missing values and noise. The second stage consists of sequential dimensionality reduction: PCA identifies the principal components that capture the global data structure, followed by t-SNE, which preserves local relationships in the reduced-dimensional space. The final stage applies a clustering algorithm (DBSCAN) to identify distinct groups within the processed data. Countries in each cluster are mapped and the mean trajectories towards ideal scores are calculated. Arrows between the stages indicate the sequential flow of data through the pipeline, with the output of each stage serving as the input for the subsequent stage.
  • Figure 2: Parallel plot of the 17 SDGs. Comparison of the yearly average SDG scores from 2000 to 2022. The color scale runs from dark red (2000) to light blue (2022).
  • Figure 3: Principal Component Analysis (PCA) of Global SDG Progress from 2000–2022. This two-dimensional plot shows the position of each country with respect to the Sustainable Development Goals. Lines link a country's points across successive years. Oscillating paths suggest varying rates of advancement. The black dot in the lower right corner represents the ideal goal for 2030.
  • Figure 4: PCA with SDG Vectors. An enhanced view of the PCA plot introduces vectors representing each of the 17 Sustainable Development Goals. The orientation and magnitude of these vectors offer insights into the influence and correlation of individual goals within the principal components.
  • Figure 5: t-SNE Visualization with DBSCAN Clustering. A detailed t-SNE plot colored by clusters determined using the DBSCAN algorithm. Most dots are drawn with transparency to reduce visual clutter; only those that change between clusters are highlighted in full color. These changes are further emphasized by connecting the highlighted dots with dotted lines. The countries in each cluster are enumerated in S1 Table 2.1 (in Supplementary material S1 section).
  • ...and 6 more figures
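
The quantities plotted in Figures 3 and 4 (country positions in the first two principal components, a projected "ideal" point, and one vector per goal) can be computed with plain NumPy. The sketch below is an illustration under stated assumptions: the data are synthetic, the ideal state is taken to be a score of 100 on every goal, and the function name `pca_biplot_data` is hypothetical.

```python
import numpy as np


def pca_biplot_data(scores, ideal, n_components=2):
    """PCA via SVD of standardized scores; also projects an 'ideal' point."""
    mean = scores.mean(axis=0)
    std = scores.std(axis=0)
    z = (scores - mean) / std                 # standardized, hence centered
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    axes = vt[:n_components]                  # shape (n_components, n_goals)
    coords = z @ axes.T                       # positions of the observations
    # The ideal point uses the same mean/std so it lands in the same space.
    ideal_coords = ((ideal - mean) / std) @ axes.T
    explained = (s ** 2) / np.sum(s ** 2)     # variance share per component
    # axes.T has one row per goal: the vector drawn for that goal in a biplot.
    return coords, axes.T, ideal_coords, explained[:n_components]


# Synthetic stand-in: 50 observations of 17 goal scores on a 0-100 scale.
rng = np.random.default_rng(2)
scores = rng.uniform(30, 90, size=(50, 17))
ideal = np.full(17, 100.0)                    # assumed ideal: 100 on every goal

coords, goal_vectors, ideal_coords, explained = pca_biplot_data(scores, ideal)
gap = np.linalg.norm(coords - ideal_coords, axis=1)  # distance to ideal in PC space
print("explained variance (PC1, PC2):", np.round(explained, 3))
print("mean distance to ideal point:", round(float(gap.mean()), 3))
```

Goal vectors that point in similar directions correspond to goals that vary together across observations, while roughly opposite vectors correspond to goals that vary inversely.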