Predicting Sustainable Development Goals Using Course Descriptions -- from LLMs to Conventional Foundation Models

Lev Kharlashkin, Melany Macias, Leo Huovinen, Mika Hämäläinen

TL;DR

The paper addresses predicting UN SDGs for university courses by generating training data from noisy course descriptions with PaLM 2 and then fine-tuning smaller foundation models for SDG prediction. It evaluates five transformer models (BERT, mBERT, RoBERTa, XLM-RoBERTa, BART) on a 70/15/15 split; BART achieves the best F1-score, 0.786, which demonstrates a viable, privacy-conscious pipeline for curricular SDG alignment. The approach contributes a scalable methodology for integrating SDGs into higher education and highlights data-imbalance challenges that affect certain Goals. The results have practical implications for universities that want to monitor and report SDG coverage in their curricula while preserving data privacy, and they leave room for multilingual extensions.
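
As an illustration of the 70/15/15 split mentioned above, the following is a minimal sketch and not the authors' code. It assumes the three parts are train, validation, and test sets, and that the split is a plain shuffled split with no stratification by SDG; the function name `split_70_15_15` and the `seed` parameter are hypothetical.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items and split them into 70% / 15% / 15% parts.

    Assumption: the parts are train / validation / test, and the split
    is not stratified by SDG label. The summary does not confirm either.
    """
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n * 70 // 100
    n_val = n * 15 // 100

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # The test part takes whatever remains, so no item is dropped.
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Because the TL;DR notes that data imbalance affects certain Goals, a stratified split could be a reasonable alternative, but whether the paper used one is not stated here.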

Abstract

We present our work on predicting United Nations Sustainable Development Goals (SDGs) for university courses. We use an LLM, PaLM 2, to generate training data from noisy, human-authored course descriptions. We then use this data to train several smaller language models to predict SDGs for university courses. This work contributes to better university-level adaptation of SDGs. The best-performing model in our experiments was BART, with an F1-score of 0.786.

Paper Structure (9 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Distribution of courses per degree after the initial cleaning step.
  • Figure 2: Distribution of SDG mentions within the training dataset.
  • Figure 3: F1 Scores by SDG for Each Model