Enhancing ESG Impact Type Identification through Early Fusion and Multilingual Models

Hariram Veeramani, Surendrabikram Thapa, Usman Naseem

TL;DR

This paper tackles ESG impact type identification from text in a multilingual setting, using the ML-ESG-2 shared task dataset. It proposes an ensemble system that combines mBERT, FlauBERT-base, ALBERT-base-v2, and an MLP over LSA and TF-IDF features, under both late and early fusion strategies. The early fusion configuration, which integrates TF-IDF, LSA, and language-model embeddings, achieves the best results: a micro-F1 of about 0.9633 on English, with lower performance on French, Japanese, and Chinese. The work demonstrates the feasibility of multilingual ESG information extraction and highlights the need for language-specific fine-tuning and richer multilingual resources for broader coverage.

Abstract

In the evolving landscape of Environmental, Social, and Corporate Governance (ESG) impact assessment, the ML-ESG-2 shared task proposes identifying ESG impact types. To address this challenge, we present a comprehensive system leveraging ensemble learning techniques, capitalizing on early and late fusion approaches. Our approach employs four distinct models: mBERT, FlauBERT-base, ALBERT-base-v2, and a Multi-Layer Perceptron (MLP) incorporating Latent Semantic Analysis (LSA) and Term Frequency-Inverse Document Frequency (TF-IDF) features. Through extensive experimentation, we find that our early fusion ensemble approach, featuring the integration of LSA, TF-IDF, mBERT, FlauBERT-base, and ALBERT-base-v2, delivers the best performance. Our system offers a comprehensive ESG impact type identification solution, contributing to the responsible and sustainable decision-making processes vital in today's financial and corporate governance landscape.
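
For readers unfamiliar with the two classical features named in the abstract, the block below gives one common textbook formulation of TF-IDF and LSA. The weighting variant and the LSA rank k that the authors actually used are not stated in this summary, so the symbols here (tf, df, N, k, X) are assumptions for illustration only.

```latex
% tf(t, d): count of term t in document d
% df(t):    number of documents containing term t
% N:        total number of documents
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)}

% LSA: rank-k truncated SVD of the N x V TF-IDF matrix X (V = vocabulary size)
X \approx U_k \Sigma_k V_k^{\top}, \qquad
\text{document features} = U_k \Sigma_k \in \mathbb{R}^{N \times k}
```

Under this formulation, each document is represented by a sparse TF-IDF row and a dense k-dimensional LSA row, either of which could serve as input to the MLP.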

Paper Structure

This paper contains 15 sections, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Late fusion technique that uses logits from all models to make the final decision.
  • Figure 2: Early fusion ensemble that takes the different representations and uses an MLP for the final classification.
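
The two fusion strategies in the captions above can be contrasted with a minimal pure-Python sketch. It is not the authors' implementation: all function names are hypothetical, averaging softmax probabilities is only one assumed way of combining logits for late fusion, and the one-hidden-layer ReLU MLP stands in for whatever classifier architecture the paper actually uses.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(logits_per_model: Sequence[Sequence[float]]) -> int:
    """Late fusion: combine each model's output and pick one class.

    Assumption: probabilities are averaged across models. The source only
    says that logits from all models feed the final decision.
    """
    n_classes = len(logits_per_model[0])
    probs = [softmax(logits) for logits in logits_per_model]
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


def early_fusion_input(representations: Sequence[Sequence[float]]) -> List[float]:
    """Early fusion: concatenate all representations into one feature vector."""
    fused: List[float] = []
    for rep in representations:
        fused.extend(rep)
    return fused


def mlp_logits(
    x: Sequence[float],
    w1: Sequence[Sequence[float]],
    b1: Sequence[float],
    w2: Sequence[Sequence[float]],
    b2: Sequence[float],
) -> List[float]:
    """Forward pass of a one-hidden-layer MLP with ReLU (illustrative only)."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
```

In late fusion, each model is run to completion and only the outputs are merged. In early fusion, the TF-IDF, LSA, and language-model vectors would be passed through `early_fusion_input` first, and a single classifier such as `mlp_logits` would see all of them at once, so it can learn interactions between the representations.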