ESG Classification by Implicit Rule Learning via GPT-4

Hyo Jeong Yun, Chanyoung Kim, Moonjeong Hahm, Kyuri Kim, Guijin Son

TL;DR

This work investigates whether GPT-4 can implicitly align with unknown ESG evaluation criteria without explicit training data by leveraging prompting, chain-of-thought reasoning, and dynamic in-context learning. It evaluates the approach on the Korean subset of the Shared Task ML-ESG-3, placing 2nd in the Impact Type track with a 5-shot, MSCI-guided prompting setup, and analyzes performance on smaller open-weight models. The results indicate that longer general pre-training correlates with improved performance on financial downstream tasks and that model calibration remains robust across prompting variations, highlighting training-free pathways for cross-language ESG reasoning. Overall, the study demonstrates the potential of LLMs to navigate subjective ESG guidelines, with implications for multilingual, training-free financial decision-support tasks.

Abstract

Environmental, social, and governance (ESG) factors are widely adopted as indicators of higher investment returns. Accordingly, there are ongoing efforts to automate ESG evaluation with language models so that signals can be extracted easily from massive web text. However, recent approaches suffer from a lack of training data, as rating agencies keep their evaluation metrics confidential. This paper investigates whether state-of-the-art language models like GPT-4 can be guided to align with unknown ESG evaluation criteria through strategies such as prompting, chain-of-thought reasoning, and dynamic in-context learning. We demonstrate the efficacy of these approaches by ranking 2nd in the Shared-Task ML-ESG-3 Impact Type track for Korean without updating the model on the provided training data. Using smaller models with openly available weights, we also explore how adjusting prompts affects the ability of language models to address financial tasks. We observe that longer general pre-training correlates with enhanced performance in financial downstream tasks. Our findings showcase the potential of language models to navigate complex, subjective evaluation guidelines despite lacking explicit training examples, revealing opportunities for training-free solutions for financial downstream tasks.

Paper Structure (19 sections, 5 figures, 7 tables)

Figures (5)

  • Figure 1: An example from the ML-ESG dataset. Sentences highlighted in red indicate negative implications for ESG, while those in blue denote positive ESG implications. The gold label for the ESG type of this text is "Opportunity." English translations are added for broader accessibility.
  • Figure 2: An example prompt with one exemplar (highlighted in red) and prompts to follow the MSCI guidelines (highlighted in blue). We calculate the probability that the gold answer follows "the answer is".
  • Figure 3: A confusion matrix analyzing the performance of GPT-4 on the Impact Type subset.
  • Figure 4: A confusion matrix analyzing the performance of GPT-4 on the Impact Duration subset.
  • Figure 5: Relationship between accuracy and confidence of Yi-Ko-6B (circle) and EEVE-Korean-10.8B (triangle) for both subsets (red for 'Impact Type', blue for 'Impact Duration'). Regression analysis exhibits a slope of 0.50.
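
The prompting setup described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt template, the helper names `build_prompt` and `gold_probability`, the example texts, and the log-probability values are all hypothetical, and normalizing over a fixed set of candidate labels is an assumption about how the "chance" of the gold answer might be computed. Only the cue phrase "the answer is", the MSCI guideline hint, and the label "Opportunity" come from the captions above.

```python
import math

ANSWER_CUE = "the answer is"  # cue phrase quoted in the Figure 2 caption


def build_prompt(exemplars, query, guideline_hint=""):
    """Assemble a few-shot prompt: optional guideline hint, exemplars, then the query.

    The "Text: ... / So the answer is ..." template is a hypothetical layout.
    """
    parts = []
    if guideline_hint:
        parts.append(guideline_hint)
    for text, label in exemplars:
        parts.append(f"Text: {text}\nSo {ANSWER_CUE} {label}")
    # The query ends on the cue so the model's next tokens are the label.
    parts.append(f"Text: {query}\nSo {ANSWER_CUE}")
    return "\n\n".join(parts)


def gold_probability(label_logprobs, gold):
    """Softmax-normalize log-probabilities over candidate labels.

    Returns the share of probability mass assigned to the gold label.
    """
    peak = max(label_logprobs.values())  # subtract the max for numerical stability
    weights = {label: math.exp(lp - peak) for label, lp in label_logprobs.items()}
    return weights[gold] / sum(weights.values())


# Hypothetical exemplar, query, and model scores, for illustration only.
prompt = build_prompt(
    exemplars=[("Firm A cut its emissions.", "Opportunity")],
    query="Firm B was fined for a chemical spill.",
    guideline_hint="Follow the MSCI guidelines.",
)
scores = {"Opportunity": math.log(0.2), "Risk": math.log(0.6)}
confidence = gold_probability(scores, gold="Risk")
```

In practice the log-probabilities would come from the language model's scores for each candidate label continuing the prompt; here they are hard-coded so the sketch runs without any model access.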