AIMS.au: A Dataset for the Analysis of Modern Slavery Countermeasures in Corporate Statements

Adriana Eufrosina Bora; Pierre-Luc St-Charles; Mirko Bronzi; Arsène Fansi Tchango; Bruno Rousseau; Kerrie Mengersen

TL;DR

This paper presents AIMS.au, the largest sentence-level dataset of Australian Modern Slavery Act statements, annotated to identify sentences that contain mandated information. It details the annotation guidelines, the preprocessing steps, and the gold-standard subsets, and demonstrates a machine learning pipeline for detecting Act-relevant sentences, evaluated under zero-shot and supervised settings. Results show that fine-tuned models substantially outperform zero-shot large language models, with context and careful annotation driving performance. The dataset's open release and its alignment with other legal contexts suggest practical value for automated compliance monitoring and for cross-jurisdiction research on supply chain transparency.

Abstract

Despite over a decade of legislative efforts to address modern slavery in the supply chains of large corporations, the effectiveness of government oversight remains hampered by the challenge of scrutinizing thousands of statements annually. While Large Language Models (LLMs) can be considered a well-established solution for the automatic analysis and summarization of documents, recognizing concrete modern slavery countermeasures taken by companies and differentiating them from vague claims remains a challenging task. To help evaluate and fine-tune LLMs for the assessment of corporate statements, we introduce a dataset composed of 5,731 modern slavery statements taken from the Australian Modern Slavery Register and annotated at the sentence level. This paper details the steps taken to construct the dataset, which include the careful design of annotation specifications, the selection and preprocessing of statements, and the creation of high-quality annotation subsets for effective model evaluation. To demonstrate our dataset's utility, we propose a machine learning methodology for the detection of sentences relevant to the mandatory reporting requirements set by the Australian Modern Slavery Act. We then follow this methodology to benchmark modern language models under zero-shot and supervised learning settings.
