ABEX-RAT: Synergizing Abstractive Augmentation and Adversarial Training for Classification of Occupational Accident Reports

Jian Chen; Jiabao Dou

TL;DR

The paper tackles severe class imbalance and data scarcity in occupational accident report classification by introducing ABEX-RAT, a resource-efficient framework that pairs abstractive-expansive (ABEX) data augmentation with random adversarial training (RAT). A prompt-guided abstraction step distills label-critical semantics, a diversity-driven expansion step synthesizes minority-class samples, and a lightweight classifier is trained with RAT on top of a fixed embedding extractor. On the OSHA dataset the framework reaches a state-of-the-art Macro-F1 of 90.32% and Weighted-F1 of 92.82%, outperforming both traditional baselines and large-model fine-tuning while remaining efficient. The results indicate that targeted data enrichment combined with robust regularization can achieve high accuracy in specialized domains without costly full-parameter LLM fine-tuning.
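Because the summary reports both Macro-F1 and Weighted-F1, it may help to see how the two differ under class imbalance. The sketch below uses the standard definitions (unweighted mean of per-class F1 versus support-weighted mean); the toy labels `fall` and `fire` and all function names are illustrative assumptions, not taken from the paper or the OSHA dataset.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label that occurs in y_true."""
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    # Every class counts equally, so rare classes matter as much as common ones.
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    # Each class is weighted by its share of the true labels.
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] / len(y_true) for c in scores)

# Hypothetical imbalanced toy data: 8 majority-class and 2 minority-class reports.
y_true = ["fall"] * 8 + ["fire"] * 2
y_pred = ["fall"] * 8 + ["fall", "fire"]  # one minority report is missed

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
```

In this toy case a single missed minority-class report lowers Macro-F1 more than Weighted-F1, which is why Macro-F1 is the more demanding metric on imbalanced data.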

Abstract

The automatic classification of occupational accident reports is pivotal for workplace safety analysis but is persistently hindered by severe class imbalance and data scarcity. In this paper, we propose ABEX-RAT, a resource-efficient framework that synergizes generative data augmentation with robust adversarial learning. Unlike computationally expensive fine-tuning of large language models (LLMs), our approach employs a two-stage abstractive-expansive (ABEX) pipeline: it first uses a prompt-guided LLM to distill label-critical semantics into concise abstracts, which are then expanded into diverse synthetic samples to balance the data distribution. Subsequently, we train a lightweight classifier using a random adversarial training (RAT) protocol, which stochastically injects perturbations to enhance generalization without significant computational overhead. Experimental results on the OSHA dataset demonstrate that ABEX-RAT establishes a new state of the art, achieving a Macro-F1 score of 90.32% and significantly outperforming both traditional baselines and fine-tuned large models. This confirms that targeted augmentation combined with robust training offers a superior, data-efficient alternative for specialized domain classification. The source code will be made publicly available upon acceptance.
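The idea of stochastically injecting perturbations during training can be illustrated with a minimal sketch. The abstract does not specify the perturbation form, the classifier architecture, or any hyperparameters, so everything here is an assumption: a binary logistic-regression classifier on fixed embedding vectors, an FGSM-style sign-gradient perturbation, and the hypothetical parameters `p_adv` (probability of perturbing a sample) and `eps` (perturbation size). It is not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sign(v):
    return (v > 0) - (v < 0)

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_rat(X, y, p_adv=0.5, eps=0.1, lr=0.5, epochs=200, seed=0):
    """SGD on binary cross-entropy; each sample is perturbed with probability p_adv."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            if rng.random() < p_adv:
                # Gradient of the loss w.r.t. the input embedding is (p - t) * w;
                # step along its sign so the perturbed sample is harder to classify.
                err = sigmoid(score(w, b, x)) - t
                x = [xi + eps * sign(err * wi) for xi, wi in zip(x, w)]
            # Ordinary gradient step on the (possibly perturbed) sample.
            err = sigmoid(score(w, b, x)) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0

# Hypothetical 2-dimensional "embeddings" standing in for a fixed extractor's output.
X = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.1], [-0.8, -0.3]]
y = [1, 1, 0, 0]
w, b = train_rat(X, y)
predictions = [predict(w, b, x) for x in X]
```

Perturbing only a random subset of samples keeps the extra cost to one additional forward computation on those samples, which is in the spirit of the low-overhead claim above.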
