
DKE-Research at SemEval-2024 Task 2: Incorporating Data Augmentation with Generative Models and Biomedical Knowledge to Enhance Inference Robustness

Yuqi Wang, Zeqiang Wang, Wei Wang, Qi Chen, Kaizhu Huang, Anh Nguyen, Suparna De

TL;DR

The paper tackles unreliable inferences in biomedical NLI for clinical trial reports (CTRs) by introducing three data-augmentation strategies (numerical question answering, semantic perturbations, and domain-tailored vocabulary substitutions) alongside multi-task learning with DeBERTa. Using GPT-3.5 as a generator and biomedical knowledge graphs for terminology substitution, the approach improves faithfulness and consistency against controlled interventions on the NLI4CT 2024 benchmark, albeit with a modest trade-off in performance on unaltered data. The ablation study identifies semantic perturbation as the primary driver of robustness, and the paper highlights practical considerations such as data noise and the potential benefits of knowledge-graph pre-training. Overall, the method offers a path toward more robust, faithful clinical NLI systems, with implications for trustworthy automated analysis of CTRs.

Abstract

Safe and reliable natural language inference is critical for extracting insights from clinical trial reports, but it remains challenging because of biases in large pre-trained language models. This paper presents a novel data augmentation technique to improve model robustness for biomedical natural language inference in clinical trials. By generating synthetic examples through semantic perturbations and domain-specific vocabulary replacement, and by adding a new task for numerical and quantitative reasoning, we introduce greater diversity and reduce shortcut learning. Our approach, combined with multi-task learning and the DeBERTa architecture, achieved significant performance gains on the NLI4CT 2024 benchmark compared to the original language models. Ablation studies validate the contribution of each augmentation method in improving robustness. Our best-performing model ranked 12th in faithfulness and 8th in consistency out of the 32 participants.

Paper Structure

This paper contains 13 sections, 6 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Overview of the proposed system. The upper part applies data augmentation to entailed statements extracted from the original NLI dataset, using generative artificial intelligence (AI) and biomedical knowledge graphs, in three steps: 1) transforming statements into multiple-choice questions with corresponding answers; 2) introducing semantic perturbations to the original entailed statements; 3) identifying keywords in the original entailed statements with a statistical method and replacing them with synonyms from the biomedical knowledge graph. The lower part trains a DeBERTa-based classifier on the original entailed statements, the augmented data, and the CTRs.
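
Step 3 of the figure (keyword identification followed by synonym substitution) can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not say which statistical method they use, so TF-IDF is assumed here, and `TOY_SYNONYMS` is a hypothetical stand-in for synonyms retrieved from a biomedical knowledge graph. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for synonyms looked up in a biomedical knowledge graph.
TOY_SYNONYMS = {
    "neutropenia": "low neutrophil count",
    "placebo": "inactive control",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(statement, corpus, top_k=3):
    """Rank the statement's tokens by TF-IDF (an assumed choice of statistic)."""
    docs = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(docs)
    term_freq = Counter(tokenize(statement))
    scores = {}
    for token, count in term_freq.items():
        doc_freq = sum(1 for doc in docs if token in doc)
        # Smoothed inverse document frequency; rarer tokens score higher.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        scores[token] = count * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def substitute_keywords(statement, corpus, synonyms, top_k=3):
    """Replace top-ranked keywords that have a known synonym."""
    augmented = statement
    for keyword in tfidf_keywords(statement, corpus, top_k):
        if keyword in synonyms:
            pattern = rf"\b{re.escape(keyword)}\b"
            replacement = synonyms[keyword]
            augmented = re.sub(
                pattern, lambda _m: replacement, augmented, flags=re.IGNORECASE
            )
    return augmented


if __name__ == "__main__":
    corpus = [
        "Patients in cohort 1 received placebo.",
        "Patients in cohort 2 received the study drug.",
        "Neutropenia was reported in cohort 1 patients.",
    ]
    print(substitute_keywords(corpus[2], corpus, TOY_SYNONYMS))
```

On this toy corpus, "neutropenia" is among the top-ranked tokens because it appears in only one statement, so it is the term that gets replaced. The sketch does not preserve the original capitalization of a replaced term, and a real pipeline would need to check that each substitution keeps the entailment label intact.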