DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training
Bhuvanesh Verma, Lisa Raithel
TL;DR
This work tackles robust natural language inference for Clinical Trial Reports (NLI4CT) by combining an instruction-tuned LLM with a MinMax auxiliary model and data perturbations that target acronyms and numerical values. The Mistral 7B-based system is fine-tuned with LoRA, a parameter-efficient fine-tuning (PEFT) method, and pre-finetuned on MedNLI to improve biomedical alignment, while a dedicated auxiliary learner directs the model toward hard examples. The perturbation experiments indicate that acronym perturbations help on semantic-definitional cases, whereas numerical perturbations influence semantic-preserving interventions; the combined approach yields mixed effects across interventions and sections, with strong performance on Adverse Events and numerical contradictions ($F_1$ up to 0.93). Overall, the MinMax-based robustness framework provides meaningful gains in Faithfulness and Consistency, offering practical pathways to safer, more reliable clinical NLP systems. The analysis of easy vs. hard samples and of section-level difficulty informs future work on numerical reasoning and on curriculum design driven by data cartography in biomedical NLI.
Abstract
The NLI4CT task at SemEval-2024 emphasizes the development of robust models for Natural Language Inference on Clinical Trial Reports (CTRs) using large language models (LLMs). This edition introduces interventions specifically targeting the numerical, vocabulary, and semantic aspects of CTRs. Our proposed system harnesses the capabilities of the state-of-the-art Mistral model, complemented by an auxiliary model, to focus on the intricate input space of the NLI4CT dataset. By incorporating numerical and acronym-based perturbations into the data, we train a robust system capable of handling both semantic-altering and numerical contradiction interventions. Our analysis of the dataset sheds light on which sections of the CTRs are challenging for reasoning.
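To make the idea of numerical and acronym-based perturbations more concrete, the following is a minimal Python sketch, not the authors' implementation. The acronym table, the function names `expand_acronyms` and `perturb_numbers`, and the scaling factor are all assumptions introduced here for illustration; the actual perturbation rules used in the system may differ.

```python
import re

# Hypothetical acronym table (an assumption for illustration only);
# a real system would draw on a curated biomedical resource.
ACRONYMS = {"AE": "adverse event", "ECG": "electrocardiogram"}


def expand_acronyms(text, table=ACRONYMS):
    """Replace each known acronym (whole-word match) with its long form."""
    if not table:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


def perturb_numbers(text, factor=2.0):
    """Scale every plain number in the text by `factor`.

    Only simple integers and decimals are handled; numbers with
    thousands separators (e.g. "1,000") are out of scope for this sketch.
    """
    def scale(match):
        value = float(match.group(0)) * factor
        # Print whole values without a trailing ".0".
        return str(int(value)) if value.is_integer() else f"{value:g}"

    return re.sub(r"\d+(?:\.\d+)?", scale, text)


statement = "No AE was reported in 12 of 50 patients."
acronym_variant = expand_acronyms(statement)
numeric_variant = perturb_numbers(statement)
```

Under these assumptions, the acronym variant keeps the meaning of the statement while changing its vocabulary, whereas the numeric variant changes the stated quantities and so may no longer agree with the underlying report. How such variants are labeled and mixed into training is a design decision that this sketch does not address.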
