ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships
Johan R. Portela, Nicolás Perez, Rubén Manrique
TL;DR
ESNLIR introduces a large Spanish multi-genre NLI dataset focused on causal reasoning, constructed via linking-phrase extraction across 34 corpora spanning eight genres. The dataset enables automatic premise-hypothesis pairing with a SciNLI-inspired methodology and includes comprehensive baselines using XGBoost, BERTIN, and XLM-RoBERTa, along with artifact-detection, stress-testing, and human validation. Results show that multi-genre data improves generalization, with XLM-RoBERTa achieving the strongest performance (accuracy and macro-F1 above 0.67) and the dataset demonstrating robustness to annotation artifacts while revealing genre-specific challenges. The work highlights the potential of genre diversity for Spanish NLI, sets a baseline for future LLM evaluation, and paves the way for more extensive annotation and generalization studies.
Abstract
Natural Language Inference (NLI), also known as Recognizing Textual Entailment (RTE), is a core task in Natural Language Processing (NLP) that enables machines to identify semantic relationships between pieces of text. Although considerable work has been done for English, efforts for Spanish remain relatively sparse. This paper therefore presents ESNLIR, a multi-genre Spanish NLI dataset that specifically accounts for causal relationships. A preliminary baseline was designed and evaluated using models from the BERT family. The findings indicate that adding genres improves the model's ability to generalize. The code, notebooks, and complete datasets for these experiments are available at: https://zenodo.org/records/15002575. The dataset alone is available at: https://zenodo.org/records/15002371.
