CoLe and LYS at BioASQ MESINESP8 Task: similarity based descriptor assignment in Spanish

Francisco J. Ribadas-Pena, Shuyuan Cao, Elmurod Kuriyozov

TL;DR

The paper addresses automatic assignment of DeCS descriptors to Spanish biomedical abstracts in BioASQ MESINESP8 using a similarity-based pipeline built on an Apache Lucene index. It systematically evaluates how different article representations affect a k-NN multi-label classifier, and augments this with Limited Label Powerset and label-profile approaches to exploit label co-occurrence and semantics. The authors report competitive official results and demonstrate the practicality of conventional IR-based methods for Spanish biomedical indexing, outlining future directions toward domain-specific NLP models and semantic tag integration. The contributions include a thorough evaluation of index-term extraction strategies, a meta-label scheme based on NPMI, and label-profile representations that complement content-based retrieval. The work highlights scalable, low-parameter methods suitable for large-scale, Spanish-language biomedical indexing.
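As a rough illustration of the NPMI-based meta-label idea mentioned above, the sketch below scores pairs of labels by normalized pointwise mutual information, using the usual definition NPMI(a, b) = PMI(a, b) / (-log p(a, b)). The function name `npmi_pairs`, the document-frequency probability estimates, and the `threshold` value are assumptions made for this example; the summary does not state how the authors estimate probabilities or which cutoff they use.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_pairs(label_sets, threshold=0.5):
    """Return {(a, b): npmi} for label pairs whose NPMI is >= threshold.

    label_sets: iterable of label collections, one per training document.
    Probabilities are estimated as document frequencies (an assumption).
    """
    label_sets = list(label_sets)
    n_docs = len(label_sets)
    label_counts = Counter()
    pair_counts = Counter()
    for labels in label_sets:
        unique = sorted(set(labels))  # sorted so each pair has one canonical order
        label_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_docs
        if p_ab == 1.0:
            # Both labels occur in every document; NPMI tends to 1 in the limit.
            value = 1.0
        else:
            p_a = label_counts[a] / n_docs
            p_b = label_counts[b] / n_docs
            pmi = math.log(p_ab / (p_a * p_b))
            value = pmi / -math.log(p_ab)
        if value >= threshold:
            scores[(a, b)] = value
    return scores
```

Pairs that pass the threshold could then be treated as single meta-labels during candidate ranking; pairs that never co-occur are simply absent from the result.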

Abstract

In this paper, we describe our participation in the MESINESP Task of the BioASQ biomedical semantic indexing challenge. The participating system follows an approach based solely on conventional information retrieval tools. We have evaluated various alternatives for extracting index terms from IBECS/LILACS documents, which are then stored in an Apache Lucene index. These indexed representations are queried using the contents of the article to be annotated, and a ranked list of candidate labels is created from the retrieved documents. We have also evaluated a limited Label Powerset approach, which creates meta-labels by joining pairs of DeCS labels with high co-occurrence scores, and an alternative method based on label profile matching. Results obtained in the official runs seem to confirm the suitability of this approach for languages like Spanish.
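The step that turns retrieved documents into a ranked list of candidate labels can be sketched as a similarity-weighted vote among the nearest neighbours. This is a minimal sketch, not the authors' implementation: the function `rank_labels`, the weighting scheme, and the `k` and `top_n` parameters are hypothetical, and the neighbour list stands in for whatever the Lucene query would return.

```python
from collections import defaultdict


def rank_labels(neighbours, k=10, top_n=5):
    """Rank candidate labels from retrieved neighbours.

    neighbours: list of (similarity_score, labels) pairs, e.g. the hits of
    an index query together with the descriptors of each retrieved document.
    Returns up to top_n (label, normalized_weight) pairs, best first.
    """
    # Keep only the k most similar documents.
    top = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]

    # Each neighbour votes for its labels with a weight equal to its score.
    votes = defaultdict(float)
    for score, labels in top:
        for label in set(labels):
            votes[label] += score

    # Normalize by the total score so weights fall in [0, 1].
    total = sum(score for score, _ in top) or 1.0
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(label, weight / total) for label, weight in ranked[:top_n]]
```

A fixed `top_n` is only one possible cutoff; a score threshold or a per-article estimate of the number of labels would fit the same structure.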

Paper Structure

This paper contains 7 sections, 2 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Example of the index term extraction methods.