TMU at TREC Clinical Trials Track 2023
Aritra Kumar Lahiri, Emrul Hasan, Qinmin Vivian Hu, Cherie Ding
TL;DR
TMU investigates clinical-trial retrieval from patient health summaries for the TREC Clinical Trials Track 2023. The approach uses PubMed Parser to extract the brief_summary, detailed_description, id_info, and eligibility fields from the trial XML, and embeds the text with one of two strategies: a Sentence Transformer based on RoBERTa-large, or Doc2Vec (DM/DBOW). Up to 1000 trials per topic are ranked by cosine similarity, defined as $\mathrm{CosineSimilarity}(t, d) = \frac{t \cdot d}{\|t\| \, \|d\|}$, where $t$ is the topic embedding and $d$ is the trial embedding. Across the four submitted runs (v1 and v4 with Doc2Vec; v2 and v3 with RoBERTa-large), the Sentence Transformer outperforms Doc2Vec on NDCG-based metrics across topics. The results suggest that transformer-based semantic representations are a feasible and practical basis for clinical-trial retrieval that accounts for inclusion and exclusion criteria, supporting improved patient-to-trial matching.
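The ranking step described above can be illustrated with a minimal sketch. It assumes the topic and trial embeddings have already been computed; the function names `cosine_similarity` and `rank_trials`, the trial identifiers, and the three-dimensional toy vectors are hypothetical and are not taken from the submitted runs.

```python
import math


def cosine_similarity(t, d):
    """Cosine similarity between a topic vector t and a trial vector d."""
    dot = sum(a * b for a, b in zip(t, d))
    norm_t = math.sqrt(sum(a * a for a in t))
    norm_d = math.sqrt(sum(b * b for b in d))
    if norm_t == 0.0 or norm_d == 0.0:
        # Assumption: a zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_t * norm_d)


def rank_trials(topic_vec, trial_vecs, k=1000):
    """Return the k trials most similar to the topic, best first.

    trial_vecs maps a trial identifier to its embedding vector.
    """
    scored = [
        (trial_id, cosine_similarity(topic_vec, vec))
        for trial_id, vec in trial_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical); real ones would come from the embedding model.
topic = [1.0, 0.0, 1.0]
trials = {
    "TRIAL_A": [1.0, 0.0, 1.0],
    "TRIAL_B": [0.0, 1.0, 0.0],
    "TRIAL_C": [1.0, 1.0, 0.0],
}
ranking = rank_trials(topic, trials, k=2)
for trial_id, score in ranking:
    print(trial_id, round(score, 3))
```

With `k=1000`, the same function would produce the per-topic cutoff mentioned above; the toy call uses `k=2` only to keep the output short.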
Abstract
This paper describes Toronto Metropolitan University's participation in the TREC Clinical Trials Track 2023. For the track's tasks, we use advanced natural language processing techniques and neural language models to retrieve the most relevant clinical trials. We describe the overall methodology, experimental settings, and results of the runs we submitted as team V-TorontoMU.
