Cheap Ways of Extracting Clinical Markers from Texts
Anastasia Sandu, Teodor Mihailescu, Sergiu Nisioi
TL;DR
The paper investigates extracting clinical markers of suicide risk from Reddit posts. It compares a memory-efficient good old-fashioned machine learning (GOML) pipeline, tf-idf with logistic regression, for extracting highlights against an LLM-driven abstractive summarization approach that includes CPU-friendly quantized LLMs. On CLPsych 2024 data (Task A and expert subsets), GOML excels at high-recall highlights, while LLM-based summaries improve readability and coherence; the best results come from a hybrid that pairs GOML-derived highlights (Section 3.2) with LLM-generated summaries (Section 4.2). Duplicates in LLM outputs can inflate recall, which underscores the need for multi-criteria evaluation (recall, consistency, contradiction). Combining traditional, explainable ML with modern LLMs can be effective in resource-constrained settings and in languages with limited data, though ethical considerations and potential biases in generated content remain important. Overall, the study suggests a practical, CPU-friendly pathway for extracting evidentiary highlights and clinical markers from text that balances speed, cost, and reliability, with human judgment guiding final assessments.
Abstract
This paper describes the work of the UniBuc Archaeology team for CLPsych's 2024 Shared Task, which involved finding evidence within the text supporting the assigned suicide risk level. Two types of evidence were required: highlights (relevant spans extracted from the text) and summaries (evidence aggregated into a synthesis). Our work evaluates Large Language Models (LLMs) against an alternative method that is much more memory- and resource-efficient. The first approach employs a good old-fashioned machine learning (GOML) pipeline consisting of a tf-idf vectorizer with a logistic regression classifier, whose representative features are used to extract relevant highlights. The second, more resource-intensive approach uses an LLM to generate the summaries and is guided by chain-of-thought prompting to produce sequences of text indicating clinical markers.
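To make the GOML idea concrete, the following is a minimal sketch of how a tf-idf representation and a logistic regression classifier could be used to rank sentences as candidate highlights. It is an illustration under stated assumptions, not the team's implementation: the toy posts, the binary labels, the sentence-level scoring rule, and all function names (`fit_idf`, `tfidf`, `train_logreg`, `extract_highlights`) are hypothetical, and the pure-Python vectorizer and classifier stand in for what a library such as scikit-learn would normally provide.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; a deliberately simple, assumed tokenizer.
    return re.findall(r"[a-z']+", text.lower())

def fit_idf(docs):
    # Smoothed inverse document frequency for every token seen in training.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(text, idf):
    # L2-normalised sparse tf-idf vector; tokens unseen in training are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def train_logreg(vectors, labels, epochs=300, lr=0.5, l2=0.001):
    # Binary logistic regression fitted with plain stochastic gradient descent.
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights[t] * v for t, v in vec.items())
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for t, v in vec.items():
                weights[t] -= lr * (err * v + l2 * weights[t])
            bias -= lr * err
    return weights, bias

def extract_highlights(post, idf, weights, top_k=2):
    # Score each sentence by the learned weights of its tf-idf features and
    # keep the top_k sentences that lean toward the positive class.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", post) if s.strip()]
    scored = []
    for sentence in sentences:
        vec = tfidf(sentence, idf)
        scored.append((sum(weights[t] * v for t, v in vec.items()), sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for score, sentence in scored[:top_k] if score > 0]

# Invented toy data: label 1 marks posts assumed to carry risk-related language.
train_posts = [
    "I feel hopeless and cannot sleep anymore.",
    "Everything feels hopeless and I am a burden to everyone.",
    "I went hiking with friends and enjoyed the weather.",
    "Work was busy but dinner with friends was lovely.",
]
train_labels = [1, 1, 0, 0]

idf = fit_idf(train_posts)
weights, bias = train_logreg([tfidf(p, idf) for p in train_posts], train_labels)

new_post = ("Dinner with friends was fine. "
            "Lately I feel hopeless and like a burden. "
            "The weather was nice.")
highlights = extract_highlights(new_post, idf, weights)
print(highlights)
```

Because each highlight is selected through explicit per-token weights, the features that drove a selection can be inspected directly, which is the explainability argument for the GOML route. Sentences whose score is zero or negative are dropped, so a post may yield fewer than `top_k` highlights.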
