Can we trust AI to detect healthy multilingual English speakers among the cognitively impaired cohort in the UK? An investigation using real-world conversational speech
Madhurananda Pahar, Caitlin Illingworth, Dorota Braun, Bahman Mirheidari, Lise Sproson, Daniel Blackburn, Heidi Christensen
TL;DR
This study interrogates the trustworthiness of AI-based cognitive decline detection in the UK’s multilingual, ethnic minority populations. It evaluates automatic speech recognition (ASR) performance, three-class and two-class classification, and Mini-Mental State Examination (MMSE) score prediction on CognoMemory data across monolingual and multilingual groups, and compares the results with DementiaBank. It finds minimal ASR bias but clear biases in linguistic-feature-based models, which particularly disadvantage multilingual speakers and certain accents. The results underscore the need for bias-mitigated, generalisable models and culturally informed screening tools before such AI is deployed in clinical settings. The work contributes a large, ethnically diverse real-world dataset and demonstrates the complexities of translating high-performing models from majority populations to diverse UK communities, with implications for fairer, more accessible dementia screening.
Abstract
Conversational speech often reveals early signs of cognitive decline associated with conditions such as dementia and mild cognitive impairment (MCI). In the UK, one in four people belongs to an ethnic minority, and dementia prevalence is expected to rise most rapidly among Black and Asian communities. For AI tools to be clinically beneficial, they must be trustworthy; this study therefore examines whether AI models show bias when detecting healthy multilingual English speakers among the cognitively impaired cohort. Monolingual participants were recruited nationally across the UK, and multilingual speakers were enrolled from four community centres in Sheffield and Bradford. The multilingual participants spoke English with a non-native accent alongside Somali, Chinese, or South Asian languages, and were further divided into two Yorkshire accent groups (West and South) to test the AI tools thoroughly. Although automatic speech recognition (ASR) systems showed no significant bias across groups, classification and regression models using acoustic and linguistic features exhibited bias against multilingual speakers, particularly in memory, fluency, and reading tasks. This bias was more pronounced when models were trained on the publicly available DementiaBank dataset. Moreover, multilinguals were more likely to be misclassified as having cognitive decline. This study is the first of its kind to show that, despite their strong overall performance, current AI models are biased against multilingual individuals from ethnic minority backgrounds in the UK, and that they are also more likely to misclassify speakers with a certain accent (South Yorkshire) as living with more severe cognitive decline. From this pilot study, we conclude that existing AI tools are not yet reliable for diagnostic use in these populations, and we aim to address this in future work by developing more generalisable, bias-mitigated models.
