Profile Generators: A Link between the Narrative and the Binary Matrix Representation
Raoul H. Kutil, Georg Zimmermann, Barbara Strasser-Kirchweger, Christian Borgelt
TL;DR
This work addresses the challenge of turning the narrative DSM-5 diagnostic criteria into a scalable, machine-actionable format. It introduces symptom profile generators, which automatically generate all valid symptom profiles of a disorder for its binary matrix representation. The generators make it possible to compute disorder similarity via the maximum pairwise cosine similarity ($MPCS_{\max}$) even for disorders with billions of symptom profiles, such as Major Depressive Disorder (e.g., $1{,}376{,}583{,}579$ profiles) and Panic Disorder (over $3{,}119{,}485{,}608$ profiles). The paper further shows that conditional generators can reduce the number of cosine similarity computations from a count too large to enumerate to a handful of evaluations, which makes the calculation practical. Overall, the generator framework provides a readable, adaptable, and scalable bridge between narrative diagnostic knowledge and binary-matrix analysis, with potential applicability beyond psychiatry.
Abstract
Mental health disorders, particularly cognitive disorders defined by deficits in cognitive abilities, are described in detail in the DSM-5, which includes definitions and examples of signs and symptoms. A simplified, machine-actionable representation was developed to assess the similarity and separability of these disorders, but it is not suited for the most complex cases. Generating or applying a full binary matrix for similarity calculations is infeasible due to the vast number of symptom combinations. This research develops an alternative representation that links the narrative form of the DSM-5 with the binary matrix representation and enables automated generation of valid symptom combinations. Using a strict pre-defined format of lists, sets, and numbers with slight variations, complex diagnostic pathways involving numerous symptom combinations can be represented. This format, called the symptom profile generator (or simply generator), provides a readable, adaptable, and comprehensive alternative to a binary matrix while enabling easy generation of symptom combinations (profiles). Cognitive disorders, which typically involve multiple diagnostic criteria with several symptoms, can thus be expressed as lists of generators. Representing several psychotic disorders in generator form and generating all symptom combinations showed that matrix representations of complex disorders become too large to manage. The MPCS (maximum pairwise cosine similarity) algorithm cannot handle matrices of this size, prompting the development of a profile reduction method using targeted generator manipulation to find specific MPCS values between disorders. The generators allow easier creation of binary representations for large matrices and make it possible to calculate specific MPCS cases between complex disorders through conditional generators.
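To make the idea concrete, the following is a minimal sketch, not the format used in the paper. It assumes a hypothetical generator of the form `(symptoms, k_min, k_max)`, read as "between `k_min` and `k_max` of these symptoms are present", and it assumes that $MPCS_{\max}$ is the largest cosine similarity over all pairs of profiles from two disorders. The symptom labels, the two toy disorders, and the function names `expand`, `profiles`, `cosine`, and `mpcs_max` are illustrative assumptions.

```python
from itertools import combinations, product
from math import sqrt


def expand(generator):
    """Yield every symptom combination permitted by one generator.

    Assumed (hypothetical) format: (symptoms, k_min, k_max).
    """
    symptoms, k_min, k_max = generator
    for k in range(k_min, k_max + 1):
        for combo in combinations(symptoms, k):
            yield frozenset(combo)


def profiles(generators):
    """Return all profiles of a disorder given as a list of generators.

    One generator per diagnostic criterion; a profile is the union of one
    permitted combination from each criterion.
    """
    options = [list(expand(g)) for g in generators]
    return {frozenset().union(*parts) for parts in product(*options)}


def cosine(a, b):
    """Cosine similarity of two binary profiles stored as symptom sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def mpcs_max(profiles_a, profiles_b):
    """Largest pairwise cosine similarity between two profile sets."""
    return max(cosine(a, b) for a in profiles_a for b in profiles_b)


# Two toy disorders (invented for illustration, not taken from the DSM-5).
disorder_a = [(("A1", "A2", "A3"), 2, 3), (("B1",), 1, 1)]
disorder_b = [(("A1", "A2", "C1"), 2, 3)]

profiles_a = profiles(disorder_a)
profiles_b = profiles(disorder_b)
similarity = mpcs_max(profiles_a, profiles_b)
```

Storing each profile as a set of present symptoms is equivalent to a row of the binary matrix, and for binary vectors the cosine similarity reduces to the size of the intersection divided by the square root of the product of the set sizes. The brute-force `mpcs_max` compares every pair, which is exactly what becomes infeasible for disorders with billions of profiles and what the conditional generators described in the abstract are meant to avoid.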
