A Text Classification Framework for Simple and Effective Early Depression Detection Over Social Media Streams

Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez

TL;DR

SS3 introduces a white-box, hierarchical text classifier for early risk detection on social streams, explicitly addressing incremental learning, early classification, and interpretability. By computing a global word value gv(w,c) from normalized local values and combining category-significance with sanctions, SS3 aggregates evidence across words, sentences, and paragraphs in an incremental, map-reduce-friendly fashion. The framework achieves state-of-the-art time-aware performance on the eRisk2017 early depression task, notably improving ERDE metrics while remaining computationally efficient. Its domain-agnostic design and explainable rationale offer practical potential for large-scale monitoring and decision-support in healthcare and safety-critical contexts.
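The TL;DR's description of the global value can be made concrete with a toy scorer. This is a minimal sketch, not the authors' implementation. It makes two assumptions. First, the local value is lv_σ(w,c) = (tf(w,c) / max tf in c)^σ, which matches the behavior described for Figure 3. Second, gv(w,c) is the product of the local value, a significance term, and a sanction term. Several pieces are hypothetical stand-ins: the significance formula (a tanh of the gap to the median local value), the sanction formula, the parameter names `lam` and `rho`, and the class name `ToySS3`. They were chosen only to reproduce the qualitative effect shown in Figure 4, where words that are equally frequent everywhere, such as stop words, get a global value near 0.

```python
import math
import statistics
from collections import Counter, defaultdict


class ToySS3:
    """Hypothetical, simplified SS3-style word scorer (not the authors' code)."""

    def __init__(self, sigma=0.5, lam=0.1, rho=1.0):
        self.sigma = sigma  # smoothness of the local value (the sigma of Figure 3)
        self.lam = lam      # hypothetical scale of the significance term
        self.rho = rho      # hypothetical exponent of the sanction term
        self.tf = defaultdict(Counter)  # category -> word -> frequency

    def learn(self, text, category):
        # Training only updates frequency counts, so documents can be added
        # incrementally without retraining from scratch.
        self.tf[category].update(text.lower().split())

    def lv(self, word, cat):
        # Assumed local value: (tf(w, c) / max tf in c) ** sigma.
        counts = self.tf.get(cat)
        if not counts:
            return 0.0
        return (counts[word] / max(counts.values())) ** self.sigma

    def sg(self, word, cat):
        # Illustrative significance: how far lv(w, c) rises above the median
        # local value of w across all categories (0 if it does not).
        lvs = [self.lv(word, c) for c in self.tf]
        diff = self.lv(word, cat) - statistics.median(lvs)
        return max(0.0, math.tanh(diff / self.lam))

    def sn(self, word, cat):
        # Illustrative sanction: shrink the value when w is also
        # significant for the other categories.
        others = [c for c in self.tf if c != cat]
        if not others:
            return 1.0
        share = sum(self.sg(word, c) for c in others) / len(others)
        return max(0.0, 1.0 - share) ** self.rho

    def gv(self, word, cat):
        # Global value assumed to be lv * sg * sn.
        return self.lv(word, cat) * self.sg(word, cat) * self.sn(word, cat)

    def top_words(self, cat, k=10):
        # Rank the vocabulary by global value (ties broken alphabetically),
        # in the spirit of the GV-based word selection in Figure 5.
        vocab = set().union(*self.tf.values())
        return sorted(vocab, key=lambda w: (-self.gv(w, cat), w))[:k]
```

For example, after `learn("i feel sad and alone i feel empty", "depressed")` and `learn("i love this game it is fun", "control")`, the word "i" has the same normalized frequency in both categories and gets a global value of exactly 0, while "feel" ranks first for "depressed". Because learning only touches counters, new writings can be folded in at any time, which illustrates what incremental learning means in this setting.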

Abstract

With the rise of the Internet, there is a growing need to build intelligent systems that can efficiently deal with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection, or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams, since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions that could affect people's lives, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models are not well suited to this scenario, because they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on the CLEF eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, while being less computationally expensive and able to explain its rationale.
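The requirement that a system decide when it has read enough can be illustrated with a running-sum rule over a user's writings, in the spirit of the positive and negative confidence curves of Figure 2. This is a hypothetical sketch. The per-writing `(positive, negative)` scores are taken as given (for instance, from an SS3-style model). The `margin` and `min_writings` parameters and the decision rule itself are assumptions, not the rule used in the paper.

```python
from typing import Iterable, Optional, Tuple


def early_decision(
    writing_scores: Iterable[Tuple[float, float]],
    margin: float = 1.0,
    min_writings: int = 1,
) -> Tuple[Optional[str], int]:
    """Consume per-writing (positive, negative) confidence pairs in arrival order.

    Returns (label, writings_read). label is "positive" as soon as the running
    positive sum exceeds the running negative sum by more than `margin`;
    otherwise it is None, meaning "no decision yet, keep reading".
    """
    pos_total = 0.0
    neg_total = 0.0
    n = 0
    for n, (pos, neg) in enumerate(writing_scores, start=1):
        # Incremental classification: only two running sums are kept,
        # so each new writing costs O(1) extra work.
        pos_total += pos
        neg_total += neg
        if n >= min_writings and pos_total - neg_total > margin:
            return "positive", n
    # Stream exhausted without enough evidence; a final negative decision,
    # if one is required, is left to the caller.
    return None, n
```

With the made-up stream `[(0.2, 0.5), (0.9, 0.1), (1.4, 0.2)]` and `margin=1.0`, the function returns `("positive", 3)`: the gap between the running sums first exceeds the margin after the third writing. A larger margin trades earlier decisions for fewer false alarms, and time-aware metrics such as ERDE penalize the delay side of that trade-off.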

Paper Structure

This paper contains 22 sections, 10 equations, 9 figures, 4 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Classification process for a hypothetical example document "Apple was developed with a Web Browser that didn't support cookies. The company decided to remove it from the market". In the first stage, the document is split into two sentences (for instance, by using the dot as a delimiter), and each sentence is then split into single words. In the second stage, global values are computed for every word to generate the first set of confidence vectors. These word vectors are then reduced by the $\oplus_0$ operator to sentence vectors, $(0.1, 3.45, 0.1, 0.05)$ and $(0.05, 0.2, 1.9, 0.1)$ for the first and second sentence respectively. Next, the two sentence vectors are reduced by another operator ($\oplus_1$, which in this case is addition) to a single confidence vector for the entire document, $(0.15, 3.65, 2.0, 0.15)$. Finally, a policy is applied to this vector to make the classification; in this example, the policy selects technology, the category with the highest value, and also business, because its value was "close enough" to technology's.
  • Figure 2: Subject 9579's positive and negative confidence values over time. Time is measured in writings, so the time axis can extend further as the subject creates more writings.
  • Figure 3: Word local-value diagram for 5 different values of $\sigma$: 1, 0.8, 0.5, 0.3, and 0.1. The abscissa represents individual words arranged in order of frequency. Note that when $\sigma=1$, $lv_1$ (red line) matches the shape of the raw frequency (the actual word distribution); however, as $\sigma$ decreases, the curve becomes smoother, reducing the gap between the highest and the lowest values.
  • Figure 4: Global value (green) in relation to the local value (orange) for the "depressed" category. The abscissa represents individual words arranged in order of frequency. Note that in the zone where stop words are located (close to 0 on the abscissa), the local value is very high (since they are highly frequent words) but the global value is almost 0, which is the desired behavior.
  • Figure 5: Top-100 words selected by global value (GV) from the model trained for the eRisk Pilot Task using chunks. Font size reflects (a) GV and (b) raw frequency. Green indicates words selected only by GV, whereas orange indicates words also selected by traditional Information Gain (IG).
  • ...and 4 more figures
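The two-stage process in Figure 1 can be sketched as follows. Several choices here are assumptions: the caption only states that $\oplus_1$ is addition, so using addition for $\oplus_0$ as well is assumed. Splitting sentences on dots and words on whitespace is assumed too. The `ratio=0.5` threshold stands in for "close enough", and the helper names are hypothetical. The category names `other_1` and `other_2` are placeholders, since the caption names only technology and business. The demo feeds in the two sentence vectors given in the caption.

```python
from functools import reduce
from typing import Dict, List, Sequence

Vector = List[float]


def vec_add(a: Sequence[float], b: Sequence[float]) -> Vector:
    # Element-wise addition of two confidence vectors.
    return [x + y for x, y in zip(a, b)]


def reduce_vectors(vectors: Sequence[Sequence[float]], n_cats: int) -> Vector:
    # Reduce a list of confidence vectors to one; addition is assumed at
    # every level (the caption only states that the upper operator is addition).
    return reduce(vec_add, vectors, [0.0] * n_cats)


def classify_text(text: str, gv_table: Dict[str, Vector], n_cats: int) -> Vector:
    # Stage 1: split into sentences on dots, then into words on whitespace.
    sentences = [s for s in text.split(".") if s.strip()]
    sentence_vecs = []
    for sentence in sentences:
        # Stage 2: one confidence vector per word (unknown words -> zeros).
        word_vecs = [gv_table.get(w.lower(), [0.0] * n_cats) for w in sentence.split()]
        sentence_vecs.append(reduce_vectors(word_vecs, n_cats))  # words -> sentence
    return reduce_vectors(sentence_vecs, n_cats)  # sentences -> document


def close_enough_policy(doc_vec: Sequence[float], categories: Sequence[str],
                        ratio: float = 0.5) -> List[str]:
    # Select the top category plus any category scoring at least `ratio` of it.
    top = max(doc_vec)
    if top <= 0:
        return []
    return [c for c, v in zip(categories, doc_vec) if v >= ratio * top]


# Demo with the two sentence vectors from the Figure 1 caption.
categories = ["other_1", "technology", "business", "other_2"]  # placeholder names except technology/business
s1 = [0.1, 3.45, 0.1, 0.05]
s2 = [0.05, 0.2, 1.9, 0.1]
doc = reduce_vectors([s1, s2], n_cats=4)
print([round(v, 2) for v in doc])            # [0.15, 3.65, 2.0, 0.15]
print(close_enough_policy(doc, categories))  # ['technology', 'business']
```

Because addition is associative, partial vectors computed over separate sentences, paragraphs, or chunks can be summed in any grouping and yield the same document vector (up to floating-point rounding). This is one way to read the TL;DR's "map-reduce-friendly" remark.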