The Branch Not Taken: Predicting Branching in Online Conversations

Shai Meital, Lior Rokach, Roman Vainshtein, Nir Grinberg

TL;DR

This paper introduces the branch prediction task for online, tree-structured discussions and presents GLOBS, a two-stage model that combines a fine-tuned DistilBERT for reply-to prediction with a classifier over pooled reply-to scores and contextual features to predict when a new comment initiates a new branch. Evaluation on three Reddit forums (CMV, ELI5, ASC) shows that GLOBS consistently outperforms strong baselines and transfers robustly across forums. Feature analysis, including SHAP, highlights the importance of pooled linguistic relations and of structural and temporal context in predicting branching, while error analysis characterizes the model's limitations. The authors publicly release their code and models and point to practical applications in summarization, thread disentanglement, and downstream conversational modeling.

Abstract

Multi-participant discussions tend to unfold in a tree structure rather than a chain structure. Branching may occur for multiple reasons -- from the asynchronous nature of online platforms to a conscious decision by an interlocutor to disengage from part of the conversation. Predicting branching and understanding the reasons for creating new branches are important for many downstream tasks such as summarization and thread disentanglement, and may help develop online spaces that encourage users to engage in online discussions in more meaningful ways. In this work, we define the novel task of branch prediction and propose GLOBS (Global Branching Score) -- a deep neural network model for predicting branching. GLOBS is evaluated on three large discussion forums from Reddit, achieving significant improvements over an array of competitive baselines and demonstrating better transferability. We affirm that structural, temporal, and linguistic features contribute to GLOBS's success and find that branching is associated with a greater number of conversation participants and tends to occur in earlier levels of the conversation tree. We publicly release GLOBS and our implementation of all baseline models to allow reproducibility and promote further research on this important task.

Paper Structure (24 sections, 2 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: The branch prediction task -- determine whether a new comment (orange node) will be a reply to any of the intermediate nodes (defined as branching and validated in Section \ref{subsec:problem_def}; shaded in purple) or any of the leaf nodes (shaded in blue). GLOBS uses pooling features of reply-to relations as well as structural and temporal features to predict branching.
  • Figure 2: An overview of the GLOBS workflow. First, we fine-tune a DistilBERT transformer model for the reply-to relation prediction task using thousands of conversations (step 0). Second, we represent a given conversation tree by its terminal (blue) and intermediate (purple) nodes and predict the reply-to relation of the $k+1$ node (orange) to all preceding nodes (step 1). We then concatenate various pooling features of the predicted reply-to relations with the conversation's context features (step 2). A fully connected network is trained on the concatenated representation to output the global branching score (step 3).
  • Figure 3: SHAP feature importance for the CMV dataset using the GLOBS model.
  • Figure 4: SHAP values (feature importance) for the ELI5 (top) and ASC (bottom) forums for the different features in GLOBS.
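
The task definition in Figure 1 and steps 1 and 2 of Figure 2 can be illustrated with a small sketch. It is not the authors' released implementation. The function names, the choice of max/mean/min pooling, the example scores, and the two context values are all assumptions made for illustration; the captions only say that "various pooling features" and "context features" are used. The sketch also assumes the root post is treated like any other node when deciding whether a reply target is a leaf.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Set


def leaf_nodes(parent_of: Dict[str, Optional[str]]) -> Set[str]:
    """Return ids of nodes that currently have no replies (terminal nodes)."""
    parents = {p for p in parent_of.values() if p is not None}
    return set(parent_of) - parents


def is_branching(parent_of: Dict[str, Optional[str]], reply_target: str) -> bool:
    """Label from Figure 1: replying to an intermediate (non-leaf) node is branching."""
    if reply_target not in parent_of:
        raise KeyError(f"unknown node: {reply_target}")
    return reply_target not in leaf_nodes(parent_of)


def pooled_features(
    scores: Dict[str, float],
    parent_of: Dict[str, Optional[str]],
    context: Sequence[float],
) -> List[float]:
    """Pool reply-to scores over leaf and intermediate nodes, then append context.

    `scores` stands in for the output of a reply-to model (hypothetical values);
    max/mean/min pooling is an assumed choice, not taken from the paper.
    """
    leaves = leaf_nodes(parent_of)
    leaf_scores = [s for n, s in scores.items() if n in leaves]
    inner_scores = [s for n, s in scores.items() if n not in leaves]
    features: List[float] = []
    for group in (leaf_scores, inner_scores):
        if group:
            features += [max(group), mean(group), min(group)]
        else:
            features += [0.0, 0.0, 0.0]  # placeholder when a group is empty
    return features + list(context)


# Toy conversation: root has replies a and b; a has reply c.
tree = {"root": None, "a": "root", "b": "root", "c": "a"}
# Made-up reply-to scores of the new comment against each preceding node.
reply_scores = {"root": 0.125, "a": 0.5, "b": 0.125, "c": 0.25}
# Hypothetical context features, e.g. participant count and tree depth.
vector = pooled_features(reply_scores, tree, context=[4, 3])
```

In this toy tree, `b` and `c` are leaves, so a reply to `a` or `root` counts as branching and a reply to `c` does not. In the workflow described in Figure 2, a vector like `vector` would be the input to the fully connected network that outputs the global branching score.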