Lightweight Connective Detection Using Gradient Boosting
Mustafa Erolcan Er, Murathan Kurfalı, Deniz Zeyrek
TL;DR
The paper addresses the computational cost of neural discourse connective detection by proposing a lightweight gradient-boosting approach trained on simple linguistic features. It models connective detection as three-way token classification and uses XGBoost with verb-based, word-based, and position-based features, so that feature extraction runs in $O(n)$ time and inference is fast on CPU. The approach performs competitively with strong baselines on English PDTB 2.0 and Turkish TDB 1.0, remains robust across the two languages, and is accompanied by feature-importance and error analyses. The work enables scalable, resource-efficient discourse annotation, provides a practical alternative for low-resource settings, and suggests directions for improving multilingual discourse parsers.
Abstract
In this work, we introduce a lightweight discourse connective detection system. By employing gradient boosting trained on straightforward, low-complexity features, the proposed approach sidesteps the computational demands of current approaches that rely on deep neural networks. Despite its simplicity, our approach achieves competitive results while offering significant speed gains, even on CPU. Furthermore, its stable performance across two unrelated languages suggests that the system is robust in multilingual settings. The model is designed to support the annotation of discourse relations, particularly in scenarios with limited resources, while minimizing performance loss.
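To make the token-level setup concrete, the following is a minimal sketch of how verb-based, word-based, and position-based features could be computed for each token in linear time. It is not the authors' feature set: the function name `token_features`, the individual feature names, the use of `"VERB"` as the verb tag, and the three-way label scheme in `LABELS` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical three-way label scheme (an assumption, not taken from the paper):
# O = not part of a connective, B = first token of a connective, I = continuation.
LABELS = {"O": 0, "B": 1, "I": 2}

Feature = Union[str, int, float, bool]


def token_features(tokens: Sequence[str], pos_tags: Sequence[str]) -> List[Dict[str, Feature]]:
    """Return one feature dict per token using three linear passes over the sentence."""
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    n = len(tokens)
    no_verb = n  # sentinel distance used when there is no verb on that side

    # Verb-based: distance to the nearest verb strictly to the left of each token.
    dist_prev = [no_verb] * n
    last_verb = None
    for i in range(n):
        if last_verb is not None:
            dist_prev[i] = i - last_verb
        if pos_tags[i] == "VERB":
            last_verb = i

    # Verb-based: distance to the nearest verb strictly to the right of each token.
    dist_next = [no_verb] * n
    next_verb = None
    for i in range(n - 1, -1, -1):
        if next_verb is not None:
            dist_next[i] = next_verb - i
        if pos_tags[i] == "VERB":
            next_verb = i

    features = []
    for i, tok in enumerate(tokens):
        features.append({
            # Word-based features (illustrative choices).
            "word": tok.lower(),
            "is_capitalized": tok[:1].isupper(),
            # Position-based features (illustrative choices).
            "index": i,
            "relative_position": i / n,
            "is_first": i == 0,
            # Verb-based features computed above.
            "dist_prev_verb": dist_prev[i],
            "dist_next_verb": dist_next[i],
        })
    return features


tokens = ["She", "left", "because", "it", "rained", "."]
pos_tags = ["PRON", "VERB", "SCONJ", "PRON", "VERB", "PUNCT"]
feats = token_features(tokens, pos_tags)
```

Each pass touches every token once, so the cost of feature extraction grows linearly with sentence length. Feature dicts of this shape could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed, together with per-token labels, to a gradient-boosting classifier such as `xgboost.XGBClassifier`; that training step is omitted here because the paper's exact configuration is not given in this section.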
