Lightweight Connective Detection Using Gradient Boosting

Mustafa Erolcan Er; Murathan Kurfalı; Deniz Zeyrek

TL;DR

The paper tackles the computational inefficiency of neural discourse connective detection by proposing a lightweight gradient-boosting approach trained on simple linguistic features. It models connective detection as a three-way token classification task and uses XGBoost with verb-based, word-based, and position-based features. Feature extraction runs in $O(n)$ time in the number of tokens, and inference is fast on CPU. The approach achieves competitive performance against strong baselines on English PDTB 2.0 and Turkish TDB 1.0, remains robust across the two languages, and yields insights through feature-importance and error analyses. The work enables scalable, resource-efficient discourse annotation, provides a practical alternative for low-resource settings, and points to directions for improving multilingual discourse parsers.
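The three-way token classification and the three feature groups described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code. The label names (`B-Conn`, `I-Conn`, `O`), the Universal Dependencies `VERB` tag, and the concrete features inside each group (neighbouring words, relative position, distance to the nearest verb) are all assumptions. The sketch shows how every per-token feature can be computed in $O(n)$ overall: the only non-local feature, the distance to the nearest verb, is precomputed in two linear passes.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical three-way label set (BIO-style); the summary only says "three-way".
B_CONN, I_CONN, OUTSIDE = "B-Conn", "I-Conn", "O"


def nearest_verb_distances(pos_tags: Sequence[str], verb_tag: str = "VERB") -> List[int]:
    """Distance from each token to the closest verb, computed in two linear passes.

    Returns -1 for every token when the sentence contains no verb.
    """
    n = len(pos_tags)
    unreachable = n + 1  # larger than any real distance (the maximum is n - 1)
    dist = [unreachable] * n
    last = None
    for i in range(n):  # left-to-right: closest verb at or before position i
        if pos_tags[i] == verb_tag:
            last = i
        if last is not None:
            dist[i] = i - last
    last = None
    for i in range(n - 1, -1, -1):  # right-to-left: closest verb at or after position i
        if pos_tags[i] == verb_tag:
            last = i
        if last is not None:
            dist[i] = min(dist[i], last - i)
    return [d if d != unreachable else -1 for d in dist]


def token_features(
    tokens: Sequence[str], pos_tags: Sequence[str], verb_tag: str = "VERB"
) -> List[Dict[str, object]]:
    """Word-, position- and verb-based features for every token, O(n) in total."""
    n = len(tokens)
    verb_dist = nearest_verb_distances(pos_tags, verb_tag)
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            # word-based (hypothetical choice of context window)
            "word": tok.lower(),
            "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
            "next_word": tokens[i + 1].lower() if i + 1 < n else "</s>",
            # position-based
            "position": i,
            "relative_position": i / n,
            "is_sentence_initial": i == 0,
            # verb-based
            "is_verb": pos_tags[i] == verb_tag,
            "dist_to_verb": verb_dist[i],
        })
    return features


def spans_to_labels(n_tokens: int, spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Turn connective spans [start, end) into per-token three-way labels."""
    labels = [OUTSIDE] * n_tokens
    for start, end in spans:
        labels[start] = B_CONN
        for j in range(start + 1, end):
            labels[j] = I_CONN
    return labels


if __name__ == "__main__":
    tokens = ["I", "stayed", "home", "because", "it", "rained", "."]
    pos = ["PRON", "VERB", "NOUN", "SCONJ", "PRON", "VERB", "PUNCT"]
    labels = spans_to_labels(len(tokens), [(3, 4)])  # "because" is the connective
    for feats, label in zip(token_features(tokens, pos), labels):
        print(label, feats["word"], feats["dist_to_verb"])
```

The resulting feature dictionaries could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed, together with integer-encoded labels, to `xgboost.XGBClassifier`. That training step is left out so the sketch runs with the standard library alone.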

Abstract

In this work, we introduce a lightweight discourse connective detection system. By employing gradient boosting trained on straightforward, low-complexity features, the proposed approach sidesteps the computational demands of current approaches that rely on deep neural networks. Despite its simplicity, our approach achieves competitive results while offering significant speed gains, even on CPU. Furthermore, its stable performance across two unrelated languages suggests that the system is robust in multilingual scenarios. The model is designed to support the annotation of discourse relations, particularly in low-resource scenarios, while minimizing performance loss.

Paper Structure (12 sections, 1 equation, 2 figures, 5 tables)

Introduction
Related Work
Approach
Experimental setting
Data
Baseline Models
Results and Discussion
Results
Feature Importance
Error Analysis
Conclusion and Further Studies
Bibliographical References

Figures (2)

Figure 1: Feature importance in PDTB 2.0 for our best model
Figure 2: Feature importance in TDB 1.0 for our best model
