A Centroid-Based Framework for Text Classification in ITSM Environments

Hossein Mohanna, Ali Ait-Bachir

TL;DR

The paper addresses hierarchical text classification (HTC) in IT Service Management (ITSM), where taxonomies are dynamic and labels may be non-semantic. It introduces a dual-embedding centroid framework that maintains separate semantic and lexical centroids per category and fuses their rankings at inference via Reciprocal Rank Fusion (RRF). On a real ITSM dataset of 8,968 tickets across 123 categories, the approach achieves hierarchical F1 and Top-3 accuracy competitive with an SVM while offering interpretability and substantial speedups in training and incremental updates. The work shows that interpretable centroid-based methods with dual representations are a practical alternative for HTC in production ITSM environments, especially where taxonomy evolution and operational efficiency are critical.

Abstract

Text classification with hierarchical taxonomies is a fundamental requirement in IT Service Management (ITSM) systems, where support tickets must be categorized into tree-structured taxonomies. We present a dual-embedding centroid-based classification framework that maintains separate semantic and lexical centroid representations per category, combining them through reciprocal rank fusion at inference time. The framework achieves performance competitive with Support Vector Machines (hierarchical F1: 0.731 vs 0.727) while providing interpretability through centroid representations. Evaluated on 8,968 ITSM tickets across 123 categories, this method achieves 5.9 times faster training and up to 152 times faster incremental updates, with an 8.6-8.8 times speedup across batch sizes (100-1000 samples) when embedding computation is excluded. These results make the method suitable for production ITSM environments prioritizing interpretability and operational efficiency.
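
One reason centroid methods can support fast incremental updates is that a mean vector can be revised one sample at a time, without revisiting earlier tickets. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each category centroid is the arithmetic mean of its ticket embeddings, and the names `Centroid` and `update_centroids`, the category labels, and the toy 2-dimensional vectors are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Centroid:
    """Running mean of the embeddings assigned to one category (assumed update rule)."""
    vector: list = field(default_factory=list)
    count: int = 0

    def add(self, embedding):
        # The first sample initializes the centroid.
        if self.count == 0:
            self.vector = list(embedding)
            self.count = 1
            return
        self.count += 1
        # Incremental mean: c_new = c_old + (x - c_old) / n
        self.vector = [c + (x - c) / self.count
                       for c, x in zip(self.vector, embedding)]


def update_centroids(centroids, batch):
    """Fold a batch of (label, embedding) pairs into the existing centroids."""
    for label, embedding in batch:
        # A previously unseen label simply creates a new centroid.
        centroids.setdefault(label, Centroid()).add(embedding)


# Toy data; real embeddings would come from an encoder not shown here.
centroids = {}
update_centroids(centroids, [
    ("network/vpn", [1.0, 0.0]),
    ("network/vpn", [0.0, 1.0]),
    ("hardware/laptop", [2.0, 2.0]),
])
```

Under this assumption, adding a new category or a new batch touches only the affected centroids, whereas a model such as an SVM would typically be refit on the full training set.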

Paper Structure

This paper contains 14 sections, 2 equations, 1 figure, 5 tables, 2 algorithms.

Figures (1)

  • Figure 1: Inference phase with parallel dual-ranking and RRF fusion.
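
The dual-ranking and fusion step named in the Figure 1 caption can be illustrated with the standard Reciprocal Rank Fusion formula, score(c) = Σ 1 / (k + rank(c)), summed over the semantic and lexical rankings. This is a hedged sketch only: the constant `k = 60` is the commonly used default rather than a value taken from the paper, cosine similarity is an assumed scoring function, and the function names and toy centroids are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_categories(query, centroids):
    """Return category labels ordered from most to least similar centroid."""
    return sorted(centroids,
                  key=lambda label: cosine(query, centroids[label]),
                  reverse=True)


def rrf_fuse(rankings, k=60):
    """Fuse ranked label lists with Reciprocal Rank Fusion (k=60 is an assumption)."""
    scores = {}
    for ranking in rankings:
        for rank, label in enumerate(ranking, start=1):
            scores[label] = scores.get(label, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy centroids: one semantic and one lexical vector space per category.
semantic_centroids = {"vpn": [1.0, 0.0], "email": [0.0, 1.0], "printer": [0.7, 0.7]}
lexical_centroids = {"vpn": [0.9, 0.1, 0.0], "email": [0.0, 0.2, 0.9],
                     "printer": [0.1, 0.9, 0.1]}

# The two rankings are independent, so they could be computed in parallel.
semantic_ranking = rank_categories([0.9, 0.2], semantic_centroids)
lexical_ranking = rank_categories([0.8, 0.3, 0.0], lexical_centroids)
fused = rrf_fuse([semantic_ranking, lexical_ranking])
top_label = fused[0][0]
```

Because RRF uses only rank positions, the semantic and lexical similarity scores do not need to be calibrated against each other before fusion.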