Research Experiment on Multi-Model Comparison for Chinese Text Classification Tasks
JiaCheng Li
TL;DR
This study benchmarks three approaches to Chinese text classification (TextCNN, TextRNN, and FastText) on the THUCNews dataset to guide model choice in real-world settings. It contrasts convolutional feature extraction, sequence modeling, and fast embedding-based methods, evaluating each with Cross-Entropy Loss, Accuracy, Precision, Recall, and F1-Score. FastText consistently achieves the best overall performance (accuracy of about $92.02\%$), while TextCNN and TextRNN offer competitive results with domain-specific strengths. The results underscore the practicality of lightweight, n-gram-enhanced bag-of-words models for large-scale Chinese text classification: they deliver strong accuracy with high efficiency and interpretability.
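To make the evaluation metrics concrete, the following is a minimal sketch of how Accuracy, Precision, Recall, F1-Score, and Cross-Entropy Loss could be computed for a multi-class task. Macro averaging over classes is an assumption made here for illustration (the averaging scheme is not stated above), and the function names `classification_metrics` and `cross_entropy` are hypothetical rather than taken from the study's code.

```python
import math
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an illustrative assumption, not the study's stated choice.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def cross_entropy(probs, true_idx):
    """Mean negative log-probability assigned to the true class index."""
    return -sum(math.log(p[t]) for p, t in zip(probs, true_idx)) / len(true_idx)


# Toy labels for illustration only; these are not THUCNews results.
y_true = ["sports", "sports", "finance", "tech"]
y_pred = ["sports", "finance", "finance", "tech"]
metrics = classification_metrics(y_true, y_pred)
loss = cross_entropy([[0.5, 0.5]], [0])
```

On a roughly balanced benchmark, macro and micro averages tend to be close; the distinction matters more when class sizes differ.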
Abstract
With the explosive growth of Chinese text data and advances in natural language processing, Chinese text classification has become a key technique in fields such as information retrieval and sentiment analysis and is attracting increasing attention. This paper presents a comparative study of three deep learning models, TextCNN, TextRNN, and FastText, on Chinese text classification tasks. Experiments on the THUCNews dataset evaluate the performance of these models, and their applicability to different scenarios is discussed.
