Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression

Li Wan, Tansu Alpcan, Margreta Kuijper, Emanuele Viterbo

TL;DR

The paper tackles text classification with a lightweight, interpretable approach by coupling LZW-driven dictionary learning with discriminative refinement guided by label information. It introduces a minimum-distortion-longest-match sparse coding (MDLM) step and a discriminative-power-maximization (DPM) criterion, framed within an information-bottleneck analysis and evaluated using a novel IPAR metric. Across six benchmark datasets, the method achieves competitive accuracy on limited-vocabulary tasks using only a fraction of the parameters of deep models, while highlighting limitations on diverse vocabularies due to the repetitiveness assumptions of LZW. This work advances practical, interpretable text representations suitable for resource-constrained settings and provides a principled information-theoretic lens for understanding representation compression and task relevance.

Abstract

We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm first employs the Lempel-Ziv-Welch (LZW) algorithm to construct a dictionary from text datasets, focusing on the conceptual significance of dictionary elements. The dictionary is then refined using label data, optimizing dictionary atoms to enhance discriminative power based on mutual information and class distribution. This process generates discriminative numerical representations, facilitating the training of simple classifiers such as SVMs and neural networks. We evaluate our algorithm's information-theoretic performance using information bottleneck principles and introduce the information plane area rank (IPAR) as a novel metric to quantify it. Tested on six benchmark text datasets, our algorithm closely matches top-performing models on limited-vocabulary datasets, deviating by only about 2% while using just 10% of their parameters. However, it falls short on diverse-vocabulary datasets, likely due to the LZW algorithm's constraints with low-repetition data. This contrast highlights its efficiency and limitations across different dataset types.
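To make the first phase concrete, the following is a minimal sketch of textbook word-level LZW dictionary construction. It is not the authors' implementation: the function name `lzw_dictionary`, the tuple-of-tokens phrase representation, and the whitespace tokenization in the usage note are assumptions made for illustration, and the label-driven refinement and MDLM coding steps are not reproduced here.

```python
def lzw_dictionary(tokens):
    """Build an LZW phrase dictionary from a token sequence.

    Returns (dictionary, codes): `dictionary` maps each phrase (a tuple of
    tokens) to an integer index, and `codes` is the LZW encoding of `tokens`.
    Hypothetical helper for illustration, not the paper's code.
    """
    # Seed the dictionary with every distinct single token,
    # indexed in order of first appearance.
    dictionary = {}
    for tok in tokens:
        dictionary.setdefault((tok,), len(dictionary))

    codes = []
    phrase = ()
    for tok in tokens:
        candidate = phrase + (tok,)
        if candidate in dictionary:
            # Keep extending the current match while it is a known phrase.
            phrase = candidate
        else:
            # Emit the longest known phrase, then learn the new, longer one.
            codes.append(dictionary[phrase])
            dictionary[candidate] = len(dictionary)
            phrase = (tok,)
    if phrase:
        codes.append(dictionary[phrase])
    return dictionary, codes
```

For example, `lzw_dictionary("a b a b a b a".split())` learns the repeated phrases `("a", "b")`, `("b", "a")`, and `("a", "b", "a")` in addition to the two single tokens. Because new entries are created only when a phrase recurs, text with little repetition yields few multi-token atoms, which is consistent with the limitation on diverse-vocabulary datasets noted above.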

Paper Structure

This paper contains 25 sections, 12 equations, 4 figures, 4 tables, 3 algorithms.

Figures (4)

  • Figure 1: A flow chart of our algorithm. Given a text dataset, the LZW algorithm is applied to generate a dictionary at either the word level or the character level. The dictionary is then updated by selecting a subset of atoms with high discriminative power to better accomplish classification tasks. Labels are taken into account during the selection. With the updated dictionary, the text dataset is vectorized into a sparse vector representation. Finally, simple classifiers such as SVMs or neural networks can be trained for classification tasks.
  • Figure 2: Information plane with feasible and infeasible regions of machine learning algorithms and Information Bottleneck (IB) boundary. The dashed line denotes a potential path of a dictionary learning algorithm. Arrows indicate the potential shift between the optimal information bottleneck boundary and the dictionary learning boundary.
  • Figure 3: Information plane with an optimal IB-boundary and an example IB trajectory. The information trajectory is located below the optimal boundary, which divides the feasible region into two (region I and region II) if two vertical lines are added at the start ($I(X;T) = a$) and the end ($I(X;T) = b$) of the trajectory. The ratio between the area of region I and region II is defined as Information Plane Area Ratio (IPAR).
  • Figure 4: Information-theoretic analysis of the performance of our algorithm on six datasets. The black curve is the optimal information bottleneck boundary. The blue curve is the information trajectory as the number of atoms in the updated dictionary increases from 2 to a sufficiently large value (beyond a threshold, adding more atoms produces no further change). The dictionary is generated by the character-level implementation. The IPAR scores for the six datasets are 0.367, 0.382, 0.524, 0.538, 2.577, and 2.043, respectively, as shown in the panels.
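
The area ratio described in the Figure 3 caption can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' procedure: the caption does not say which region is which, so the sketch assumes region I is the gap between the optimal boundary and the trajectory and region II is the area under the trajectory. The function names, the shared sampling grid over $I(X;T) \in [a, b]$, and the trapezoidal integration are likewise assumptions.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through (xs[i], ys[i])."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


def area_ratio(xs, boundary, trajectory):
    """Ratio of the boundary-to-trajectory gap to the area under the trajectory.

    xs         : increasing I(X;T) values, from a to b
    boundary   : I(T;Y) on the optimal IB boundary at each x
    trajectory : I(T;Y) achieved by the algorithm at each x
    ASSUMPTION: region I = gap area, region II = area under the trajectory.
    """
    if not (len(xs) == len(boundary) == len(trajectory)) or len(xs) < 2:
        raise ValueError("need at least two samples on a shared grid")
    under_trajectory = trapezoid_area(xs, trajectory)
    under_boundary = trapezoid_area(xs, boundary)
    if under_trajectory <= 0:
        raise ValueError("area under the trajectory must be positive")
    return (under_boundary - under_trajectory) / under_trajectory
```

Under this assumed orientation, a smaller value means the trajectory sits closer to the optimal boundary. If the paper defines the regions the other way round, the ratio is simply inverted.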