InstructNet: A Novel Approach for Multi-Label Instruction Classification through Advanced Deep Learning

Tanjim Taharat Aurpa, Md Shoaib Ahmed, Md Mahbubur Rahman, Md. Golam Moazzam

TL;DR

This work addresses the lack of methods for multi-label classification of instructional text from wikiHow. It introduces InstructNet, a transformer-based approach that primarily uses XLNet (with BERT as a baseline) to tag How To articles with multiple labels. A label-filtering data-preparation pipeline reduces the label set to 67 labels with sufficient observations, and the models are evaluated with accuracy and macro F1. XLNet achieves 97.30% accuracy (macro F1 of about 93%), outperforming several baselines and showing potential for improving instruction search, knowledge bases, and task-oriented learning. The study also reports transfer performance on other multi-label datasets and outlines future enhancements such as more efficient label encoding and dataset sharing.
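
The label-filtering idea mentioned in this summary can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function name `filter_labels`, the `min_count` threshold, and the toy records are all assumptions, since the summary only says that filtering leaves 67 labels with sufficient observations.

```python
from collections import Counter

def filter_labels(records, min_count):
    """Keep labels seen in at least `min_count` records, then multi-hot encode.

    `records` is a list of (text, labels) pairs. `min_count` is a hypothetical
    threshold; the source does not state the value it used.
    """
    # Count each label once per record.
    counts = Counter(label for _, labels in records for label in set(labels))
    kept = sorted(label for label, n in counts.items() if n >= min_count)
    index = {label: i for i, label in enumerate(kept)}

    texts, targets = [], []
    for text, labels in records:
        row = [0] * len(kept)
        for label in labels:
            if label in index:
                row[index[label]] = 1
        if any(row):  # drop records left with no label after filtering
            texts.append(text)
            targets.append(row)
    return kept, texts, targets

# Toy data, invented for illustration only.
records = [
    ("How to bake bread", ["Food", "Baking"]),
    ("How to make pasta", ["Food"]),
    ("How to fix a bike", ["Sports", "Repair"]),
    ("How to knead dough", ["Food", "Baking"]),
]
kept, texts, targets = filter_labels(records, min_count=2)
print(kept)     # labels that survive the filter
print(targets)  # one multi-hot row per remaining record
```

Dropping records that end up with no label is one possible design choice; whether the original pipeline does the same is not stated here.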

Abstract

People use search engines for a wide range of topics and items, from daily essentials to more aspirational and specialized objects. As a result, search engines have become people's preferred resource. The How To prefix has become familiar and widely used in searches for solutions to particular problems. Such searches let people find sequential instructions that give detailed guidelines for accomplishing specific tasks. Categorizing instructional text is also essential for task-oriented learning and for creating knowledge bases. This study uses How To articles to determine multi-label instruction categories. We conduct this work on a dataset of 11,121 observations from wikiHow, where each record has multiple categories. To predict the multi-label categories, we employ transformer-based deep neural architectures such as Generalized Autoregressive Pretraining for Language Understanding (XLNet) and Bidirectional Encoder Representations from Transformers (BERT). We evaluate the proposed architectures using accuracy and macro F1-score as the performance metrics. This evaluation revealed much about our strategy's strengths and drawbacks. Specifically, our implementation of the XLNet architecture achieves an accuracy of 97.30% and micro and macro average scores of 89.02% and 93%, respectively, a noteworthy result in multi-label classification. These accuracy and macro average scores attest to the effectiveness of the XLNet architecture in our proposed InstructNet approach. By employing a multi-level strategy in our evaluation process, we have gained a more comprehensive understanding of the effectiveness of the proposed architectures and identified areas for future improvement and refinement.
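
The difference between the micro and macro averages reported above can be illustrated with a small sketch. It assumes predictions and targets are multi-hot rows (one column per label); the function names, the toy matrices, and the convention of scoring F1 as 0 when a label has no true or predicted positives are choices made here for illustration, not details taken from the paper.

```python
def f1(tp, fp, fn):
    """F1 from counts; returns 0.0 when undefined (a convention chosen here)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    """Micro- and macro-averaged F1 for multi-hot label matrices."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro

# Toy matrices, invented for illustration only.
y_true = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [1, 0, 0], [1, 0, 0], [0, 1, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 4), round(macro, 4))
```

Micro averaging is dominated by frequent labels, while macro averaging weights every label equally, so the two scores can diverge on imbalanced label sets.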

Paper Structure

This paper contains 22 sections, 7 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of how Transformer-XL works with long sequences. (Comparing the Transformer and Transformer-XL figures shows that context representation is limited in the Transformer, whereas Transformer-XL represents long sequences more efficiently.)
  • Figure 2: The training phase of the BERT model. The wikiHow instructions are tokenized after being combined with BERT's special tokens (e.g., [CLS]).
  • Figure 3: Permutation Language Model for predicting token $x_3$ for a given factorization order.
  • Figure 4: The system architecture of our proposed methodology. It shows the system's workflow, which is divided into three parts: Data Preparation and Preprocessing (the green box in the figure), Model Training (the yellow box), and Validation (the blue box).
  • Figure 5: Training and testing accuracy of the XLNet model over epochs. (The curve is smoothed using a Gaussian filter.)
  • ...and 5 more figures
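
To make the idea behind Figure 3 concrete, the sketch below lists which positions a permutation language model may condition on when predicting a target token under one factorization order. The orders shown and the helper name `visible_context` are illustrative assumptions and are not taken from the paper's figure.

```python
def visible_context(order, target):
    """Return the positions that precede `target` in the factorization order.

    Under permutation language modeling, the token at `target` is predicted
    only from tokens that come earlier in the sampled order.
    """
    pos = order.index(target)
    return order[:pos]

# Hypothetical factorization orders over positions 1..4.
print(visible_context([2, 4, 3, 1], target=3))  # x_3 conditions on x_2 and x_4
print(visible_context([3, 2, 4, 1], target=3))  # x_3 comes first: no context
```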