Unsupervised Thematic Clustering Of hadith Texts Using The Apriori Algorithm

Wisnu Uriawan, Achmad Ajie Priyajie, Angga Gustian, Fikri Nur Hidayat, Sendi Ahmad Rafiudin, Muhamad Fikri Zaelani

TL;DR

The paper addresses automatic thematic clustering of hadith texts with an unsupervised, Apriori-based word-association approach applied to the Indonesian translation of the hadith of Bukhari. The text is preprocessed and converted into binary one-hot transactions, and frequent itemsets are mined with a minimum support of 0.02 to derive association rules, which are evaluated by confidence and lift. The results reveal thematic groupings around worship, revelation, and transmission, with strong rules such as rakaat→shalat and ayat↔turun that are validated against domain knowledge. The work offers a data-driven route for digital Islamic studies, supporting automatic thematic annotation, knowledge representation, and semantic search, with potential benefits for SDG 4 (quality education).
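The mining step summarized above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the toy transactions, the function names `support`, `apriori`, and `derive_rules`, and the thresholds used here (0.3 support, 0.8 confidence) are assumptions chosen so that a five-row example produces readable output; the paper itself reports a minimum support of 0.02 on the full dataset. Each transaction is a set of tokens, which is equivalent to one binary one-hot row.

```python
from itertools import combinations

# Toy transactions invented for illustration (not taken from the dataset).
# Each set lists the tokens present in one text, i.e. the 1-entries of a one-hot row.
transactions = [
    {"rakaat", "shalat", "imam"},
    {"rakaat", "shalat"},
    {"ayat", "turun"},
    {"ayat", "turun", "shalat"},
    {"shalat", "imam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every k-item subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def derive_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    found = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                c = itemset - a
                confidence = supp / frequent[a]   # P(c | a)
                lift = confidence / frequent[c]   # > 1 means positive association
                if confidence >= min_confidence:
                    found.append((a, c, supp, confidence, lift))
    return found


frequent = apriori(transactions, min_support=0.3)
found = derive_rules(frequent, min_confidence=0.8)
for a, c, supp, confidence, lift in sorted(found, key=lambda rule: -rule[4]):
    print(f"{sorted(a)} -> {sorted(c)}  "
          f"support={supp:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

On this toy data the rule rakaat→shalat passes the confidence threshold while the reverse rule shalat→rakaat does not, because shalat also appears in texts without rakaat. That asymmetry is why rules are reported with a direction.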

Abstract

This research is motivated by the need to automate the thematic grouping of hadith as Islamic texts are increasingly digitized. According to the literature review, unsupervised learning with the Apriori algorithm has proven effective at identifying association patterns and semantic relations in unlabeled text data. The dataset is the Indonesian translation of the hadith of Bukhari, which is first preprocessed through case folding, punctuation cleaning, tokenization, stopword removal, and stemming. Association rule mining is then performed with the Apriori algorithm, using support, confidence, and lift as parameters. The results show meaningful association patterns, such as the relationships rakaat-prayer, verse-revelation, and hadith-story, which correspond to the themes of worship, revelation, and hadith narration. These findings demonstrate that the Apriori algorithm can automatically uncover latent semantic relationships, and they contribute to the development of digital Islamic studies and technology-based learning systems.
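The preprocessing stages named in the abstract can be sketched as follows. This is an illustrative sketch only: the stopword list is a tiny hypothetical sample, `naive_stem` is a crude stand-in for a proper Indonesian stemmer, and the input sentence is a toy example, not a quotation from the dataset. The abstract does not say which tools the authors used.

```python
import string

# Hypothetical, deliberately tiny stopword list; a real run would use a full Indonesian list.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "itu", "dengan"}


def naive_stem(token):
    """Placeholder stemmer (assumption): strips a few common suffixes from longer tokens."""
    for suffix in ("nya", "lah", "kan"):
        if token.endswith(suffix) and len(token) > len(suffix) + 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Apply the five stages in order and return the list of stemmed tokens."""
    text = text.lower()                                                # case folding
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation cleaning
    tokens = text.split()                                              # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]                 # stopword removal
    return [naive_stem(t) for t in tokens]                             # stemming (placeholder)


# Toy sentence written for this example.
tokens = preprocess("Ayat itu turun, dan mereka shalat dua rakaat.")
print(tokens)

# The set of distinct tokens is one transaction for association rule mining.
transaction = set(tokens)
```

Reducing each text to a set discards word order and frequency, which matches the presence/absence view that association rule mining works on.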

Paper Structure

This paper contains 17 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Research Methodology
  • Figure 2: Preprocessing