Machine Learning Algorithms: Detection Official Hajj and Umrah Travel Agency Based on Text and Metadata Analysis

Wisnu Uriawan, Muhamad Veva Ramadhan, Firman Adi Nugraha, Hasbi Nur Wahid, M Dantha Arianvasya, Muhammad Zaki Alghifari

TL;DR

This work tackles counterfeit Hajj/Umrah travel apps in Indonesia by developing an automated, machine learning-based authenticity detector that uses a hybrid of textual (TF-IDF) and metadata (permissions) features. It compares Naïve Bayes, Random Forest, and Support Vector Machine classifiers and finds SVM with an RBF kernel most effective (92.3% accuracy, 91.5% precision, 92.0% F1-score) on a balanced dataset of 100 official and 100 unofficial apps, augmented with synonym-based text expansion. The study reports that combining linguistic cues with permission patterns yields the best detection results, and it discusses security, policy, and future blockchain-integrated architectures. The authors propose the system as a proactive verification layer within government digital ecosystems and as a step toward hybrid AI and blockchain verification systems for combating fraud in app stores.

Abstract

The rapid digitalization of Hajj and Umrah services in Indonesia has made these services far more accessible to pilgrims, but it has also opened avenues for digital fraud through counterfeit mobile applications. These fraudulent applications not only inflict financial losses but also pose severe privacy risks by harvesting sensitive personal data. This research addresses the issue by implementing and evaluating machine learning algorithms that verify application authenticity automatically. Using a dataset comprising both official applications registered with the Ministry of Religious Affairs and unofficial applications circulating on app stores, we compare the performance of three classifiers: Support Vector Machine (SVM), Random Forest (RF), and Naïve Bayes (NB). The study uses a hybrid feature extraction methodology that combines textual analysis (TF-IDF) of application descriptions with metadata analysis of sensitive access permissions. The experimental results indicate that the SVM algorithm achieves the highest performance, with an accuracy of 92.3%, a precision of 91.5%, and an F1-score of 92.0%. Detailed feature analysis reveals that specific keywords related to legality and high-risk permissions (e.g., READ_PHONE_STATE) are the most significant discriminators. The system is proposed as a proactive, scalable solution to enhance digital trust in the religious tourism sector, potentially serving as a prototype for a national verification system.
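To make the hybrid feature idea concrete, the following is a minimal sketch of how TF-IDF weights from an app description might be concatenated with binary permission flags into a single feature vector. It is not the authors' implementation. The whitespace tokenizer, the smoothed IDF formula, the L2 normalization, the list of high-risk permissions (only READ_PHONE_STATE is named in the abstract), and the two sample apps are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumed list of high-risk permissions. Only READ_PHONE_STATE appears in the
# abstract; the other two entries are hypothetical placeholders.
HIGH_RISK_PERMISSIONS = ["READ_PHONE_STATE", "READ_CONTACTS", "ACCESS_FINE_LOCATION"]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(documents):
    """Build a sorted vocabulary and smoothed IDF weights from the corpus."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    vocab = sorted(doc_freq)
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, so no weight is zero.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """Return an L2-normalized TF-IDF vector over the fitted vocabulary."""
    counts = Counter(tokenize(text))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def permission_vector(permissions):
    """Return one 0/1 flag per assumed high-risk permission."""
    requested = set(permissions)
    return [1.0 if p in requested else 0.0 for p in HIGH_RISK_PERMISSIONS]


def hybrid_features(description, permissions, vocab, idf):
    """Concatenate the text features with the permission flags."""
    return tfidf_vector(description, vocab, idf) + permission_vector(permissions)


# Two made-up apps, used only to exercise the functions above.
apps = [
    {"description": "official licensed umrah travel agency",
     "permissions": ["INTERNET"]},
    {"description": "cheap umrah promo discount",
     "permissions": ["INTERNET", "READ_PHONE_STATE", "READ_CONTACTS"]},
]

vocab, idf = fit_idf([a["description"] for a in apps])
X = [hybrid_features(a["description"], a["permissions"], vocab, idf) for a in apps]
```

Each row of `X` could then be passed, together with an official/unofficial label, to a classifier such as the SVM, Random Forest, or Naïve Bayes models compared in the paper. Appending the permission flags after the text features keeps the two feature groups separable, which makes it easier to inspect which group drives a prediction.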

Paper Structure

This paper contains 24 sections, 5 equations, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: Visual mapping of keyword dominance: Official apps use formal terminology, while Unofficial apps focus on marketing vernacular.
  • Figure 2: Distribution of high-risk permissions. Unofficial apps (85%) significantly over-request sensitive access compared to Official apps (15%).
  • Figure 3: Research Methodology Flowchart
  • Figure 4: Visualization of Feature Importance
  • Figure 5: Conceptual Diagram of Hybrid Architecture (AI + Blockchain)
  • ...and 1 more figures