Towards identifying Source credibility on Information Leakage in Digital Gadget Market

Neha Kumaru, Garvit Gupta, Shreyas Mongia, Shubham Singh, Ponnurangam Kumaraguru, Arun Balaji Buduru

TL;DR

This work analyzes the headlines of leak web-blog posts alongside their corresponding official press releases, and proposes a credibility score metric for each web-blog based on its number of falsified and authentic smartphone leak posts.

Abstract

The use of social media to share content is steadily rising. One adverse effect of information sharing on social media is the spread of sensitive information into the public domain. As the digital gadget market becomes highly competitive and ever-evolving, an increasing number of social media posts leak sensitive information about devices. Many web-blogs covering the digital gadget market have emerged recently, making the problem of information leaks pervasive. Credible leaks about the specifics of an upcoming device can cause substantial financial damage to the organization concerned. Hence, it is crucial to assess the credibility of platforms that continuously post smartphone or digital gadget leaks. In this work, we analyze the headlines of leak web-blog posts and their corresponding official press releases. We first collect 54,495 leak and press-release headlines for different smartphones. We train a custom NER model to capture the evolving smartphone names, with an accuracy of 82.14% on manually annotated results. We further propose a credibility score metric for web-blogs, based on the number of falsified and authentic smartphone leak posts.

Paper Structure (18 sections, 3 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Bar chart of the number of alleged smartphone leaks posted by different web-blogs.
  • Figure 2: Comparison of the average compound sentiment score of leak headlines versus press-release headlines.
  • Figure 3: Distribution of verb form usage in press-releases and leaked blog headlines. Verb abbreviations: past participle as VBN, 3rd person singular present as VBZ, gerund or present participle as VBG, non-3rd person singular present as VBP, base form as VB, modal auxiliary as MD, past tense as VBD
  • Figure 4: Distribution of the average length of press-release versus web-blog leak headlines. The mean lengths of both headline types are approximately $12\%$.
  • Figure 5: Architecture diagram of the proposed model. The first part consists of the custom NER model. The second part groups products based on the NER output. In the third part, we match the datetime of the first appearance of a leak against that of the press release to assign the credibility score (P.R refers to press releases).
  • ...and 3 more figures
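
The third stage described in the Figure 5 caption (matching the first leak date against the first press-release date, then scoring each web-blog) might be sketched as follows. This is only an illustration under stated assumptions: the paper's actual scoring equation is not reproduced in this excerpt, so the ratio `authentic / (authentic + falsified)`, the rule for deciding whether a leak is authentic, and the names `credibility_scores`, `leaks`, and `press_releases` are all hypothetical. Product names are assumed to have already been extracted and grouped by the NER stage.

```python
from collections import defaultdict
from datetime import date


def credibility_scores(leaks, press_releases):
    """Return {blog: authentic / (authentic + falsified)}.

    leaks: iterable of (blog, product, first_leak_date) tuples.
    press_releases: dict mapping product -> first press-release date.
    """
    counts = defaultdict(lambda: {"authentic": 0, "falsified": 0})
    for blog, product, leak_date in leaks:
        release_date = press_releases.get(product)
        # Assumption (not from the paper): a leak counts as authentic only if
        # an official release for the same product exists and the leak
        # appeared before it; every other leak counts as falsified.
        if release_date is not None and leak_date < release_date:
            counts[blog]["authentic"] += 1
        else:
            counts[blog]["falsified"] += 1
    return {
        blog: c["authentic"] / (c["authentic"] + c["falsified"])
        for blog, c in counts.items()
    }


# Toy data; blog and product names are invented for illustration.
leaks = [
    ("blog_a", "phone x", date(2020, 1, 5)),
    ("blog_a", "phone y", date(2020, 2, 1)),
    ("blog_b", "phone x", date(2020, 1, 20)),
]
press_releases = {"phone x": date(2020, 3, 1)}
scores = credibility_scores(leaks, press_releases)
```

A simple ratio keeps the score in the range 0 to 1 and makes blogs with different posting volumes comparable, though it treats a blog with one correct leak the same as a blog with a hundred; a real metric may weight by volume or recency.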