Tweezers: A Framework for Security Event Detection via Event Attribution-centric Tweet Embedding

Jian Cui, Hanna Kim, Eugene Jang, Dayeon Yim, Kicheol Kim, Yongjae Lee, Jin-Woo Chung, Seungwon Shin, Xiaojing Liao

TL;DR

This work tackles the difficulty of detecting security events on Twitter by addressing noise and limited coverage in keyword-based methods. It introduces an event attribution-centric tweet embedding that leverages STIX-derived entities, a Tweet Relation Graph, and Graph Attention Networks to distinguish between events with overlapping terminology. The resulting Tweezers framework achieves higher precision and broader event coverage than baselines, with demonstrated practicality in trend analysis and identifying informative security users, and it supports integration with STIX, IDS/CTI ecosystems, and cross-platform adaptability. The approach offers timely, actionable CTI and provides resources (code and datasets) to enable further research in security event detection from social media.

Abstract

Twitter is recognized as a crucial platform for the dissemination and gathering of Cyber Threat Intelligence (CTI). Its capability to provide real-time, actionable intelligence makes it an indispensable tool for detecting security events, helping security professionals cope with ever-growing threats. However, the large volume of tweets and the inherent noise of human-written tweets pose significant challenges to accurately identifying security events. While many studies have tried to filter for event-related tweets based on keywords, these approaches are not effective because keywords alone cannot capture the semantics of tweets. Another challenge in security event detection from Twitter is comprehensive coverage of security events. Previous studies emphasized the importance of early detection but overlooked event coverage. To cope with these challenges, we introduce a novel event attribution-centric tweet embedding method that enables both high precision and high coverage of events. Our experimental results show that the proposed method outperforms existing text-based and graph-based tweet embedding methods in identifying security events. Leveraging this embedding approach, we developed and implemented a framework, Tweezers, for security event detection from Twitter for CTI gathering. The framework detects twice as many events as established baselines. Additionally, we showcase two applications built on Tweezers for the integration and inspection of security events: security event trend analysis and informative security user identification.

Paper Structure (33 sections, 8 equations, 9 figures, 9 tables)

Figures (9)

  • Figure 1: Tweets embedded with Word2Vec. The distance between embeddings of tweets belonging to the same event is larger than the distance between embeddings of tweets belonging to different events.
  • Figure 2: One-hop neighbors in the tweet relation graph of the tweets $T_{e_1}$, $T'_{e_1}$, and $T_{e_2}$ referenced in Figure 1. Purple and pink dots represent tweets associated with the corresponding events.
  • Figure 3: Overview of security event attribution-centric tweet embedding method.
  • Figure 4: Explanation of our objective function for tweet embedding method.
  • Figure 5: Overall workflow of Tweezers.
  • ...and 4 more figures