
Cybersecurity Threat Hunting and Vulnerability Analysis Using a Neo4j Graph Database of Open Source Intelligence

Elijah Pelofske, Lorie M. Liebrock, Vincent Urias

TL;DR

This research presents a system which constructs a Neo4j graph database formed by shared connections between open source intelligence text including blogs, cybersecurity bulletins, news sites, antivirus scans, social media posts, and threat reports, and shows specific examples of interesting connections found in the graph database.

Abstract

Open source intelligence is a powerful tool for cybersecurity analysts to gather information both for analysis of discovered vulnerabilities and for detecting novel cybersecurity threats and exploits. However, the volume of information on the internet that is relevant to information security is always increasing, and it is intractable for analysts to parse comprehensively. Methods that condense the available open source intelligence and automatically develop connections between disparate sources of information are therefore highly valuable. In this research, we present a system which constructs a Neo4j graph database formed by shared connections between open source intelligence text including blogs, cybersecurity bulletins, news sites, antivirus scans, social media posts (e.g., Reddit and Twitter), and threat reports. These connections consist of possible indicators of compromise (IoCs; e.g., IP addresses, domains, hashes, email addresses, phone numbers), information on known exploits and techniques (e.g., CVEs and MITRE ATT&CK Technique IDs), and potential sources of information on cybersecurity exploits such as Twitter usernames. The construction of the database of potential IoCs is detailed, including the addition of machine learning and metadata which can be used to filter the data for a specific domain (for example, a specific natural language) when needed. Examples of using the graph database to query connections between known malicious IoCs and open source intelligence documents, including threat reports, are shown. We show three specific examples of interesting connections found in the graph database: the connections to a known exploited CVE, a known malicious IP address, and a malware hash signature.
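
As a rough illustration of the idea in the abstract, the sketch below extracts candidate IoCs from document text and records which documents share each one. It is a minimal sketch under stated assumptions, not the authors' pipeline: the regular expressions are deliberately simplified, the function names (`extract_iocs`, `build_index`, `shared_iocs`) are hypothetical, and the sample documents are invented for illustration (only the MD5 value is taken from the figure captions; the IP address, CVE ID, and technique ID are placeholders).

```python
import re
from collections import defaultdict

# Simplified illustrative patterns; these are assumptions, not the authors' rules.
PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "attack_technique": re.compile(r"\bT\d{4}(?:\.\d{3})?\b"),
}


def extract_iocs(text):
    """Return a set of (ioc_type, normalized_value) pairs found in text."""
    found = set()
    for ioc_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group(0)
            # Normalize case so the same indicator always maps to the same node.
            if ioc_type in ("cve", "attack_technique"):
                value = value.upper()
            else:
                value = value.lower()
            found.add((ioc_type, value))
    return found


def build_index(documents):
    """Map each candidate IoC to the set of document IDs that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for ioc in extract_iocs(text):
            index[ioc].add(doc_id)
    return index


def shared_iocs(index):
    """Keep only IoCs mentioned by more than one document (the shared connections)."""
    return {ioc: docs for ioc, docs in index.items() if len(docs) > 1}


# Invented sample text; the IP, CVE, and technique ID are placeholders.
documents = {
    "report_a": "Sample 84c82835a5d21bbcf75a61706d8ab549 contacted 203.0.113.7 "
                "and exploited CVE-2099-0001.",
    "blog_b": "Hash 84C82835A5D21BBCF75A61706D8AB549 seen again; technique T9999.",
}

index = build_index(documents)
for (ioc_type, value), docs in sorted(shared_iocs(index).items()):
    print(ioc_type, value, sorted(docs))
```

In a graph database, each key of `index` would correspond to an IoC node and each document ID to a document node, with an edge wherever a document mentions the IoC. Real extraction would also need to handle defanged indicators, false positives, and the other datatypes the abstract lists.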

Paper Structure (15 sections, 15 figures, 2 tables)

Figures (15)

  • Figure 1: IOC Neo4j graph database construction workflow diagram.
  • Figure 2: Node and edge coloring legend for potential IOCs.
  • Figure 3: Degree 1 connections associated with the MD5 hash 84c82835a5d21bbcf75a61706d8ab549 from the Neo4j graph database. This simple graph structure shows that within the current database, this hash checksum is mentioned in exactly 3 open source documents, shown as light blue nodes.
  • Figure 4: Expanded connections from Figure 3; degree 2 connections out from the MD5 hash 84c82835a5d21bbcf75a61706d8ab549. This step shows that each of the 3 open source documents containing this hash has a variety of extracted datatypes, including a SHA-1 hash node and a SHA-256 hash node that are shared between two of the open source documents.
  • Figure 5: Expanding the neighbors of two of the hash node connections from Figure 4, which were the only two shared nodes in the expanded neighborhood of the three original matched open source documents. Only one of these two shared hash nodes had any further connections, which turned out to be two other open source documents.
  • ...and 10 more figures
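
The degree 1 and degree 2 expansions described in the captions for Figures 3 and 4 can be sketched on a small in-memory graph. This is an illustrative stand-in rather than the paper's Neo4j queries: the helper names (`add_edge`, `neighbors_within`), the document labels `doc_1` to `doc_3`, and the `*_placeholder` nodes are all hypothetical, and only the MD5 value comes from the captions.

```python
from collections import deque


def add_edge(graph, a, b):
    """Add an undirected edge between nodes a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def neighbors_within(graph, start, max_degree):
    """Breadth-first search returning {node: distance} up to max_degree hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_degree:
            continue  # do not expand beyond the requested degree
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Toy graph mirroring the shape described in the captions (labels are hypothetical).
graph = {}
md5 = "md5:84c82835a5d21bbcf75a61706d8ab549"
for doc in ("doc_1", "doc_2", "doc_3"):
    add_edge(graph, md5, doc)
for doc in ("doc_1", "doc_2"):
    add_edge(graph, doc, "sha1:shared_placeholder")
    add_edge(graph, doc, "sha256:shared_placeholder")
add_edge(graph, "doc_3", "domain:placeholder.example")

distances = neighbors_within(graph, md5, max_degree=2)

# Degree 1: documents that mention the hash.
degree_1 = {n for n, d in distances.items() if d == 1}

# Degree 2: extracted datatypes; keep those shared by more than one matched document.
shared = {
    n for n, d in distances.items()
    if d == 2 and len(graph[n] & degree_1) > 1
}

print(sorted(degree_1))
print(sorted(shared))
```

Filtering the degree 2 nodes down to those with more than one degree 1 neighbor is what surfaces the shared hash nodes; expanding those nodes one step further corresponds to the move made in Figure 5.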