Did I Vet You Before? Assessing the Chrome Web Store Vetting Process through Browser Extension Similarity

José Miguel Moreno, Narseo Vallina-Rodriguez, Juan Tapiador

TL;DR

This paper introduces SimExt, a novel methodology for detecting similarly behaving extensions that leverages static and dynamic analysis, Natural Language Processing (NLP) and vector embeddings. Its findings indicate a concerning gap between the threat landscape seen by Chrome Web Store (CWS) moderators and the detection capabilities of the threat intelligence community.

Abstract

Web browsers, particularly Google Chrome and other Chromium-based browsers, have grown in popularity over the past decade, with browser extensions becoming an integral part of their ecosystem. These extensions can customize and enhance the user experience, providing functionality that ranges from ad blockers to, more recently, AI assistants. Given the ever-increasing importance of web browsers, distribution marketplaces for extensions play a key role in keeping users safe by vetting submissions that display abusive or malicious behavior. In this paper, we characterize the prevalence of malware and other infringing extensions in the Chrome Web Store (CWS), the largest distribution platform for this type of software. To do so, we introduce SimExt, a novel methodology for detecting similarly behaving extensions that leverages static and dynamic analysis, Natural Language Processing (NLP) and vector embeddings. Our study reveals significant gaps in the CWS vetting process, as 86% of infringing extensions are extremely similar to previously vetted items, and these extensions take months or even years to be removed. By characterizing the most common kinds of infringing extensions, we find that 83% are New Tab Extensions (NTEs) and raise some concerns about the consistency of the vetting labels assigned by CWS analysts. Our study also reveals that only 1% of malware extensions flagged by the CWS are detected as malicious by anti-malware engines, indicating a concerning gap between the threat landscape seen by CWS moderators and the detection capabilities of the threat intelligence community.
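The abstract does not specify SimExt's features, embedding model, or similarity thresholds. The following is only a minimal sketch of the general idea behind "extremely similar to previously vetted items": it assumes each extension has already been reduced to a fixed-length vector (for instance, an embedding derived from its static and dynamic analysis output) and compares a new submission against vectors of previously vetted extensions using cosine similarity. The function names, toy vectors, and the 0.95 threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each extension is already an embedding vector.
# SimExt's actual features and embedding model are not reproduced here.
Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_vetted(
    candidate: Vector,
    vetted: Dict[str, Vector],
    threshold: float = 0.95,  # hypothetical cut-off, not from the paper
) -> Optional[Tuple[str, float]]:
    """Return (id, score) of the closest previously vetted extension if its
    similarity reaches `threshold`; otherwise return None."""
    best: Optional[Tuple[str, float]] = None
    for ext_id, vec in vetted.items():
        score = cosine_similarity(candidate, vec)
        if best is None or score > best[1]:
            best = (ext_id, score)
    if best is not None and best[1] >= threshold:
        return best
    return None


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of previously taken-down extensions.
    vetted = {
        "ext_a": [0.9, 0.1, 0.0],
        "ext_b": [0.0, 1.0, 0.2],
    }
    new_submission = [0.88, 0.12, 0.01]
    print(most_similar_vetted(new_submission, vetted))
```

With these toy vectors the new submission is matched to `ext_a`, so a reviewer could be pointed at an earlier vetting decision. A production system would replace the linear scan with an approximate nearest-neighbor index over the whole store.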

Paper Structure

This paper contains 31 sections, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Data collection and analysis methodology pipeline.
  • Figure 2: Daily volume of published extensions (solid, left axis) and accumulated dataset size (dashed, right axis).
  • Figure 3: Daily ratio of vetting labels provided by CWS.
  • Figure 4: Scatter plots of extensions belonging to an infringing cluster. Taken down (vetted) extensions in red, unpublished by the developer in green, and still published in blue.
  • Figure 5: Kaplan-Meier curves of the survival of infringing extensions by vetting label.
  • ...and 2 more figures
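Figure 5 plots Kaplan-Meier survival curves for infringing extensions. As a reminder of how such a curve is computed, here is a minimal sketch of the standard Kaplan-Meier product-limit estimator, S(t) = Π over event times t_i ≤ t of (1 − d_i / n_i), where d_i is the number of takedowns at t_i and n_i the number of extensions still at risk. In this sketch a takedown counts as the event and every other extension (for example, one still published at the end of observation) is treated as right-censored; that choice, the function name, and the toy durations are assumptions for illustration, not the paper's data or exact setup.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], observed: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier product-limit estimate of the survival function.

    durations: time (e.g., days) from publication until takedown or end of observation.
    observed:  True if the extension was taken down (event), False if censored.
    Returns (event_time, survival_probability) pairs at each distinct event time.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    records = sorted(zip(durations, observed))
    n_at_risk = len(records)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(records):
        t = records[i][0]
        events = 0
        removed = 0
        # Group all records that share the same time t.
        while i < len(records) and records[i][0] == t:
            if records[i][1]:
                events += 1
            removed += 1
            i += 1
        if events > 0:
            # Records censored at t still count as at risk at t (standard convention).
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        # Both takedowns and censored records leave the risk set after time t.
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy durations in days; False marks extensions that were never taken down (censored).
    days = [30, 60, 60, 90, 120]
    taken_down = [True, True, False, True, False]
    for t, s in kaplan_meier(days, taken_down):
        print(f"day {t}: S = {s:.2f}")
```

For the toy input, the estimated survival drops to 0.80 at day 30, 0.60 at day 60, and 0.30 at day 90. Treating still-published extensions as censored, rather than dropping them, keeps them in the risk set until they leave observation, which is what makes the curve a fair picture of how long infringing extensions remain available.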