Jupyter Notebook Attacks Taxonomy: Ransomware, Data Exfiltration, and Security Misconfiguration
Phuong Cao
TL;DR
The paper addresses security risks in open-science Jupyter Notebook deployments. It focuses on network-based threats such as ransomware, data exfiltration, and security misconfiguration that can compromise AI models and HPC resources. It provides a taxonomy of attack vectors and a threat model aligned with Trusted CI's Open Science Cyber Risk Profile. It also analyzes Jupyter's kernel communication protocols (JSON cells, REPL, ZeroMQ and WebSocket) to identify entry points. The paper proposes auditing-centered defenses to improve visibility and resilience: edge monitoring, honeypots, embedded kernel auditing, and an open data set. It also notes emerging AI-driven and quantum threats. Overall, it is the first work to systematically describe Jupyter threat models and to outline a concrete auditing design for safeguarding large-scale scientific computing ecosystems.
Abstract
Open-science collaboration using Jupyter Notebooks may expose expensively trained AI models, high-performance computing (HPC) resources, and training data to security vulnerabilities such as unauthorized access, accidental deletion, or misuse. The ubiquitous deployment of Jupyter Notebooks (~11 million public notebooks on GitHub) has transformed collaborative scientific computing by enabling reproducible research. Jupyter is the main HPC science gateway interface between AI researchers and supercomputers at academic institutions such as the National Center for Supercomputing Applications (NCSA), at national labs, and in industry. An impactful attack targeting Jupyter could disrupt scientific missions and business operations. This paper describes a taxonomy of network-based attacks on Jupyter Notebooks, including ransomware, data exfiltration, security misconfiguration, and resource abuse for cryptocurrency mining. The open nature of Jupyter (direct data access, arbitrary code execution in kernels for multiple programming languages) and its vast attack surface (terminal, file browser, untrusted cells) also attract attacks that attempt to misuse supercomputing resources and steal state-of-the-art research artifacts. Jupyter's traffic consists of encrypted messages carried over rapidly evolving WebSocket protocols, which challenge even state-of-the-art network observability tools such as Zeek. We envisage that even more sophisticated AI-driven attacks could be adapted to target Jupyter, where defenders have limited visibility. In addition, Jupyter's cryptographic design should be adapted to resist emerging quantum threats. Overall, this is the first paper to systematically describe the threat model against Jupyter Notebooks and to lay out an auditing design that gives defenders better visibility into such attacks.
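To make the auditing idea concrete, the following is a minimal sketch of what an audit hook might record from a single Jupyter message. It is not the paper's implementation. It assumes the hook sees plaintext frames, for example inside the Jupyter server or behind a TLS-terminating proxy, since encrypted WebSocket traffic is opaque on the wire. It also assumes the JSON WebSocket message format of the Jupyter messaging protocol, in which a message carries `channel`, `header` (with `msg_type`, `session`, `username`, `date`), and `content` (with `code` for an `execute_request`). Newer binary WebSocket subprotocols are not handled. The function name `audit_record` and the `SUSPICIOUS_MARKERS` list are hypothetical illustrations, not a vetted detection rule set.

```python
import json
from typing import Optional

# Hypothetical indicator substrings for illustration only (e.g. a crypto-miner
# binary name, a mining-pool URL scheme, destructive or encrypting commands).
SUSPICIOUS_MARKERS = ("xmrig", "stratum+tcp://", "rm -rf", "openssl enc")


def audit_record(frame: str) -> Optional[dict]:
    """Turn one plaintext JSON WebSocket frame into an audit record.

    Returns None for frames that are not code-execution requests or that
    cannot be parsed as JSON objects (e.g. binary-protocol frames).
    """
    try:
        msg = json.loads(frame)
    except json.JSONDecodeError:
        return None
    if not isinstance(msg, dict):
        return None
    header = msg.get("header") or {}
    # Only execute_request messages carry user code to run in the kernel.
    if header.get("msg_type") != "execute_request":
        return None
    code = (msg.get("content") or {}).get("code", "")
    lowered = code.lower()
    return {
        "channel": msg.get("channel"),
        "session": header.get("session"),
        "username": header.get("username"),
        "date": header.get("date"),
        "code": code,
        # Markers found in the submitted code, in SUSPICIOUS_MARKERS order.
        "flags": [m for m in SUSPICIOUS_MARKERS if m in lowered],
    }


if __name__ == "__main__":
    frame = json.dumps({
        "channel": "shell",
        "header": {"msg_id": "1", "msg_type": "execute_request", "session": "s1",
                   "username": "alice", "date": "2024-01-01T00:00:00Z",
                   "version": "5.3"},
        "parent_header": {},
        "metadata": {},
        "content": {"code": "!./xmrig -o stratum+tcp://pool.example:3333",
                    "silent": False},
    })
    print(audit_record(frame)["flags"])  # ['xmrig', 'stratum+tcp://']
```

Substring matching is used here only because it is easy to read. A real deployment would ship full records to a log pipeline and apply richer detection downstream, since a keyword list is trivially evaded.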
