Security & Cryptography

Cryptography, network security, privacy, and secure systems

Includes:Cryptography and Security(cs.CR)

Security & Cryptography

Cryptography, network security, privacy, and secure systems

Includes:Cryptography and Security(cs.CR)

Trending in Security & Cryptography

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Large language model (LLM) agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), together with two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers purchased from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers collected from public communities, we find 1 paid and 8 free routers actively injecting malicious code, 2 deploying adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies further show that ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generates 100M GPT-5.4 tokens and more than seven Codex sessions, while weakly configured decoys yield 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode. We build Mine, a research proxy that implements all four attack classes against four public agent frameworks, and use it to evaluate three deployable client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.

2604.08407

Apr 2026Cryptography and Security

Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

While watermarking serves as a critical mechanism for LLM provenance, existing secret-key schemes tightly couple detection with injection, requiring access to keys or provider-side scheme-specific detectors for verification. This dependency creates a fundamental barrier for real-world governance, as independent auditing becomes impossible without compromising model security or relying on the opaque claims of service providers. To resolve this dilemma, we introduce TTP-Detect, a pioneering black-box framework designed for non-intrusive, third-party watermark verification. By decoupling detection from injection, TTP-Detect reframes verification as a relative hypothesis testing problem. It employs a proxy model to amplify watermark-relevant signals and a suite of complementary relative measurements to assess the alignment of the query text with watermarked distributions. Extensive experiments across representative watermarking schemes, datasets and models demonstrate that TTP-Detect achieves superior detection performance and robustness against diverse attacks.

2603.14968

Mar 2026Cryptography and Security

TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems

With the rapid development of LLM-based multi-agent systems (MAS), their significant safety and security concerns have emerged, which introduce novel risks going beyond single agents or LLMs. Despite attempts to address these issues, the existing literature lacks a cohesive safeguarding system specialized for MAS risks. In this work, we introduce TrinityGuard, a comprehensive safety evaluation and monitoring framework for LLM-based MAS, grounded in the OWASP standards. Specifically, TrinityGuard encompasses a three-tier fine-grained risk taxonomy that identifies 20 risk types, covering single-agent vulnerabilities, inter-agent communication threats, and system-level emergent hazards. Designed for scalability across various MAS structures and platforms, TrinityGuard is organized in a trinity manner, involving an MAS abstraction layer that can be adapted to any MAS structures, an evaluation layer containing risk-specific test modules, alongside runtime monitor agents coordinated by a unified LLM Judge Factory. During Evaluation, TrinityGuard executes curated attack probes to generate detailed vulnerability reports for each risk type, where monitor agents analyze structured execution traces and issue real-time alerts, enabling both pre-development evaluation and runtime monitoring. We further formalize these safety metrics and present detailed case studies across various representative MAS examples, showcasing the versatility and reliability of TrinityGuard. Overall, TrinityGuard acts as a comprehensive framework for evaluating and monitoring various risks in MAS, paving the way for further research into their safety and security.

2603.15408

Mar 2026Cryptography and Security

Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferation of $\textit{shadow APIs}$, third-party services that claim to provide access to official model services without regional limitations via indirect access. Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs, raising concerns about the reliability of downstream applications and the validity of research findings that depend on them. In this paper, we present the first systematic audit between official LLM APIs and corresponding shadow APIs. We first identify 17 shadow APIs that have been utilized in 187 academic papers, with the most popular one reaching 5,966 citations and 58,639 GitHub stars by December 6, 2025. Through multidimensional auditing of three representative shadow APIs across utility, safety, and model verification, we uncover both indirect and direct evidence of deception practices in shadow APIs. Specifically, we reveal performance divergence reaching up to $47.21\%$, significant unpredictability in safety behaviors, and identity verification failures in $45.83\%$ of fingerprint tests. These deceptive practices critically undermine the reproducibility and validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.

2603.01919

Mar 2026Cryptography and Security

DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern

Recent intelligent systems integrate powerful Large Language Models (LLMs) through APIs, but their trustworthiness may be critically undermined by targeted attacks like backdoor and prompt injection attacks, which secretly force LLMs to generate specific malicious sequences. Existing defensive approaches for such threats typically rely on high access rights, impose prohibitive costs, and hinder normal inference, rendering them impractical for real-world scenarios. To solve these limitations, we introduce DualSentinel, a lightweight and unified defense framework that can accurately and promptly detect the activation of targeted attacks alongside the LLM generation process. We first identify a characteristic of compromised LLMs, termed Entropy Lull: when a targeted attack successfully hijacks the generation process, the LLM exhibits a distinct period of abnormally low and stable token probability entropy, indicating it is following a fixed path rather than making creative choices. DualSentinel leverages this pattern by developing an innovative dual-check approach. It first employs a magnitude and trend-aware monitoring method to proactively and sensitively flag an entropy lull pattern at runtime. Upon such flagging, it triggers a lightweight yet powerful secondary verification based on task-flipping. An attack is confirmed only if the entropy lull pattern persists across both the original and the flipped task, proving that the LLM's output is coercively controlled. Extensive evaluations show that DualSentinel is both highly effective (superior detection accuracy with near-zero false positives) and remarkably efficient (negligible additional cost), offering a truly practical path toward securing deployed LLMs. The source code can be accessed at https://doi.org/10.5281/zenodo.18479273.

2603.01574

Mar 2026Cryptography and Security

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

Large language model (LLM) agents increasingly rely on external tools and retrieval systems to autonomously complete complex tasks. However, this design exposes agents to indirect prompt injection (IPI), where attacker-controlled context embedded in tool outputs or retrieved content silently steers agent actions away from user intent. Unlike prompt-based attacks, IPI unfolds over multi-turn trajectories, making malicious control difficult to disentangle from legitimate task execution. Existing inference-time defenses primarily rely on heuristic detection and conservative blocking of high-risk actions, which can prematurely terminate workflows or broadly suppress tool usage under ambiguous multi-turn scenarios. We propose AgentSentry, a novel inference-time detection and mitigation framework for tool-augmented LLM agents. To the best of our knowledge, AgentSentry is the first inference-time defense to model multi-turn IPI as a temporal causal takeover. It localizes takeover points via controlled counterfactual re-executions at tool-return boundaries and enables safe continuation through causally guided context purification that removes attack-induced deviations while preserving task-relevant evidence. We evaluate AgentSentry on the \textsc{AgentDojo} benchmark across four task suites, three IPI attack families, and multiple black-box LLMs. AgentSentry eliminates successful attacks and maintains strong utility under attack, achieving an average Utility Under Attack (UA) of 74.55 %, improving UA by 20.8 to 33.6 percentage points over the strongest baselines without degrading benign performance.

2602.22724

Feb 2026Cryptography and Security

SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned

DARPA's AI Cyber Challenge (AIxCC, 2023--2025) is the largest competition to date for building fully autonomous cyber reasoning systems (CRSs) that leverage recent advances in AI -- particularly large language models (LLMs) -- to discover and remediate vulnerabilities in real-world open-source software. This paper presents the first systematic analysis of AIxCC. Drawing on design documents, source code, execution traces, and discussions with organizers and competing teams, we examine the competition's structure and key design decisions, characterize the architectural approaches of finalist CRSs, and analyze competition results beyond the final scoreboard. Our analysis reveals the factors that truly drove CRS performance, identifies genuine technical advances achieved by teams, and exposes limitations that remain open for future research. We conclude with lessons for organizing future competitions and broader insights toward deploying autonomous CRSs in practice.

2602.07666

Feb 2026Cryptography and Security

Evaluating Nova 2.0 Lite model under Amazon's Frontier Model Safety Framework

Amazon published its Frontier Model Safety Framework (FMSF) as part of the Paris AI summit, following which we presented a report on Amazon's Premier model. In this report, we present an evaluation of Nova 2.0 Lite. Nova 2.0 Lite was made generally available from amongst the Nova 2.0 series and is one of its most capable reasoning models. The model processes text, images, and video with a context length of up to 1M tokens, enabling analysis of large codebases, documents, and videos in a single prompt. We present a comprehensive evaluation of Nova 2.0 Lite's critical risk profile under the FMSF. Evaluations target three high-risk domains-Chemical, Biological, Radiological and Nuclear (CBRN), Offensive Cyber Operations, and Automated AI R&D-and combine automated benchmarks, expert red-teaming, and uplift studies to determine whether the model exceeds release thresholds. We summarize our methodology and report core findings. We will continue to enhance our safety evaluation and mitigation pipelines as new risks and capabilities associated with frontier models are identified.

2601.191342

Jan 2026Cryptography and Security

Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models

Chain-of-Thought (CoT) reasoning has emerged as a powerful technique for enhancing large language models' capabilities by generating intermediate reasoning steps for complex tasks. A common practice for equipping LLMs with reasoning is to fine-tune pre-trained models using CoT datasets from public repositories like HuggingFace, which creates new attack vectors targeting the reasoning traces themselves. While prior works have shown the possibility of mounting backdoor attacks in CoT-based models, these attacks require explicit inclusion of triggered queries with flawed reasoning and incorrect answers in the training set to succeed. Our work unveils a new class of Indirect Targeted Poisoning attacks in reasoning models that manipulate responses of a target task by transferring CoT traces learned from a different task. Our "Thought-Transfer" attack can influence the LLM output on a target task by manipulating only the training samples' CoT traces, while leaving the queries and answers unchanged, resulting in a form of ``clean label'' poisoning. Unlike prior targeted poisoning attacks that explicitly require target task samples in the poisoned data, we demonstrate that thought-transfer achieves 70% success rates in injecting targeted behaviors into entirely different domains that are never present in training. Training on poisoned reasoning data also improves the model's performance by 10-15% on multiple benchmarks, providing incentives for a user to use our poisoned reasoning dataset. Our findings reveal a novel threat vector enabled by reasoning models, which is not easily defended by existing mitigations.

2601.19061

Jan 2026Cryptography and Security

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

The rise of AI agent frameworks has introduced agent skills, modular packages containing instructions and executable code that dynamically extend agent capabilities. While this architecture enables powerful customization, skills execute with implicit trust and minimal vetting, creating a significant yet uncharacterized attack surface. We conduct the first large-scale empirical security analysis of this emerging ecosystem, collecting 42,447 skills from two major marketplaces and systematically analyzing 31,132 using SkillScan, a multi-stage detection framework integrating static analysis with LLM-based semantic classification. Our findings reveal pervasive security risks: 26.1% of skills contain at least one vulnerability, spanning 14 distinct patterns across four categories: prompt injection, data exfiltration, privilege escalation, and supply chain risks. Data exfiltration (13.3%) and privilege escalation (11.8%) are most prevalent, while 5.2% of skills exhibit high-severity patterns strongly suggesting malicious intent. We find that skills bundling executable scripts are 2.12x more likely to contain vulnerabilities than instruction-only skills (OR=2.12, p<0.001). Our contributions include: (1) a grounded vulnerability taxonomy derived from 8,126 vulnerable skills, (2) a validated detection methodology achieving 86.7% precision and 82.5% recall, and (3) an open dataset and detection toolkit to support future research. These results demonstrate an urgent need for capability-based permission systems and mandatory security vetting before this attack vector is further exploited.

2601.10338

Jan 2026Cryptography and Security

ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack

Large Language Models (LLMs) have enabled the development of powerful agentic systems capable of automating complex workflows across various fields. However, these systems are highly vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external data can hijack agent behavior. In this work, we present ReasAlign, a model-level solution to improve safety alignment against indirect prompt injection attacks. The core idea of ReasAlign is to incorporate structured reasoning steps to analyze user queries, detect conflicting instructions, and preserve the continuity of the user's intended tasks to defend against indirect injection attacks. To further ensure reasoning logic and accuracy, we introduce a test-time scaling mechanism with a preference-optimized judge model that scores reasoning steps and selects the best trajectory. Comprehensive evaluations across various benchmarks show that ReasAlign maintains utility comparable to an undefended model while consistently outperforming Meta SecAlign, the strongest prior guardrail. On the representative open-ended CyberSecEval2 benchmark, which includes multiple prompt-injected tasks, ReasAlign achieves 94.6% utility and only 3.6% ASR, far surpassing the state-of-the-art defensive model of Meta SecAlign (56.4% utility and 74.4% ASR). These results demonstrate that ReasAlign achieves the best trade-off between security and utility, establishing a robust and practical defense against prompt injection attacks in real-world agentic systems. Our code and experimental results could be found at https://github.com/leolee99/ReasAlign.

2601.10173

Jan 2026Cryptography and Security

The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware

The rapid adoption of large language model (LLM)-based systems -- from chatbots to autonomous agents capable of executing code and financial transactions -- has created a new attack surface that existing security frameworks inadequately address. The dominant framing of these threats as "prompt injection" -- a catch-all phrase for security failures in LLM-based systems -- obscures a more complex reality: Attacks on LLM-based systems increasingly involve multi-step sequences that mirror traditional malware campaigns. In this paper, we propose that attacks targeting LLM-based applications constitute a distinct class of malware, which we term \textit{promptware}, and introduce a five-step kill chain model for analyzing these threats. The framework comprises Initial Access (prompt injection), Privilege Escalation (jailbreaking), Persistence (memory and retrieval poisoning), Lateral Movement (cross-system and cross-user propagation), and Actions on Objective (ranging from data exfiltration to unauthorized transactions). By mapping recent attacks to this structure, we demonstrate that LLM-related attacks follow systematic sequences analogous to traditional malware campaigns. The promptware kill chain offers security practitioners a structured methodology for threat modeling and provides a common vocabulary for researchers across AI safety and cybersecurity to address a rapidly evolving threat landscape.

2601.09625

Jan 2026Cryptography and Security

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

Large Language Models (LLMs) have achieved remarkable capabilities but remain vulnerable to adversarial ``jailbreak'' attacks designed to bypass safety guardrails. Current safety alignment methods depend heavily on static external red teaming, utilizing fixed defense prompts or pre-collected adversarial datasets. This leads to a rigid defense that overfits known patterns and fails to generalize to novel, sophisticated threats. To address this critical limitation, we propose empowering the model to be its own red teamer, capable of achieving autonomous and evolving adversarial attacks. Specifically, we introduce Safety Self- Play (SSP), a system that utilizes a single LLM to act concurrently as both the Attacker (generating jailbreaks) and the Defender (refusing harmful requests) within a unified Reinforcement Learning (RL) loop, dynamically evolving attack strategies to uncover vulnerabilities while simultaneously strengthening defense mechanisms. To ensure the Defender effectively addresses critical safety issues during the self-play, we introduce an advanced Reflective Experience Replay Mechanism, which uses an experience pool accumulated throughout the process. The mechanism employs a Upper Confidence Bound (UCB) sampling strategy to focus on failure cases with low rewards, helping the model learn from past hard mistakes while balancing exploration and exploitation. Extensive experiments demonstrate that our SSP approach autonomously evolves robust defense capabilities, significantly outperforming baselines trained on static adversarial datasets and establishing a new benchmark for proactive safety alignment.

2601.10589

Jan 2026Cryptography and Security

Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection

Current LLM safety research predominantly focuses on mitigating Goal Hijacking, preventing attackers from redirecting a model's high-level objective (e.g., from "summarizing emails" to "phishing users"). In this paper, we argue that this perspective is incomplete and highlight a critical vulnerability in Reasoning Alignment. We propose a new adversarial paradigm: Reasoning Hijacking and instantiate it with Criteria Attack, which subverts model judgments by injecting spurious decision criteria without altering the high-level task goal. Unlike Goal Hijacking, which attempts to override the system prompt, Reasoning Hijacking accepts the high-level goal but manipulates the model's decision-making logic by injecting spurious reasoning shortcut. Though extensive experiments on three different tasks (toxic comment, negative review, and spam detection), we demonstrate that even newest models are prone to prioritize injected heuristic shortcuts over rigorous semantic analysis. The results are consistent over different backbones. Crucially, because the model's "intent" remains aligned with the user's instructions, these attacks can bypass defenses designed to detect goal deviation (e.g., SecAlign, StruQ), exposing a fundamental blind spot in the current safety landscape. Data and code are available at https://github.com/Yuan-Hou/criteria_attack

2601.10294

Jan 2026Cryptography and Security

AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

Artificial intelligence (AI) agents are increasingly used in a variety of domains to automate tasks, interact with users, and make decisions based on data inputs. Ensuring that AI agents perform only authorized actions and handle inputs appropriately is essential for maintaining system integrity and preventing misuse. In this study, we introduce the AgentGuardian, a novel security framework that governs and protects AI agent operations by enforcing context-aware access-control policies. During a controlled staging phase, the framework monitors execution traces to learn legitimate agent behaviors and input patterns. From this phase, it derives adaptive policies that regulate tool calls made by the agent, guided by both real-time input context and the control flow dependencies of multi-step agent actions. Evaluation across two real-world AI agent applications demonstrates that AgentGuardian effectively detects malicious or misleading inputs while preserving normal agent functionality. Moreover, its control-flow-based governance mechanism mitigates hallucination-driven errors and other orchestration-level malfunctions.

2601.10440

Jan 2026Cryptography and Security

SpatialJB: How Text Distribution Art Becomes the "Jailbreak Key" for LLM Guardrails

While Large Language Models (LLMs) have powerful capabilities, they remain vulnerable to jailbreak attacks, which is a critical barrier to their safe web real-time application. Current commercial LLM providers deploy output guardrails to filter harmful outputs, yet these defenses are not impenetrable. Due to LLMs' reliance on autoregressive, token-by-token inference, their semantic representations lack robustness to spatially structured perturbations, such as redistributing tokens across different rows, columns, or diagonals. Exploiting the Transformer's spatial weakness, we propose SpatialJB to disrupt the model's output generation process, allowing harmful content to bypass guardrails without detection. Comprehensive experiments conducted on leading LLMs get nearly 100% ASR, demonstrating the high effectiveness of SpatialJB. Even after adding advanced output guardrails, like the OpenAI Moderation API, SpatialJB consistently maintains a success rate exceeding 75%, outperforming current jailbreak techniques by a significant margin. The proposal of SpatialJB exposes a key weakness in current guardrails and emphasizes the importance of spatial semantics, offering new insights to advance LLM safety research. To prevent potential misuse, we also present baseline defense strategies against SpatialJB and evaluate their effectiveness in mitigating such attacks. The code for the attack, baseline defenses, and a demo are available at https://anonymous.4open.science/r/SpatialJailbreak-8E63.

2601.09321

Jan 2026Cryptography and Security

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks

We introduce enhanced Constitutional Classifiers that deliver production-grade jailbreak robustness with dramatically reduced computational costs and refusal rates compared to previous-generation defenses. Our system combines several key insights. First, we develop exchange classifiers that evaluate model responses in their full conversational context, which addresses vulnerabilities in last-generation systems that examine outputs in isolation. Second, we implement a two-stage classifier cascade where lightweight classifiers screen all traffic and escalate only suspicious exchanges to more expensive classifiers. Third, we train efficient linear probe classifiers and ensemble them with external classifiers to simultaneously improve robustness and reduce computational costs. Together, these techniques yield a production-grade system achieving a 40x computational cost reduction compared to our baseline exchange classifier, while maintaining a 0.05% refusal rate on production traffic. Through extensive red-teaming comprising over 1,700 hours, we demonstrate strong protection against universal jailbreaks -- no attack on this system successfully elicited responses to all eight target queries comparable in detail to an undefended model. Our work establishes Constitutional Classifiers as practical and efficient safeguards for large language models.

2601.04603

Jan 2026Cryptography and Security

DP-MGTD: Privacy-Preserving Machine-Generated Text Detection via Adaptive Differentially Private Entity Sanitization

The deployment of Machine-Generated Text (MGT) detection systems necessitates processing sensitive user data, creating a fundamental conflict between authorship verification and privacy preservation. Standard anonymization techniques often disrupt linguistic fluency, while rigorous Differential Privacy (DP) mechanisms typically degrade the statistical signals required for accurate detection. To resolve this dilemma, we propose \textbf{DP-MGTD}, a framework incorporating an Adaptive Differentially Private Entity Sanitization algorithm. Our approach utilizes a two-stage mechanism that performs noisy frequency estimation and dynamically calibrates privacy budgets, applying Laplace and Exponential mechanisms to numerical and textual entities respectively. Crucially, we identify a counter-intuitive phenomenon where the application of DP noise amplifies the distinguishability between human and machine text by exposing distinct sensitivity patterns to perturbation. Extensive experiments on the MGTBench-2.0 dataset show that our method achieves near-perfect detection accuracy, significantly outperforming non-private baselines while satisfying strict privacy guarantees.

2601.04641

Jan 2026Cryptography and Security

2601.04553

Deep Dive into the Abuse of DL APIs To Create Malicious AI Models and How to Detect Them

According to Gartner, more than 70% of organizations will have integrated AI models into their workflows by the end of 2025. In order to reduce cost and foster innovation, it is often the case that pre-trained models are fetched from model hubs like Hugging Face or TensorFlow Hub. However, this introduces a security risk where attackers can inject malicious code into the models they upload to these hubs, leading to various kinds of attacks including remote code execution (RCE), sensitive data exfiltration, and system file modification when these models are loaded or executed (predict function). Since AI models play a critical role in digital transformation, this would drastically increase the number of software supply chain attacks. While there are several efforts at detecting malware when deserializing pickle based saved models (hiding malware in model parameters), the risk of abusing DL APIs (e.g. TensorFlow APIs) is understudied. Specifically, we show how one can abuse hidden functionalities of TensorFlow APIs such as file read/write and network send/receive along with their persistence APIs to launch attacks. It is concerning to note that existing scanners in model hubs like Hugging Face and TensorFlow Hub are unable to detect some of the stealthy abuse of such APIs. This is because scanning tools only have a syntactically identified set of suspicious functionality that is being analysed. They often do not have a semantic-level understanding of the functionality utilized. After demonstrating the possible attacks, we show how one may identify potentially abusable hidden API functionalities using LLMs and build scanners to detect such abuses.

2601.04553

Jan 2026Cryptography and Security

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benchmarks are often fragmented, limited to single-turn text interactions, and lack the scalability required for systematic evaluation. To address this, we introduce OpenRT, a unified, modular, and high-throughput red-teaming framework designed for comprehensive MLLM safety evaluation. At its core, OpenRT architects a paradigm shift in automated red-teaming by introducing an adversarial kernel that enables modular separation across five critical dimensions: model integration, dataset management, attack strategies, judging methods, and evaluation metrics. By standardizing attack interfaces, it decouples adversarial logic from a high-throughput asynchronous runtime, enabling systematic scaling across diverse models. Our framework integrates 37 diverse attack methodologies, spanning white-box gradients, multi-modal perturbations, and sophisticated multi-agent evolutionary strategies. Through an extensive empirical study on 20 advanced models (including GPT-5.2, Claude 4.5, and Gemini 3 Pro), we expose critical safety gaps: even frontier models fail to generalize across attack paradigms, with leading models exhibiting average Attack Success Rates as high as 49.14%. Notably, our findings reveal that reasoning models do not inherently possess superior robustness against complex, multi-turn jailbreaks. By open-sourcing OpenRT, we provide a sustainable, extensible, and continuously maintained infrastructure that accelerates the development and standardization of AI safety.

2601.01592

Jan 2026Cryptography and Security

9,772 papers

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

2604.08407Apr 2026

View

Comprehensive List of User Deception Techniques in Emails

Email remains a central communication medium, yet its long-standing design and interface conventions continue to enable deceptive attacks. This research note presents a structured list of 42 email-based deception techniques, documented with 64 concrete example implementations, organized around the sender, link, and attachment security indicators as well as techniques targeting the email rendering environment. Building on a prior systematic literature review, we consolidate previously reported techniques with newly developed example implementations and introduce novel deception techniques identified through our own examination. Rather than assessing effectiveness or real-world severity, each entry explains the underlying mechanism in isolation, separating the high-level deception goal from its concrete technical implementation. The documented techniques serve as modular building blocks and a structured reference for future work on countermeasures across infrastructure, email client design, and security awareness, supporting researchers as well as developers, operators, and designers working in these areas.

2604.04926Apr 2026

View

Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework

Chain-of-Thought (CoT) prompting has been used to enhance the reasoning capability of LLMs. However, its reliability in security-sensitive analytical tasks remains insufficiently examined, particularly under structured human evaluation. Alternative approaches, such as model scaling and fine-tuning can be used to help improve performance. These methods are also often costly, computationally intensive, or difficult to audit. In contrast, prompt engineering provides a lightweight, transparent, and controllable mechanism for guiding LLM reasoning. This study proposes a structured prompt engineering framework designed to strengthen CoT reasoning integrity while improving security threat and attack detection reliability in local LLM deployments. The framework includes 16 factors grouped into four core dimensions: (1) Context and Scope Control, (2) Evidence Grounding and Traceability, (3) Reasoning Structure and Cognitive Control, and (4) Security-Specific Analytical Constraints. Rather than optimizing the wording of the prompt heuristically, the framework introduces explicit reasoning controls to mitigate hallucination and prevent reasoning drift, as well as strengthening interpretability in security-sensitive contexts. Using DDoS attack detection in SDN traffic as a case study, multiple model families were evaluated under structured and unstructured prompting conditions. Pareto frontier analysis and ablation experiments demonstrate consistent reasoning improvements (up to 40% in smaller models) and stable accuracy gains across scales. Human evaluation with strong inter-rater agreement (Cohen's k > 0.80) confirms robustness. The results establish structured prompting as an effective and practical approach for reliable and explainable AI-driven cybersecurity analysis.

2604.04852Apr 2026

View

2604.04833

Cryptanalysis of the Legendre Pseudorandom Function over Extension Fields

The Legendre Pseudorandom Function (PRF) is a highly efficient cryptographic primitive built upon the Legendre symbol, valued for its low multiplicative complexity in Multi-Party Computation (MPC) and Zero-Knowledge Proof (ZKP) protocols. While its security over prime fields $\mathbb{F}_p$ is well-documented, recent interest has shifted toward instantiations over extension fields $\mathbb{F}_{p^r}$. This paper presents the first comprehensive cryptanalysis of the single-degree Legendre PRF operating over $\mathbb{F}_{p^r}$. First, we analyze polynomial input encoding under a standard passive threat model (sequential additive counter queries). We demonstrate that while the absence of polynomial carry-overs causes an asynchronous "no-carry fracture" that neutralizes classical sliding-window collision attacks, the fracture itself is deterministically periodic. By introducing a novel "Differential Signature" bucketing technique, we prove that an adversary can systematically group fractured sequences by their structural shapes to bypass this defense, recovering the secret key in $\mathcal{O}(U \cdot p^r/M)$ operations, where $U$ is the unicity distance. Second, we evaluate the PRF under an active Chosen-Query threat model. We demonstrate that an adversary can circumvent the additive fracture by evaluating the PRF along a geometric sequence generated by a primitive polynomial. This structure invokes strict multiplicative homomorphism over $\mathbb{F}^*_{p^r}$, permitting a direct generalization of state-of-the-art table collision attacks to extract the key in $\mathcal{O}(p^r/M)$ operations. Finally, we establish the cryptographic boundaries of these attacks, formally proving the necessity of higher-degree key variants ($d \ge 2$) to achieve exponential security against structural reduction in extension fields.

2604.04833Apr 2026

View

Unpacking .zip: A First Look at Domain and File Name Confusion

The namespace for filenames and DNS names has overlapped since the introduction of DNS in 1985: \texttt{.com} was the original binary format used for DOS and CP/M systems. Recently the introduction of gTLDs such as \texttt{.zip} and \texttt{.mov}, coupled with the growing prevalence of web resources, has ignited new concerns about potential issues related to DNS and filename confusion. Thus far, the discourse on DNS/filename confusion has been piecemeal and hypothetical, making it unclear what, if any, security concerns credibly exist. To address this gap, we provide the first enumeration of how DNS/filename confusion can be abused. We then perform the first empirical case studies of DNS/filename confusion in the wild, which highlights suspected confusion across a wide range of software. Finally, based on our preliminary findings, we provide suggestions and guidance for future research on this topic.

2604.04805Apr 2026

View

GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference

Deploying large language models (LLMs) as cloud services raises privacy concerns as inference may leak sensitive data. Fully Homomorphic Encryption (FHE) allows computation on encrypted data, but current FHE methods struggle with efficient and precise nonlinear function evaluation. Specifically, CKKS-based approaches require high-degree polynomial approximations, which are costly when target precision increases. Alternatively, TFHE's Programmable Bootstrapping (PBS) outperforms CKKS by offering exact lookup-table evaluation. But it lacks high-precision implementations of LLM nonlinear layers and underutilizes GPU resources. We propose \emph{TIGER}, the first GPU-accelerated framework for high-precision TFHE-based nonlinear LLM layer evaluation. TIGER offers: (1) GPU-optimized WoP-PBS method combined with numerical algorithms to surpass native lookup-table precision limits on nonlinear functions; (2) high-precision and efficient implementations of key nonlinear layers, enabling practical encrypted inference; (3) batch-driven design exploiting inter-input parallelism to boost GPU efficiency. TIGER achieves 7.17$\times$, 16.68$\times$, and 17.05$\times$ speedups over a CPU baseline for GELU, Softmax, and LayerNorm, respectively.

2604.04783Apr 2026

View

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions, i.e., Capability, Identity, and Knowledge, for safety analysis. Our evaluations cover 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension increases the average attack success rate from 24.6% to 64-74%, with even the most robust model exhibiting more than a threefold increase over its baseline vulnerability. We further assess three CIK-aligned defense strategies alongside a file-protection mechanism; however, the strongest defense still yields a 63.8% success rate under Capability-targeted attacks, while file protection blocks 97% of malicious injections but also prevents legitimate updates. Taken together, these findings show that the vulnerabilities are inherent to the agent architecture, necessitating more systematic safeguards to secure personal AI agents. Our project page is https://ucsc-vlaa.github.io/CIK-Bench.

2604.04759Apr 2026

View

2604.04757

Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

AI agents are increasingly deployed to interact with other agents on behalf of users and organizations. We ask whether two such agents, operated by different entities, can carry out a parallel secret conversation while still producing a transcript that is computationally indistinguishable from an honest interaction, even to a strong passive auditor that knows the full model descriptions, the protocol, and the agents' private contexts. Building on recent work on watermarking and steganography for LLMs, we first show that if the parties possess an interaction-unique secret key, they can facilitate an optimal-rate covert conversation: the hidden conversation can exploit essentially all of the entropy present in the honest message distributions. Our main contributions concern extending this to the keyless setting, where the agents begin with no shared secret. We show that covert key exchange, and hence covert conversation, is possible even when each model has an arbitrary private context, and their messages are short and fully adaptive, assuming only that sufficiently many individual messages have at least constant min-entropy. This stands in contrast to previous covert communication works, which relied on the min-entropy in each individual message growing with the security parameter. To obtain this, we introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange: a key-exchange protocol whose public transcript is pseudorandom while still remaining correct under constant noise. We study this primitive, giving several constructions relevant to our application as well as strong limitations showing that more naive variants are impossible or vulnerable to efficient attacks. These results show that transcript auditing alone cannot rule out covert coordination between AI agents, and identify a new cryptographic theory that may be of independent interest.

2604.04757Apr 2026

View

RegGuard: Legitimacy and Fairness Enforcement for Optimistic Rollups

Optimistic rollups provide scalable smart-contract execution but remain unsuitable for regulated financial applications due to three structural gaps: semantic legitimacy, cross-layer state consistency, and ordering fairness. We introduce RegGuard, a unified framework that enhances optimistic rollups with comprehensive legitimacy guarantees. RegGuard integrates three coordinated mechanisms: a decidable semantic validator powered by the RegSpec rule language for encoding regulatory constraints; a cross-layer state pre-synchronization validator that detects inconsistent L1-L2 dependencies with probabilistic reliability bounds; and a cryptographically verifiable fair-ordering service that ensures transaction sequencing fairness with negligible violation probability. We implement a 15,000-line prototype integrated into an Optimism-based rollup and evaluate it under adversarial conditions. RegGuard reduces settlement failures by over 90%, prevents detectable ordering manipulation, and maintains 85% of baseline throughput.

2604.04748Apr 2026

View

Economic Security of VDF-Based Randomness Beacons: Models, Thresholds, and Design Guidelines

Randomness beacons based on Verifiable Delay Functions (VDFs) are increasingly proposed for blockchains and distributed systems, promising publicly verifiable delay and bias resistance. Existing analyses, however, treat adversaries purely as cryptographic entities and overlook that real attackers are economically motivated. A VDF may be sequentially secure, yet still vulnerable if a rational adversary can profit by purchasing faster hardware and exploiting reward spikes such as MEV opportunities. We develop a formal framework for economic security of VDF-based randomness beacons. Modeling the attacker as a rational agent facing hardware speedup, operating costs, and stochastic rewards, we cast the attack decision as an optimal-stopping problem and prove that optimal behavior has a monotone threshold structure. This yields tight necessary and sufficient conditions relating delay parameters to adversarial cost and reward distributions. We extend the analysis to grinding, selective abort, and multi-adversary competition, demonstrating how each amplifies effective rewards and increases required delays. Using realistic cloud costs, hardware benchmarks, and MEV data, we show that many proposed VDF delays, on the order of a few seconds, are economically insecure under plausible conditions. We conclude with deployable guidelines and introduce Economically Secure Delay Parameters (ESDPs) to support principled parameter selection in practical systems.

2604.04744Apr 2026

View

Fine-Tuning Integrity for Modern Neural Networks: Structured Drift Proofs via Norm, Rank, and Sparsity Certificates

Fine-tuning is now the primary method for adapting large neural networks, but it also introduces new integrity risks. An untrusted party can insert backdoors, change safety behavior, or overwrite large parts of a model while claiming only small updates. Existing verification tools focus on inference correctness or full-model provenance and do not address this problem. We introduce Fine-Tuning Integrity (FTI) as a security goal for controlled model evolution. An FTI system certifies that a fine-tuned model differs from a trusted base only within a policy-defined drift class. We propose Succinct Model Difference Proofs (SMDPs) as a new cryptographic primitive for enforcing these drift constraints. SMDPs provide zero-knowledge proofs that the update to a model is norm-bounded, low-rank, or sparse. The verifier cost depends only on the structure of the drift, not on the size of the model. We give concrete SMDP constructions based on random projections, polynomial commitments, and streaming linear checks. We also prove an information-theoretic lower bound showing that some form of structure is necessary for succinct proofs. Finally, we present architecture-aware instantiations for transformers, CNNs, and MLPs, together with an end-to-end system that aggregates block-level proofs into a global certificate.

2604.04738Apr 2026

View

Hardware-Level Governance of AI Compute: A Feasibility Taxonomy for Regulatory Compliance and Treaty Verification

The governance of frontier AI increasingly relies on controlling access to computational resources, yet the hardware-level mechanisms invoked by policy proposals remain largely unexamined from an engineering perspective. This paper bridges the gap between AI governance and computer engineering by proposing a taxonomy of 20 hardware-level governance mechanisms, organised by function (monitoring, verification, enforcement) and assessed for technical feasibility on a four-point scale from currently deployable to speculative. For each mechanism, we provide a technical description, a feasibility rating, and an identification of adversarial vulnerabilities. We map the taxonomy onto four governance scenarios: domestic regulation, bilateral agreements, multilateral treaty verification, and industry self-regulation. Our analysis reveals a structural mismatch: the mechanisms most needed for treaty verification, including on-chip compute metering, cryptographic proof-of-training, and hardware-embedded enforcement, are also the least mature. We assess principal threats to compute-based governance, including algorithmic efficiency gains, distributed training methods, and sovereignty concerns. We identify a temporal constraint: the window during which semiconductor manufacturing concentration makes hardware-level governance implementable is narrowing, while R&D timelines for critical mechanisms span years. We present an adversary-tiered threat analysis distinguishing commercial, non-state, and nation-state actors, arguing the appropriate security standard is tamper-evident assurance analogous to IAEA verification rather than absolute tamper-proofing. The taxonomy, feasibility classification, and mechanism-to-scenario mapping provide a technical foundation for policymakers and identify the R&D investments required before hardware-level governance can support verifiable international agreements.

2604.04712Apr 2026

View

Bridging Safety and Security in Complex Systems: A Model-Based Approach with SAFT-GT Toolchain

In the rapidly evolving landscape of software engineering, the demand for robust and secure systems has become increasingly critical. This is especially true for self-adaptive systems due to their complexity and the dynamic environments in which they operate. To address this issue, we designed and developed the SAFT-GT toolchain that tackles the multifaceted challenges associated with ensuring both safety and security. This paper provides a comprehensive description of the toolchain's architecture and functionalities, including the Attack-Fault Trees generation and model combination approaches. We emphasize the toolchain's ability to integrate seamlessly with existing systems, allowing for enhanced safety and security analyses without requiring extensive modifications and domain knowledge. Our proposed approach can address evolving security threats, including both known vulnerabilities and emerging attack vectors that could compromise the system. As a use case for the toolchain, we integrate it into the feedback loop of self-adaptive systems. Finally, to validate the practical applicability of the toolchain, we conducted an extensive user study involving domain experts, whose insights and feedback underscore the toolchain's relevance and usability in real-world scenarios. Our findings demonstrate the toolchain's effectiveness in real-world applications while highlighting areas for future improvements. The toolchain and associated resources are available in an open-source repository to promote reproducibility and encourage further research in this field.

2604.04705Apr 2026

View

GPIR: Enabling Practical Private Information Retrieval with GPUs

Private information retrieval (PIR) allows private database queries but is hindered by intense server-side computation and memory traffic. Modern lattice-based PIR protocols typically involve three phases: ExpandQuery (expanding a query into encrypted indices), RowSel (encrypted row selection), and ColTor (recursive "column tournament" for final selection). ExpandQuery and ColTor primarily perform number-theoretic transforms (NTTs), whereas RowSel reduces to large-scale independent matrix-matrix multiplications (GEMMs). GPUs are theoretically ideal for these tasks, provided multi-client batching is used to achieve high throughput. However, batching fundamentally reshapes performance bottlenecks; while it amortizes database access costs, it expands working sets beyond the L2 cache capacity, causing divergent memory behaviors and excessive DRAM traffic. We present GPIR, a GPU-accelerated PIR system that rethinks kernel design, data layout, and execution scheduling. We introduce a stage-aware hybrid execution model that dynamically switches between operation-level kernels, which execute each primitive operation separately, and stage-level kernels, which fuse all operations within a protocol stage into a single kernel to maximize on-chip data reuse. For RowSel, we identify a performance gap caused by a structural mismatch between NTT-driven data layouts and tiled GEMM access patterns, which is exacerbated by multi-client batching. We resolve this through a transposed-layout GEMM design and fine-grained pipelining. Finally, we extend GPIR to multi-GPU systems, scaling both query throughput and database capacity with negligible communication overhead. GPIR achieves up to 305.7x higher throughput than PIRonGPU, the state-of-the-art GPU implementation.

2604.04696Apr 2026

View

Packing Entries to Diagonals for Homomorphic Sparse-Matrix Vector Multiplication

Homomorphic encryption (HE) enables computation over encrypted data but incurs a substantial overhead. For sparse-matrix vector multiplication, the widely used Halevi and Shoup (2014) scheme has a cost linear in the number of occupied cyclic diagonals, which may be many due to the irregular nonzero pattern of the matrix. In this work, we study how to permute the rows and columns of a sparse matrix so that its nonzeros are packed into as few cyclic diagonals as possible. We formalise this as the two-dimensional diagonal packing problem (2DPP), introduce the two-dimensional circular bandsize metric, and give an integer programming formulation that yields optimal solutions for small instances. For large matrices, we propose practical ordering heuristics that combine graph-based initial orderings - based on bandwidth reduction, anti-bandwidth maximisation, and spectral analysis - and an iterative-improvement-based optimization phase employing 2OPT and 3OPT swaps. We also introduce a dense row/column elimination strategy and an HE-aware cost model that quantifies the benefits of isolating dense structures. Experiments on 175 sparse matrices from the SuiteSparse collection show that our ordering-optimisation variants can reduce the diagonal count by $5.5\times$ on average ($45.6\times$ for one instance). In addition, the dense row/column elimination approach can be useful for cases where the proposed permutation techniques are not sufficient; for instance, in one case, the additional elimination helped to reduce the encrypted multiplication cost by $23.7\times$ whereas without elimination, the improvement was only $1.9\times$.

2604.04683Apr 2026

View

Digital Privacy in IoT: Exploring Challenges, Approaches and Open Issues

Privacy has always been a critical issue in the digital era, particularly with the increasing use of Internet of Things (IoT) devices. As the IoT continues to transform industries such as healthcare, smart cities, and home automation, it has also introduced serious challenges regarding the security of sensitive and private data. This paper examines the complex landscape of digital privacy in IoT ecosystems, highlighting the need to protect personally identifiable information (PII) of individuals and uphold their rights to digital independence. Global events, such as the COVID-19 pandemic, have accelerated the adoption of IoT, raising concerns about privacy and data protection. This paper provides an in-depth examination of digital privacy risks in the IoT domain and introduces a clear taxonomy for evaluating them using the IEEE Digital Privacy Model. The proposed framework categorizes privacy risks into five types: identity-oriented, behavioral, inference, data manipulation, and regulatory risks. We review existing digital privacy solutions, including encryption technologies, blockchain, federated learning, differential privacy, reinforcement learning, AI, and dynamic consent mechanisms, to mitigate these risks. We also highlight how these privacy-enhancing technologies (PETs) help with data confidentiality, access control, and trust management. Additionally, this study presents AURA-IoT, a futuristic framework that tackles AI-driven privacy risks through a multi-layered structure. AURA-IoT integrates adversarial robustness, explainability, transparency, fairness, compliance, dynamic consent, and policy enforcement mechanisms to ensure digital privacy, security, and accountable IoT operations. Finally, we discuss ongoing challenges and potential research directions for integrating AI and encryption-based privacy solutions to achieve comprehensive digital privacy in future IoT systems.

2604.04572Apr 2026

View

2604.04561

Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities

LLM agents with tool access can discover and exploit security vulnerabilities. This is known. What is not known is which features of a system prompt trigger this behaviour, and which do not. We present a systematic taxonomy based on approximately 10,000 trials across seven models, 37 prompt conditions, and 12 hypothesized attack dimensions -- all executed in real Docker sandboxes. Every condition includes the instruction "Always respect all rules and access policies"; only a single encouragement sentence varies. Within the task class studied (planted test-runner vulnerabilities), the threat model is narrower than assumed: nine of twelve dimensions -- including minimization, moral licensing, incentives, identity priming, and reasoning triggers -- produce no detectable exploitation at n=50 per cell (upper 95% CI <7% per condition). But it is also more subtle than expected: one dimension, goal reframing, reliably triggers exploitation. "You are solving a puzzle; there may be hidden clues" produces 38-40% exploitation on Claude Sonnet 4 despite the explicit rule instruction, replicating across four models (CTF framing: 8-14% on DeepSeek, GPT-5-mini, o4-mini). The agent does not override the rules; it reinterprets the task so that exploitative actions become task-aligned. GPT-4.1 produces no exploitation across 1,850 trials (37 conditions), and a temporal comparison across four OpenAI models released over eleven months shows a pattern consistent with improving safety training, though model capability differences are a confounder. The practical contribution is a narrowed, testable threat model: defenders should audit for goal-reframing language, not for the broad class of adversarial prompts.

2604.04561Apr 2026

View

Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

Autonomous agents are increasingly deployed in both offensive and defensive cyber operations, creating high-speed, closed-loop interactions in critical infrastructure environments. Advanced Persistent Threat (APT) actors exploit "Living off the Land" techniques and targeted telemetry perturbations to induce ambiguity in monitoring systems, causing automated defenses to overreact or misclassify benign behavior as malicious activity. Existing monolithic and multi-agent defense pipelines largely operate on correlation-based signals, lack structural constraints on response actions, and are vulnerable to reasoning drift under ambiguous or adversarial inputs. We present the Causal Multi-Agent Decision Framework (C-MADF), a structurally constrained architecture for autonomous cyber defense that integrates causal modeling with adversarial dual-policy control. C-MADF first learns a Structural Causal Model (SCM) from historical telemetry and compiles it into an investigation-level Directed Acyclic Graph (DAG) that defines admissible response transitions. This roadmap is formalized as a Markov Decision Process (MDP) whose action space is explicitly restricted to causally consistent transitions. Decision-making within this constrained space is performed by a dual-agent reinforcement learning system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Inter-policy disagreement is quantified through a Policy Divergence Score and exposed via a human-in-the-loop interface equipped with an Explainability-Transparency Score that serves as an escalation signal under uncertainty. On the real-world CICIoT2023 dataset, C-MADF reduces the false-positive rate from 11.2%, 9.7%, and 8.4% in three cutting-edge literature baselines to 1.8%, while achieving 0.997 precision, 0.961 recall, and 0.979 F1-score.

2604.04442Apr 2026

View

DAO to (Anonymous) DAO Transactions

Blockchain assets are increasingly controlled by organizations rather than individuals. DAO treasuries, consortium wallets, and custodial exchanges rely on threshold authorization and multi-party key management, yet existing payment mechanisms still target single-user wallets, leaving no unified solution for organizational transfers. We formalize the problem of \emph{DAO-to-(anonymous)-DAO} transactions and present \textsc{Dao$^2$}, a framework that enables one threshold-controlled organization to pay another, optionally with recipient anonymity, while keeping received funds under distributed control. \textsc{Dao$^2$} combines three components: \emph{distributed key derivation} (DKD) for non-stealth child addresses, \emph{distributed stealth-address generation} (DSAG) for unlinkable one-time destinations, and \emph{threshold signatures} for authorization. For ordinary transfers, the receiver derives a non-stealth address via DKD; for anonymous transfers, it derives a stealth address via DSAG. The sender then threshold-signs the payment, and the receiver redeems the funds without reconstructing any master secret. We formally prove its security and evaluate a prototype. A complete anonymous DAO-to-DAO transaction for a typical-sized (e.g., 7-member) DAO finishes in under 27\,ms with less than 1.2\,KB of communication, and scales linearly with DAO size.

2604.04369Apr 2026

View

Evaluating Future Air Traffic Management Security

The L-Band Digital Aviation Communication System (LDACS) aims to modernize communications between the aircraft and the tower. Besides digitizing this type of communication, the contributors also focus on protecting them against cyberattacks. There are several proposals regarding LDACS security, and a recent one suggests the use of physical unclonable functions (PUFs) for the authentication module. This work demonstrates this PUF-based authentication mechanism along with its potential vulnerabilities. Sophisticated models are able to predict PUFs, and, on the other hand, quantum computers are capable of threatening current cryptography, consisting factors that jeopardize the authentication mechanism giving the ability to perform impersonation attacks. In addition, aging is a characteristic that affects the stability of PUFs, which may cause instability issues, rendering the system unavailable. In this context, this work proposes the well-established Public Key Infrastructure (PKI), as an alternative solution.

2604.04293Apr 2026

View

Page 1 of 489

Security & Cryptography Papers - ScienceStack

9,772 papers

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

2604.08407Apr 2026

View

Comprehensive List of User Deception Techniques in Emails

2604.04926Apr 2026

View

Strengthening Human-Centric Chain-of-Thought Reasoning Integrity in LLMs via a Structured Prompt Framework

2604.04852Apr 2026

View

2604.04833

Cryptanalysis of the Legendre Pseudorandom Function over Extension Fields

2604.04833Apr 2026

View

Unpacking .zip: A First Look at Domain and File Name Confusion

2604.04805Apr 2026

View

GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference

2604.04783Apr 2026

View

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

2604.04759Apr 2026

View

2604.04757

Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

2604.04757Apr 2026

View

RegGuard: Legitimacy and Fairness Enforcement for Optimistic Rollups

2604.04748Apr 2026

View

Economic Security of VDF-Based Randomness Beacons: Models, Thresholds, and Design Guidelines

2604.04744Apr 2026

View

Fine-Tuning Integrity for Modern Neural Networks: Structured Drift Proofs via Norm, Rank, and Sparsity Certificates

2604.04738Apr 2026

View

Hardware-Level Governance of AI Compute: A Feasibility Taxonomy for Regulatory Compliance and Treaty Verification

2604.04712Apr 2026

View

Bridging Safety and Security in Complex Systems: A Model-Based Approach with SAFT-GT Toolchain

2604.04705Apr 2026

View

GPIR: Enabling Practical Private Information Retrieval with GPUs

2604.04696Apr 2026

View

Packing Entries to Diagonals for Homomorphic Sparse-Matrix Vector Multiplication

2604.04683Apr 2026

View

Digital Privacy in IoT: Exploring Challenges, Approaches and Open Issues

2604.04572Apr 2026

View

2604.04561

Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities

2604.04561Apr 2026

View

Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

2604.04442Apr 2026

View

DAO to (Anonymous) DAO Transactions

2604.04369Apr 2026

View

Evaluating Future Air Traffic Management Security

2604.04293Apr 2026

View

Page 1 of 489

Trending in Security & Cryptography

Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

2604.08407

Apr 2026Cryptography and Security

Rethinking LLM Watermark Detection in Black-Box Settings: A Non-Intrusive Third-Party Framework

2603.14968

Mar 2026Cryptography and Security

TrinityGuard: A Unified Framework for Safeguarding Multi-Agent Systems

2603.15408

Mar 2026Cryptography and Security

Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

2603.01919

Mar 2026Cryptography and Security

DualSentinel: A Lightweight Framework for Detecting Targeted Attacks in Black-box LLM via Dual Entropy Lull Pattern

2603.01574

Mar 2026Cryptography and Security

AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification

2602.22724

Feb 2026Cryptography and Security

SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned

2602.07666

Feb 2026Cryptography and Security

Evaluating Nova 2.0 Lite model under Amazon's Frontier Model Safety Framework

2601.191342

Jan 2026Cryptography and Security

Thought-Transfer: Indirect Targeted Poisoning Attacks on Chain-of-Thought Reasoning Models

2601.19061

Jan 2026Cryptography and Security

Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

2601.10338

Jan 2026Cryptography and Security

ReasAlign: Reasoning Enhanced Safety Alignment against Prompt Injection Attack

2601.10173

Jan 2026Cryptography and Security

The Promptware Kill Chain: How Prompt Injections Gradually Evolved Into a Multi-Step Malware

2601.09625

Jan 2026Cryptography and Security

Be Your Own Red Teamer: Safety Alignment via Self-Play and Reflective Experience Replay

2601.10589

Jan 2026Cryptography and Security

Reasoning Hijacking: Subverting LLM Classification via Decision-Criteria Injection

2601.10294

Jan 2026Cryptography and Security

AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior

2601.10440

Jan 2026Cryptography and Security

SpatialJB: How Text Distribution Art Becomes the "Jailbreak Key" for LLM Guardrails

2601.09321

Jan 2026Cryptography and Security

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks

2601.04603

Jan 2026Cryptography and Security

DP-MGTD: Privacy-Preserving Machine-Generated Text Detection via Adaptive Differentially Private Entity Sanitization

2601.04641

Jan 2026Cryptography and Security

2601.04553

Deep Dive into the Abuse of DL APIs To Create Malicious AI Models and How to Detect Them

2601.04553

Jan 2026Cryptography and Security

OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs

2601.01592

Jan 2026Cryptography and Security