Sift or Get Off the PoC: Applying Information Retrieval to Vulnerability Research with SiftRank

Caleb Gross

TL;DR

Security research is constrained by resources, which makes vulnerability triage a critical bottleneck. SiftRank reframes triage as an information retrieval ranking problem and uses LLMs as general-purpose rankers, combining stochastic batching, inflection-based convergence detection, and iterative refinement to scale to thousands of candidates without specialized infrastructure. The authors present an algorithm with O(n) complexity and demonstrate it on N-day vulnerability analysis, identifying the vulnerability-fixing function among 2,197 changed functions in a stripped binary firmware patch within 99 seconds at an inference cost of $0.82. An open-source implementation is available. This work offers a scalable, zero-shot approach for prioritizing security-relevant items and suggests broad applicability to other needle-in-a-haystack ranking problems.

Abstract

Security research is fundamentally a problem of resource constraint and consequent prioritization. There is simply too much attack surface and too little time and energy to spend analyzing it all. The most effective security researchers are often those who are most skilled at intuitively deciding which part of an expansive attack surface to investigate. We demonstrate that this problem of selecting the most promising option from among many possibilities can be reframed as an information retrieval problem, and solved using document ranking techniques with LLMs performing the heavy lifting as general-purpose rankers. We present SiftRank, a ranking algorithm achieving O(n) complexity through three key mechanisms: listwise ranking using an LLM to order documents in small batches of approximately 10 items at a time; inflection-based convergence detection that adaptively terminates ranking when score distributions have stabilized; and iterative refinement that progressively focuses ranking effort on the most relevant documents. Unlike existing reranking approaches that require a separate first-stage retrieval step to narrow datasets to approximately 100 candidates, SiftRank operates directly on thousands of items, with each document evaluated across multiple randomized batches to mitigate inconsistent judgments by an LLM. We demonstrate practical effectiveness on N-day vulnerability analysis, successfully identifying a vulnerability-fixing function among 2,197 changed functions in a stripped binary firmware patch within 99 seconds at an inference cost of $0.82. Our approach enables scalable security prioritization for problems that are generally constrained by manual analysis, requiring only standard LLM API access without specialized infrastructure, embedding, or domain-specific fine-tuning. An open-source implementation of SiftRank may be found at https://github.com/noperator/siftrank.
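To make the mechanisms in the abstract concrete, the following is a minimal Python sketch of randomized listwise batching followed by iterative refinement. It is not the SiftRank implementation. The names `score_round`, `sift`, and `Ranker` are hypothetical, the LLM is replaced by any callable that orders a small batch best-first, the number of trials is fixed rather than adaptively terminated, and a simple "keep the better half" rule stands in for the inflection-based cutoff described above.

```python
import math
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# A ranker takes a small batch of document ids and returns them best-first.
# In SiftRank this role is played by an LLM; here it is a pluggable callable.
Ranker = Callable[[Sequence[int]], List[int]]


def score_round(ids: Sequence[int], rank_batch: Ranker, batch_size: int,
                trials: int, rng: random.Random) -> Dict[int, float]:
    """Mean within-batch position per document over shuffled trials (lower is better)."""
    positions: Dict[int, List[int]] = defaultdict(list)
    for _ in range(trials):
        shuffled = list(ids)
        rng.shuffle(shuffled)  # each trial places a document among different neighbors
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            for pos, doc_id in enumerate(rank_batch(batch)):
                positions[doc_id].append(pos)
    return {doc_id: sum(p) / len(p) for doc_id, p in positions.items()}


def sift(ids: Sequence[int], rank_batch: Ranker, batch_size: int = 10,
         trials: int = 5, seed: int = 0) -> List[int]:
    """Repeatedly score, keep the better half, and finish with one listwise call."""
    rng = random.Random(seed)
    candidates = list(ids)
    while len(candidates) > batch_size:
        scores = score_round(candidates, rank_batch, batch_size, trials, rng)
        candidates.sort(key=lambda d: scores[d])
        # Assumption: fixed halving stands in for the paper's inflection-based cutoff.
        keep = max(batch_size, math.ceil(len(candidates) / 2))
        candidates = candidates[:keep]
    # The survivors fit in a single batch, so one final call orders them.
    return rank_batch(candidates)
```

In this sketch each round issues about `trials * len(candidates) / batch_size` ranker calls, and the candidate set roughly halves every round, so the total is bounded by a geometric series of about `2 * trials * n / batch_size` calls, which is linear in `n`. That bound is a property of the halving rule assumed here and is offered only as intuition for how a refinement loop of this shape can stay linear.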
