Sift or Get Off the PoC: Applying Information Retrieval to Vulnerability Research with SiftRank

Caleb Gross

TL;DR

Security research is constrained by resources, making vulnerability triage a critical bottleneck. SiftRank reframes this challenge as an information retrieval ranking problem and uses LLMs as general-purpose rankers with stochastic batching, inflection-based convergence, and iterative refinement to scale to thousands of candidates without specialized infrastructure. The authors introduce a formal algorithm with $O(n)$ complexity and demonstrate practical effectiveness on N-day vulnerability analysis and firmware patch identification, achieving fast inference and low cost with an open-source implementation. This work offers a scalable, zero-shot approach for prioritizing security-relevant items and suggests broad applicability to other needle-in-a-haystack ranking problems.

Abstract

Security research is fundamentally a problem of resource constraint and consequent prioritization. There is simply too much attack surface and too little time and energy to spend analyzing it all. The most effective security researchers are often those who are most skilled at intuitively deciding which part of an expansive attack surface to investigate. We demonstrate that this problem of selecting the most promising option from among many possibilities can be reframed as an information retrieval problem, and solved using document ranking techniques with LLMs performing the heavy lifting as general-purpose rankers. We present SiftRank, a ranking algorithm achieving O(n) complexity through three key mechanisms: listwise ranking using an LLM to order documents in small batches of approximately 10 items at a time; inflection-based convergence detection that adaptively terminates ranking when score distributions have stabilized; and iterative refinement that progressively focuses ranking effort on the most relevant documents. Unlike existing reranking approaches that require a separate first-stage retrieval step to narrow datasets to approximately 100 candidates, SiftRank operates directly on thousands of items, with each document evaluated across multiple randomized batches to mitigate inconsistent judgments by an LLM. We demonstrate practical effectiveness on N-day vulnerability analysis, successfully identifying a vulnerability-fixing function among 2,197 changed functions in a stripped binary firmware patch within 99 seconds at an inference cost of $0.82. Our approach enables scalable security prioritization for problems that are generally constrained by manual analysis, requiring only standard LLM API access without specialized infrastructure, embedding, or domain-specific fine-tuning. An open-source implementation of SiftRank may be found at https://github.com/noperator/siftrank.
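The stochastic batching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `run_trials`, its parameters, and the `keyword_ranker` stand-in (which replaces the LLM call with a simple substring check) are all hypothetical names chosen for illustration. The sketch assumes documents are unique strings and that a lower mean position means higher relevance.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# A ranker takes one small batch and returns it ordered most-relevant first.
# In SiftRank this role is played by an LLM; here it is an assumption-laden stub.
Ranker = Callable[[List[str]], List[str]]


def run_trials(
    docs: Sequence[str],
    rank_batch: Ranker,
    batch_size: int = 10,
    trials: int = 5,
    seed: int = 0,
) -> Dict[str, float]:
    """Return each document's mean normalized position (0.0 = always ranked first)."""
    rng = random.Random(seed)
    positions = defaultdict(list)
    for _ in range(trials):
        # Reshuffle every trial so each document meets different batch neighbors.
        shuffled = list(docs)
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            ordered = rank_batch(batch)
            # Normalize positions to [0, 1]; guard against a one-item batch.
            denom = max(len(ordered) - 1, 1)
            for pos, doc in enumerate(ordered):
                positions[doc].append(pos / denom)
    return {doc: sum(p) / len(p) for doc, p in positions.items()}


def keyword_ranker(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for the LLM: items mentioning 'auth' go first."""
    # sorted() is stable, so all other items keep their shuffled order.
    return sorted(batch, key=lambda d: "auth" not in d)


docs = [f"func_{i}" for i in range(39)] + ["check_auth_cookie"]
scores = run_trials(docs, keyword_ranker)
print(scores["check_auth_cookie"])
```

Each trial issues one ranker call per batch, so a trial over n documents costs about n divided by the batch size in calls. Averaging positions over several reshuffled trials is what dampens a single inconsistent judgment.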

Paper Structure

This paper contains 24 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: SiftRank algorithm flow showing stochastic trial loop, batch partitioning, LLM ranking operations, convergence detection, and iterative refinement. The corpus $C_k$ is randomly shuffled for each trial $t$, partitioned into $m$ batches of size $S$, and ranked by the LLM in $L$. Positions $p$ are aggregated across trials to compute scores in $R_k^{(t)}$. When the inflection point $\tau_k$ stabilizes, the corpus is partitioned at that threshold, with top candidates $C_{k+1}$ advancing to the next iteration and frozen portions $F_k$ reserved for final reassembly.
  • Figure 2: Progressive emergence of the inflection point in TLD score distributions. Rows: Trials 1, 2, …, $t^*\!-\!1$, $t^*$, where $t^*$ is the trial at which the position of the inflection point stabilized. Columns: Iterations 1, 2, …, $K\!-\!1$, $K$, where $K\!=\!6$. The red dotted line marks the position of the inflection point $\tau$ at convergence.
  • Figure 3: Out of 2,713 function call chains (which were then grouped into 119 function call clusters), this weighted cluster ranked at the top. It is clearly relevant to the security advisory, which mentioned "authentication" and "session cookies," both of which appear in the function summaries. We are able to surface the critically relevant (but relatively lower-weight) session validation function sub_2acc210 because of its association with other higher-weight functions in the cluster.
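
The Figure 1 caption describes partitioning the corpus at a stabilized inflection point, with the top candidates advancing and the rest frozen. The sketch below illustrates that step under loud assumptions: the paper's actual inflection detector is not reproduced here, so a largest-gap heuristic is swapped in (cut where neighboring sorted scores jump the most), lower scores are taken to mean higher relevance, and the names `inflection_index`, `has_stabilized`, and `partition` are hypothetical.

```python
from typing import Dict, List, Tuple


def inflection_index(sorted_scores: List[float]) -> int:
    """Index of the last item before the largest jump between neighbors.

    Assumption: a largest-gap heuristic stands in for the paper's detector.
    """
    if len(sorted_scores) < 2:
        return len(sorted_scores) - 1
    gaps = [b - a for a, b in zip(sorted_scores, sorted_scores[1:])]
    return max(range(len(gaps)), key=gaps.__getitem__)


def has_stabilized(history: List[int], window: int = 3) -> bool:
    """True when the inflection index was identical over the last `window` trials."""
    recent = history[-window:]
    return len(recent) == window and len(set(recent)) == 1


def partition(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split documents into (advancing candidates, frozen remainder)."""
    ordered = sorted(scores, key=scores.get)  # best (lowest) score first
    tau = inflection_index([scores[d] for d in ordered])
    return ordered[:tau + 1], ordered[tau + 1:]


# Toy scores: three clear front-runners, then a plateau of weaker items.
scores = {"a": 0.02, "b": 0.05, "c": 0.08, "d": 0.50,
          "e": 0.52, "f": 0.55, "g": 0.58, "h": 0.60}
advance, frozen = partition(scores)
print(advance)  # ['a', 'b', 'c']
print(frozen)   # ['d', 'e', 'f', 'g', 'h']
```

In the caption's notation, `advance` plays the role of $C_{k+1}$, `frozen` the role of $F_k$, and the index returned by `inflection_index` the role of $\tau_k$. A caller would append that index to a history after each trial and stop the trial loop once `has_stabilized` returns true.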