PseudoSeer: a Search Engine for Pseudocode

Levent Toksoz, Mukund Srinath, Gang Tan, C. Lee Giles

TL;DR

A novel pseudocode search engine is designed to facilitate efficient retrieval and search of academic papers containing pseudocode, leveraging Elasticsearch and supporting advanced features like combined facet searches and exact-match queries for more targeted results.
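The combined facet and exact-match searches mentioned above can be illustrated with a short Python sketch that builds an Elasticsearch query-DSL body. It assumes that each facet is indexed as a separate text field named `title`, `abstract`, `authors`, `latex`, and `references`. These field names and the boost values in `FIELD_BOOSTS` are hypothetical and do not come from the system's actual configuration. The sketch relies only on the standard `multi_match` query, its `field^boost` syntax, and its `most_fields` and `phrase` types.

```python
import json

# Hypothetical facet names and boosts; the system's real fields and weights are not given.
FIELD_BOOSTS = {
    "title": 3.0,
    "abstract": 2.0,
    "authors": 1.5,
    "latex": 1.0,
    "references": 1.0,
}


def build_query(text, fields, exact=False):
    """Build an Elasticsearch query-DSL body for a single or combined facet search."""
    if not fields:
        raise ValueError("at least one facet must be selected")
    unknown = [f for f in fields if f not in FIELD_BOOSTS]
    if unknown:
        raise ValueError(f"unknown facets: {unknown}")
    # "field^boost" multiplies that field's relevance score by the boost.
    boosted = [f"{f}^{FIELD_BOOSTS[f]:g}" for f in fields]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": boosted,
                # "phrase" requires the terms to appear as an exact phrase;
                # "most_fields" combines the scores of every matching field.
                "type": "phrase" if exact else "most_fields",
            }
        }
    }


# Combined search over title and abstract, similar to Figure 5.
print(json.dumps(build_query("LLM", ["title", "abstract"]), indent=2))
# Exact-match search restricted to the abstract field.
print(json.dumps(build_query("nearest neighbor", ["abstract"], exact=True), indent=2))
```

The resulting dictionary would be sent as the request body of an Elasticsearch `_search` call. Combining boosted per-field scores with `most_fields` is one simple way to let a match in a heavily weighted facet outrank a match in a lightly weighted one.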

Abstract

A novel pseudocode search engine is designed to facilitate efficient retrieval and search of academic papers containing pseudocode. By leveraging Elasticsearch, the system enables users to search across various facets of a paper, such as the title, abstract, author information, and LaTeX code snippets, while supporting advanced features like combined facet searches and exact-match queries for more targeted results. The data acquisition process, with arXiv as the primary data source, is described, along with the methods used for data extraction and text-based indexing, highlighting how different data elements are stored and optimized for search. The search engine ranks results with a weighted BM25-based algorithm; the factors considered when prioritizing results for both single and combined facet searches are described, including how each facet is weighted in a combined search. Several search engine results pages are shown. Finally, future work is briefly outlined, and a potential evaluation methodology for assessing the effectiveness and performance of the search engine is described.
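The weighted BM25 ranking can be illustrated with a small self-contained sketch. The exact weighting scheme is not given in this summary, so the code assumes the simplest plausible form: each facet is scored independently with Okapi BM25 (using the Lucene-style IDF and the default parameters k1 = 1.2 and b = 0.75, which Elasticsearch also uses), and the combined score is a weighted sum of the per-facet scores. The documents, the whitespace tokenizer, and the values in `weights` are hypothetical.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # Standard BM25 parameters (also Elasticsearch's defaults).


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bm25(query_terms, doc_terms, corpus):
    """Okapi BM25 score of one tokenized field, given that field's tokens for every document."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        f = tf[term]
        if f == 0:
            continue
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))  # Lucene-style IDF, never negative
        norm = f + K1 * (1 - B + B * len(doc_terms) / avgdl)  # term-frequency saturation + length normalization
        score += idf * f * (K1 + 1) / norm
    return score


def combined_score(query, doc, corpora, weights):
    """Weighted sum of per-facet BM25 scores (hypothetical weighting scheme)."""
    q = tokenize(query)
    return sum(w * bm25(q, tokenize(doc[field]), corpora[field]) for field, w in weights.items())


docs = [
    {"title": "Approximate nearest neighbor search",
     "abstract": "We index vectors for fast nearest neighbor lookup."},
    {"title": "Decision tree pruning",
     "abstract": "A tree based method with a nearest neighbor baseline."},
]
weights = {"title": 2.0, "abstract": 1.0}  # hypothetical facet weights
corpora = {field: [tokenize(d[field]) for d in docs] for field in weights}

ranked = sorted(docs, key=lambda d: combined_score("nearest neighbor", d, corpora, weights), reverse=True)
print([d["title"] for d in ranked])
```

Both abstracts mention "nearest neighbor", but only the first title does, so the title weight pushes the first paper to the top.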

Paper Structure

This paper contains 11 sections and 6 figures.

Figures (6)

  • Figure 1: Results page obtained by using the search query 'nearest neighbor' in the abstract field
  • Figure 2: Landing page with options to search in LaTeX, references, title, abstract, and authors, individually or combined.
  • Figure 3: Example pseudocode with for and if statements.
  • Figure 4: Single-field search using the search query 'tree' in the LaTeX field
  • Figure 5: Combined search using the search query 'LLM' across the title and abstract fields
  • ...and 1 more figure
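Because the LaTeX facet holds pseudocode taken from arXiv sources, such as the for/if example in Figure 3, a natural extraction step is to pull algorithm environments out of each paper's LaTeX. The sketch below is an assumption about how this could be done, not the authors' pipeline: it strips unescaped `%` comments and returns the body of every top-level `algorithm` or `algorithm*` environment using a regular expression. The function name `extract_pseudocode` is hypothetical.

```python
import re

# Matches \begin{algorithm}...\end{algorithm} (or the starred variant) lazily across lines.
ALGORITHM_ENV = re.compile(r"\\begin\{(algorithm\*?)\}(.*?)\\end\{\1\}", re.DOTALL)
# An unescaped % starts a LaTeX comment that runs to the end of the line.
COMMENT = re.compile(r"(?<!\\)%.*")


def extract_pseudocode(latex_source):
    """Return the bodies of top-level algorithm / algorithm* environments."""
    cleaned = COMMENT.sub("", latex_source)
    return [m.group(2).strip() for m in ALGORITHM_ENV.finditer(cleaned)]


sample = r"""
\begin{algorithm}
\caption{Linear search}
\begin{algorithmic}
\FOR{$i = 1$ \TO $n$}
  \IF{$A[i] = x$} \RETURN $i$ \ENDIF
\ENDFOR
\end{algorithmic}
\end{algorithm}
% \begin{algorithm} commented out \end{algorithm}
"""

for snippet in extract_pseudocode(sample):
    print(snippet)
```

A regular expression is enough for a sketch, but it misses pseudocode written in other environments (for example, a bare `algorithmic` block), nested algorithm environments, and content pulled in through `\input`; a production pipeline would likely need a LaTeX-aware parser.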