Logic Mill -- A Knowledge Navigation System

Sebastian Erhardt; Mainak Ghosh; Erik Buunk; Michael E. Rose; Dietmar Harhoff

TL;DR

Logic Mill is a scalable, open-access knowledge navigation system that encodes large scientific and patent corpora into dense embeddings for fast cross-domain retrieval and similarity analysis. It uses the SPECTER encoder (built on SciBERT) to produce 768-dimensional vectors from title and abstract inputs within a 512-token limit, and stores these embeddings in Elasticsearch, using HNSW-based approximate nearest-neighbor (ANN) search to answer queries in milliseconds. The architecture consists of microservices with a Go backend and a GraphQL API. It supports continuous ingestion from Semantic Scholar, EPO, USPTO, and WIPO, and accepts user-supplied documents for cross-domain linking and exploration. The system is designed for literature exploration, prior-art searches, and cross-domain knowledge tracing; planned extensions include additional corpora (e.g., Wikipedia) and encoders, to support research workflows and knowledge transfer across domains.
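As a rough illustration of the input constraint described above, the sketch below joins a title and an abstract and truncates the result to the 512-token limit. It is only a sketch under stated assumptions: the function name `build_encoder_input` is hypothetical, whitespace splitting stands in for the encoder's real subword tokenizer, and the `[SEP]` separator follows the usual BERT convention rather than anything stated on this page. A real tokenizer produces different token counts and adds its own special tokens.

```python
MAX_TOKENS = 512  # token limit stated in the TL;DR
SEP = "[SEP]"     # BERT-style separator (assumption, not taken from this page)


def build_encoder_input(title: str, abstract: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Concatenate title and abstract, then keep at most max_tokens tokens.

    Whitespace splitting is a simplification; the real encoder uses a
    subword tokenizer, so actual token counts will be higher.
    """
    tokens = title.split() + [SEP] + abstract.split()
    return tokens[:max_tokens]


# A long abstract is cut off once the limit is reached.
short = build_encoder_input("A B", "c d e", max_tokens=4)
long_input = build_encoder_input("Some title", "word " * 1000)
```

Truncating from the end keeps the title intact, which seems a reasonable default when the title carries much of the topical signal; this page does not say how Logic Mill handles overlong inputs.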

Abstract

Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
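To make the idea of finding semantically similar documents from numerical representations concrete, here is a minimal sketch that ranks documents by cosine similarity to a query vector. It rests on several assumptions: cosine similarity is chosen for illustration and is not confirmed by this page, the exhaustive scan stands in for the approximate HNSW index that the system is described as using, and the three-dimensional toy vectors and document IDs are invented stand-ins for real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs with the highest similarity to query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy mixed corpus: publications and patents share one vector space.
corpus = {
    "paper-1": [1.0, 0.0, 0.0],
    "patent-1": [0.9, 0.1, 0.0],
    "paper-2": [0.0, 1.0, 0.0],
}
top = most_similar([1.0, 0.0, 0.0], corpus, k=2)
```

Because publications and patents are embedded into the same space, one ranking function can serve both single-corpus and cross-corpus queries. An exact scan like this costs time linear in corpus size, which is why a system holding more than 200 million documents would rely on an approximate index instead.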

Paper Structure (35 sections, 3 equations, 3 figures, 1 table)

Introduction
Document Encoding
Bag of words
Word embeddings
Sentence/Paragraph Embeddings
BERT Language Model
SciBERT
SPECTER
System specifics
External Data Sources
Extract, Transform, Load
Document Encoder
Storage & Search
Computation & API
User Interfaces
...and 20 more sections

Figures (3)

Figure 1: Logic Mill Architecture Overview
Figure 2: Logic Mill Website - Overview
Figure 3: Logic Mill Website - Example Query
