NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools
Peiran Yao, Matej Kosmajac, Abeer Waheed, Kostyantyn Guzhva, Natalie Hervieux, Denilson Barbosa
TL;DR
NLP Workbench addresses the barrier non-experts face when applying state-of-the-art NLP to large corpora by providing a web-based, extensible platform that unifies corpus management, text mining tools, and visualization. It relies on a containerized microservice architecture with DAG-based pipelining and a distributed execution model, enabling efficient, parallelized computation and reuse of intermediate results via Elasticsearch indexing. Core contributions include modular, replaceable components for NER, coreference, entity linking, relation extraction, semantic parsing, summarization, sentiment analysis, and social network analysis, all accessible through REST/RPC interfaces and a browser extension. The platform supports diverse use cases in Digital Humanities, Business Analytics, and NLP Research, and is released under the MIT license to promote reproducibility and collaboration. Overall, NLP Workbench offers a scalable, user-friendly, and extensible framework that brings cutting-edge text-mining models to non-experts while letting advanced researchers integrate new tools and pipelines easily.
Abstract
NLP Workbench is a web-based platform for text mining that allows non-expert users to obtain a semantic understanding of large-scale corpora using state-of-the-art text mining models. The platform is built upon the latest pre-trained models and open-source systems from academia that provide semantic analysis functionalities, including but not limited to entity linking, sentiment analysis, semantic parsing, and relation extraction. Its extensible design enables researchers and developers to smoothly replace an existing model or integrate a new one. To improve efficiency, we employ a microservice architecture that facilitates the allocation of acceleration hardware and the parallelization of computation. This paper presents the architecture of NLP Workbench and discusses the challenges we faced in designing it. We also discuss diverse use cases of NLP Workbench and the benefits of using it over other approaches. The platform is under active development, with its source code released under the MIT license. A website and a short video demonstrating our platform are also available.
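
The DAG-based pipelining and reuse of intermediate results summarized above can be illustrated with a minimal Python sketch. It is not the platform's actual code: the dependency graph, the `run_tool` stub, and the in-memory `cache` dictionary (standing in for an Elasticsearch index) are all assumptions made for illustration. The sketch resolves the tools a requested analysis depends on, runs them in topological order, and skips any step whose result is already stored.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency graph: each tool maps to the tools it depends on.
DEPENDENCIES = {
    "ner": set(),
    "coreference": {"ner"},
    "entity_linking": {"ner", "coreference"},
    "relation_extraction": {"entity_linking"},
}

CALLS = []  # records which tools were actually executed


def run_tool(name, doc_id, inputs):
    """Stub for a call to a tool's microservice (assumed interface)."""
    CALLS.append(name)
    return {"tool": name, "doc": doc_id, "used": sorted(inputs)}


def required_tools(target):
    """Collect the target tool and all of its transitive dependencies."""
    needed, stack = set(), [target]
    while stack:
        tool = stack.pop()
        if tool not in needed:
            needed.add(tool)
            stack.extend(DEPENDENCIES[tool])
    return needed


def run_pipeline(target, doc_id, cache):
    """Run only the missing steps needed for `target`, in dependency order."""
    subgraph = {tool: DEPENDENCIES[tool] for tool in required_tools(target)}
    for tool in TopologicalSorter(subgraph).static_order():
        key = (doc_id, tool)
        if key not in cache:  # reuse a stored intermediate result if present
            inputs = {dep: cache[(doc_id, dep)] for dep in DEPENDENCIES[tool]}
            cache[key] = run_tool(tool, doc_id, inputs)
    return cache[(doc_id, target)]


if __name__ == "__main__":
    cache = {}  # stand-in for an indexed result store
    run_pipeline("entity_linking", "doc-1", cache)
    run_pipeline("relation_extraction", "doc-1", cache)
    print(CALLS)
```

In this sketch the second request executes only `relation_extraction`, because the upstream results for `doc-1` are already in the cache. In a distributed setting, each `run_tool` call would presumably be dispatched to a separate containerized service rather than run in-process.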
