tabulapdf: An R Package to Extract Tables from PDF Documents

Mauricio Vargas Sepúlveda, Thomas J. Leeper, Tom Paskhalis, Manuel Aristarán, Jeremy B. Merrill, Mike Tigas

TL;DR

tabulapdf is an R package that uses the Tabula Java library to import tables from PDF files directly into R, and it lets users select table areas manually with a computer mouse for data retrieval.

Abstract

tabulapdf is an R package that uses the Tabula Java library to import tables from PDF files directly into R. This tool can reduce the time and effort spent on data extraction in fields such as investigative journalism. It supports both automatic and manual table extraction; manual extraction is handled through a Shiny interface in which users select table areas with a computer mouse.

Paper Structure (10 sections, 1 figure)