
Exports

Scientific papers are typically distributed as PDFs, which are convenient for human readers but hard for machines, AI systems, and automated workflows to parse reliably.

ScienceStack transforms LaTeX source into three structured export formats that preserve the full semantic content of research papers:

  • Markdown (.md) — Human-readable, works with Obsidian/Notion/VSCode, preserves all numbering
  • JSON (.json) — Machine-native, optimized for LLMs and AI pipelines
  • LaTeX (.tex) — Raw LaTeX with all macros expanded

All formats preserve equations, section numbers, cross-references, and document structure, which makes them more reliable than PDF text extraction or generic converters.


The PDF Problem

PDFs flatten rich document structure into visual layouts, stripping away semantic meaning:

  • Loss of structure — Sections, figures, theorems, and references are mashed into a page dump
  • Broken math — Equations are often extracted incorrectly (superscripts and fractions collapse)
  • No semantic cues — Citations appear as [12] instead of links to actual references
  • Bad for AI — LLMs waste tokens on noise (line breaks, formatting artifacts)

How to Export

  1. Navigate to any paper on ScienceStack
  2. Click the Download button in the top-right navigation bar
  3. Select your preferred format from the dropdown
  4. Configure options (annotations, assets) and download

Markdown Export

Our Markdown export is purpose-built for research papers and significantly more robust than generic LaTeX→Markdown converters.

Key Features

  • Complete numbering preservation — Sections, equations, figures, tables, and theorems all keep their original numbers
  • Linkable cross-references — All \ref{...} commands become live markdown links
  • Complete asset package (Pro) — Download with all figures and diagrams in optimized formats
  • LLM-friendly annotations — Your notes are embedded as structured JSON in HTML comments
  • Works everywhere — Compatible with Obsidian, Notion, VSCode, GitHub
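Because annotations travel inside HTML comments, they stay invisible when the Markdown is rendered but can still be recovered programmatically. The sketch below assumes only that each annotation is a JSON object wrapped in an HTML comment; the field names in the sample (type, anchor, text) are hypothetical and not the documented export format.

```python
import json
import re

# Matches any HTML comment, non-greedily, across line breaks.
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def extract_annotations(markdown: str) -> list[dict]:
    """Return every HTML comment in `markdown` whose body parses as a JSON object.

    Comments that are not JSON objects (e.g. ordinary editor comments) are skipped.
    """
    annotations = []
    for match in HTML_COMMENT.finditer(markdown):
        body = match.group(1).strip()
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            continue  # not a structured annotation
        if isinstance(data, dict):
            annotations.append(data)
    return annotations


# Hypothetical excerpt of an exported file; the annotation fields are invented for illustration.
sample = """## 2 Main Result

<!-- {"type": "note", "anchor": "thm-2.1", "text": "Check the constant in (3)."} -->

**Theorem 2.1.** Every bounded sequence has a convergent subsequence.

<!-- plain editor comment, not JSON -->
"""

for note in extract_annotations(sample):
    print(note["anchor"], "->", note["text"])
```

Filtering on "parses as a JSON object" rather than on a fixed comment prefix keeps the sketch independent of any particular marker the exporter might use.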

JSON Export

Our JSON format is machine-native and optimized for AI applications, LLM ingestion, and programmatic analysis.

Why JSON Over PDFs for LLMs?

Problem          | PDF        | Our JSON
-----------------|------------|----------------------
Math extraction  | Corrupted  | LaTeX preserved
Structure        | Flattened  | Full semantic tree
Numbering        | OCR errors | All elements numbered

Key Properties

  • Macros expanded — All \newcommand definitions resolved
  • Stable IDs — Every block has a unique identifier
  • Semantic types — Explicit tags for abstracts, proofs, definitions, etc.
  • Resolved references — \ref{thm:main} links to the actual theorem block
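The exact schema is not spelled out on this page, so the sketch below assumes a hypothetical layout: a top-level "blocks" array whose entries carry "id", "type", "number", "text", and optional "refs" fields. Under that assumption, stable IDs turn reference resolution into a dictionary lookup, and semantic type tags turn selection into a one-line filter.

```python
import json

# Hypothetical export: the field names ("blocks", "id", "type", "number", "text", "refs")
# are assumptions for illustration, not the documented ScienceStack schema.
export_json = """
{
  "blocks": [
    {"id": "abs", "type": "abstract", "text": "We prove a compactness result."},
    {"id": "thm:main", "type": "theorem", "number": "2.1",
     "text": "Every bounded sequence in R^n has a convergent subsequence."},
    {"id": "cor:1", "type": "corollary", "number": "2.2",
     "text": "Closed bounded subsets of R^n are sequentially compact.", "refs": ["thm:main"]}
  ]
}
"""

doc = json.loads(export_json)

# Index every block by its stable ID so references resolve with a single lookup.
by_id = {block["id"]: block for block in doc["blocks"]}


def blocks_of_type(doc: dict, block_type: str) -> list[dict]:
    """Select blocks by their semantic type tag (e.g. "theorem", "abstract")."""
    return [b for b in doc["blocks"] if b["type"] == block_type]


def resolve_refs(block: dict) -> list[dict]:
    """Follow a block's references to the blocks they point at, skipping unknown IDs."""
    return [by_id[ref] for ref in block.get("refs", []) if ref in by_id]


for block in blocks_of_type(doc, "corollary"):
    for target in resolve_refs(block):
        print(f"{block['id']} cites {target['type']} {target['number']}: {target['text']}")
```

The same pattern extends to feeding an LLM only the blocks it needs, for example a theorem together with every block it references, instead of the whole paper.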

LaTeX Export

Download the raw LaTeX source with all macros expanded and the content assembled in its original document order.

What You Get

  • Macro expansion — All \newcommand, \def, and custom commands resolved
  • Complete content — All \input and \include files merged in order
  • Clean formatting — Unnecessary whitespace and comments removed
  • Bibliography included — References appended as BibTeX entries
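To illustrate what macro expansion involves, here is a minimal sketch of the simplest case: \newcommand{\name}[n]{body} definitions whose uses take brace-delimited arguments. It is a generic illustration, not ScienceStack's implementation, and it deliberately ignores \def, \renewcommand, optional arguments, and whitespace between a macro and its arguments.

```python
import re


def read_group(src: str, i: int) -> tuple[str, int]:
    """Read a brace-balanced {...} group starting at src[i].

    Returns (contents without the outer braces, index just past the closing brace).
    Escaped characters such as \\{ and \\} are skipped, so they do not affect depth.
    """
    if i >= len(src) or src[i] != "{":
        raise ValueError(f"expected '{{' at position {i}")
    depth, j = 0, i
    while j < len(src):
        c = src[j]
        if c == "\\":
            j += 2  # skip the escaped character
            continue
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                return src[i + 1 : j], j + 1
        j += 1
    raise ValueError("unbalanced braces")


# Matches the head of a definition: \newcommand{\name} plus an optional [n] argument count.
DEF_HEAD = re.compile(r"\\newcommand\{\\([A-Za-z]+)\}(?:\[(\d)\])?")


def collect_macros(src: str) -> tuple[dict[str, tuple[int, str]], str]:
    """Extract newcommand definitions; return (macros, source with definitions removed)."""
    macros, out, pos = {}, [], 0
    while (m := DEF_HEAD.search(src, pos)):
        out.append(src[pos : m.start()])
        body, pos = read_group(src, m.end())
        macros[m.group(1)] = (int(m.group(2) or 0), body)
    out.append(src[pos:])
    return macros, "".join(out)


def expand(src: str, macros: dict[str, tuple[int, str]], max_passes: int = 10) -> str:
    """Replace macro uses with their bodies, substituting #1..#9 with the arguments."""
    if not macros:
        return src
    names = "|".join(sorted(map(re.escape, macros), key=len, reverse=True))
    use = re.compile(r"\\(" + names + r")(?![A-Za-z])")
    for _ in range(max_passes):  # repeat so macros used inside other macros also expand
        out, pos, changed = [], 0, False
        while (m := use.search(src, pos)):
            nargs, body = macros[m.group(1)]
            end = m.end()
            for k in range(1, nargs + 1):
                arg, end = read_group(src, end)
                body = body.replace(f"#{k}", arg)
            out.append(src[pos : m.start()])
            out.append(body)
            pos, changed = end, True
        out.append(src[pos:])
        src = "".join(out)
        if not changed:
            break
    return src


source = r"""\newcommand{\R}{\mathbb{R}}
\newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
Let $x \in \R^n$ with $\norm{x} \le 1$."""

macros, body = collect_macros(source)
print(expand(body, macros).strip())
```

Running several passes, capped by max_passes, lets a macro whose body uses another macro expand fully while still terminating on self-referential definitions.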