arXiv to JSON Converter

Convert any arXiv paper to structured JSON. Full document AST with one arXiv ID.

Conversion is free with a free account: just enter an arXiv ID. Exporting the JSON requires a Plus subscription. See pricing for high-volume use.

Built for your workflow

arXiv to JSON conversion that actually works for academic papers.

LLM Pipelines

Build RAG systems over arXiv - query specific sections, equations, or citations

Paper Analysis

Extract all equations, count citations, analyze document structure programmatically

Research Tools

Build tools that understand paper structure - literature reviews, summarizers, etc.

Data Collection

Collect structured paper data for ML training or research datasets
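As a rough illustration of the LLM-pipeline use case, the sketch below turns each top-level section into one text chunk that could be embedded or indexed. It assumes the JSON has the shape of the sample output shown on this page (a `document` list of nodes, each with a `type`, where `content` mixes plain strings with nested nodes). The function names `render` and `section_chunks` and the chunk layout are hypothetical, not part of any ScienceStack API.

```python
def render(item):
    """Flatten one content item (string or node) into plain text."""
    if isinstance(item, str):
        return item
    kind = item.get("type")
    if kind == "equation":
        # Equations are assumed to be LaTeX strings, as in the sample output.
        return f"$$ {item['content']} $$"
    if kind == "figure":
        return f"[Figure: {item.get('caption', '')}]"
    child = item.get("content")
    if isinstance(child, str):
        return child
    if isinstance(child, list):
        body = "\n".join(render(c) for c in child)
        title = item.get("title")
        return f"{title}\n{body}" if title else body
    return ""  # unknown node type with no renderable content


def section_chunks(paper):
    """Return one chunk per top-level section (assumed schema)."""
    chunks = []
    for node in paper.get("document", []):
        if isinstance(node, dict) and node.get("type") == "section":
            chunks.append({
                "arxivId": paper.get("arxivId"),
                "title": node.get("title", ""),
                "text": "\n".join(render(c) for c in node.get("content", [])),
            })
    return chunks
```

Keeping the section title and arXiv ID alongside each chunk lets a retrieval step cite where a passage came from. Other node types (tables, theorems) would need their own branches in `render`.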

Semantic parsing

We parse LaTeX structure, not just text, so each element in the output keeps its role in the paper.

Semantic Parsing

Understands LaTeX structure - sections, equations, theorems, figures, tables as semantic elements

Cross-Reference Resolution

Automatically resolves \ref, \cite, and other cross-references to readable formats

Macro Expansion

Expands custom macros and commands so the output is self-contained

Bibliography Support

Includes formatted references with proper numbering and citation links

Instant Conversion

Just enter the arXiv ID - we fetch and convert the source automatically

Type-Safe Schema

Every element has a type field - section, equation, figure, table, citation, etc.
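Because every element carries a `type` field, a consumer can inventory a paper before processing it. The following is a minimal sketch, assuming nested elements live in a `content` list as in the sample output on this page. `walk` and `type_counts` are illustrative names, and the `"<missing>"` bucket is only a defensive assumption for nodes without a `type`.

```python
from collections import Counter


def walk(items):
    """Yield every dict node in a content list, depth-first."""
    for item in items:
        if isinstance(item, dict):
            yield item
            child = item.get("content")
            if isinstance(child, list):
                yield from walk(child)


def type_counts(paper):
    """Count nodes by their `type` field (assumed schema)."""
    return Counter(
        node.get("type", "<missing>")
        for node in walk(paper.get("document", []))
    )
```

A count like this is a cheap sanity check: an unexpected type name or a non-zero `"<missing>"` entry signals that the consumer's assumptions about the schema need revisiting.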

See what you get

Real output from converting the “Attention Is All You Need” paper.

output.json
{
  "by": "sciencestack.ai",
  "arxivId": "1706.03762",
  "title": "Attention Is All You Need",
  "abstract": "The dominant sequence transduction models...",
  "authors": ["Ashish Vaswani", "Noam Shazeer", "..."],
  "document": [
    {
      "type": "section",
      "title": "Introduction",
      "content": [
        "Recurrent neural networks, long short-term memory \\cite{bib:1}...",
        {
          "type": "figure",
          "src": "assets/transformer.png",
          "caption": "The Transformer model architecture."
        }
      ]
    },
    {
      "type": "section",
      "title": "Attention",
      "content": [
        "An attention function maps a query and key-value pairs...",
        {
          "type": "equation",
          "content": "\\text{Attention}(Q,K,V) = \\text{softmax}(\\frac{QK^T}{\\sqrt{d_k}})V"
        }
      ]
    }
  ],
  "bibliography": [...]
}

Equations, cross-references, and structure — all preserved.
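To show how output of this shape might be consumed, here is a small sketch that collects every equation and counts citation keys. It assumes what the sample above suggests: equation nodes hold a LaTeX string in `content`, and citations appear inline in prose strings as `\cite{...}`. The helper names are hypothetical, and the real schema may differ.

```python
import re
from collections import Counter

# Matches inline citations such as \cite{bib:1} or \cite{bib:1, bib:2}.
CITE_RE = re.compile(r"\\cite\{([^}]*)\}")


def iter_items(items):
    """Yield every string and dict in a content list, depth-first."""
    for item in items:
        yield item
        if isinstance(item, dict):
            child = item.get("content")
            if isinstance(child, list):
                yield from iter_items(child)


def equations(paper):
    """Return the LaTeX source of every equation node."""
    return [
        item["content"]
        for item in iter_items(paper.get("document", []))
        if isinstance(item, dict) and item.get("type") == "equation"
    ]


def citation_counts(paper):
    """Count how often each citation key appears in prose strings."""
    counts = Counter()
    for item in iter_items(paper.get("document", [])):
        if isinstance(item, str):
            for group in CITE_RE.findall(item):
                for key in group.split(","):
                    counts[key.strip()] += 1
    return counts
```

Note that the sample as printed is abbreviated (`"..."` and `[...]` stand in for omitted content), so it would need to be replaced by a real export before being passed to `json.load`.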

How it works

1. Enter arXiv ID

Enter an arXiv ID (e.g., 2301.07041 or 2301.07041v2)

2. Process

We parse the LaTeX semantically, understanding sections, equations, and references

3. Download

Structured JSON with document hierarchy, equations as LaTeX strings, and rich metadata
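The identifier formats from step 1 can be checked on the client side before submitting anything. The sketch below handles only the new-style arXiv scheme (`YYMM.NNNN` or `YYMM.NNNNN`, with an optional `vN` suffix), which covers both examples above. Whether the converter also accepts old-style identifiers such as `hep-th/9901001` is not stated here, so they are rejected by assumption. `parse_arxiv_id` is a hypothetical helper.

```python
import re

# New-style arXiv identifier: two-digit year, two-digit month,
# 4 or 5 digit sequence number, optional version suffix (v1, v2, ...).
NEW_STYLE = re.compile(r"(\d{2})(\d{2})\.(\d{4,5})(?:v([1-9]\d*))?")


def parse_arxiv_id(raw):
    """Split an ID into (base_id, version); version is None if absent."""
    match = NEW_STYLE.fullmatch(raw.strip())
    if match is None or not 1 <= int(match.group(2)) <= 12:
        raise ValueError(f"not a new-style arXiv ID: {raw!r}")
    yy, mm, number, version = match.groups()
    return f"{yy}{mm}.{number}", int(version) if version else None
```

For example, `parse_arxiv_id("2301.07041v2")` separates the base ID from version 2, while an input with month `13` raises `ValueError`.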

Simple pricing

Free

For arXiv conversions

  • Full LaTeX parsing
  • Equation preservation
  • Cross-reference resolution
  • Bibliography included
  • Structured document tree

JSON export requires a Plus subscription.

Ready to convert?

Enter an arXiv ID and get structured JSON in seconds.
