API Beta · 150k+ arXiv papers

Scientific Papers as APIs, not documents

We parse scientific papers so you don't have to.
Don't waste compute on PDFs — focus compute on science.

Parsed from LaTeX source. No OCR. No hallucinations. 100ms latency.


Query papers like code

See preview responses from "Attention Is All You Need" (1706.03762v7)

Paper: 1706.03762v7


GET /api/v1/papers/1706.03762v7/nodes?nodeId=sec:1&format=markdown

<a id="sec-1"></a>

## 1: Introduction
---

Recurrent neural networks, long short-term memory [[hochreiter1997]](#bib-hochreiter1997) and gated recurrent [[gruEval14]](#bib-gruEval14) neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [[sutskever14]](#bib-sutskever14), [[bahdanau2014neural]](#bib-bahdanau2014neural), [[cho2014learning]](#bib-cho2014learning). Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures [[wu2016google]](#bib-wu2016google), [[luong2015effective]](#bib-luong2015effective), [[jozefowicz2016exploring]](#bib-jozefowicz2016exploring).

Recurrent models typically factor computation along the symbol positions of the input and output sequences. Aligning the positions to steps in computation time, they generate a sequence of hidden states $h_t$, as
...

Every node contains full content from LaTeX source.

Available as Markdown, LaTeX, and JSON.
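As a minimal sketch of how such a node request could be assembled, the helper below rebuilds the relative path shown in the preview above. The function name is hypothetical, only `format=markdown` appears on this page (the values for the other formats are not shown), and no host or authentication scheme is implied.

```python
from urllib.parse import urlencode


def node_path(paper_id: str, node_id: str, fmt: str = "markdown") -> str:
    """Build the relative request path for one node of a paper.

    Assumption: the path and the `nodeId` / `format` parameters mirror
    the preview request shown on this page.
    """
    # Keep ":" unescaped so node IDs such as "sec:1" stay readable.
    query = urlencode({"nodeId": node_id, "format": fmt}, safe=":")
    return f"/api/v1/papers/{paper_id}/nodes?{query}"


print(node_path("1706.03762v7", "sec:1"))
```

Keeping the path builder separate from the HTTP call makes it easy to log or cache requests by node ID.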

The core problem

PDFs weren't built for machines

PDF extraction gives you strings and glyphs. Instead, we parse LaTeX to give you a semantic graph with stable IDs, relationships, and metadata.

✗ PDF extraction output

PyMuPDF / GROBID / Nougat / LLMs

We call our particular attention "Scaled Dot-Product Attention" (Figure 2). The input consists of queries and keys of dimension dk, and values of dimension dv.

Attention(Q, K, V ) = softmax( QKT √dk )V

The two most commonly used attention functions are additive attention [2], and dot-product attention. Dot-product attention is identical to our algorithm, except for the scaling factor.

✗ No way to find "equation 1"

✗ No link to Figure 2

✗ Can't extract just this section

✗ No parent/child relationships

✓ ScienceStack API

GET /api/v1/papers/1706.03762v7/nodes?nodeId=sec:3.2.1&format=markdown

## 3.2.1: Scaled Dot-Product Attention

We call our particular attention "Scaled Dot-Product Attention" (Figure [fig:2](#fig-2)). The input consists of queries and keys of dimension $d_k$, and values of dimension $d_v$.

$$
\mathrm{Attention}(Q,K,V) = \mathrm{softmax}(\frac{QK^T}{\sqrt{d_k}})V \tag{1}
$$

The two most commonly used attention functions are additive attention [\[bahdanau2014\]](#bib-bahdanau2014), and dot-product attention. Dot-product attention is identical to our algorithm, except for the scaling factor.

✓ Same section, any format you need

✓ Query any node type (equation, figure, table, etc.)

✓ Figures have CDN URLs

✓ Stable nodeIds and parent/child relationships
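To illustrate what parent/child relationships buy you, here is a small sketch that derives a section's ancestors from its ID. It assumes that dotted section numbers such as `sec:3.2.1` encode nesting, which is read off the ID shape on this page rather than stated by it. The helper name is hypothetical, and explicit relationship fields returned by the API should be preferred where available.

```python
def section_ancestors(node_id: str) -> list[str]:
    """Return ancestor IDs of a dotted section ID, nearest parent first.

    Assumption: "sec:3.2.1" is nested under "sec:3.2", which is nested
    under "sec:3". This is inferred from the ID shape, not documented.
    """
    kind, _, number = node_id.partition(":")
    parts = number.split(".")
    # Drop one trailing component at a time: 3.2.1 -> 3.2 -> 3
    return [f"{kind}:{'.'.join(parts[:i])}" for i in range(len(parts) - 1, 0, -1)]


print(section_ancestors("sec:3.2.1"))
```

With flat PDF text there is no equivalent operation, because the extracted string carries no identifier to start from.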

We preserve structure & math

PDF extraction is lossy, LaTeX is not

We parse every paper directly from LaTeX source — no OCR, no prediction, no hallucination.

PyMuPDF / pdfplumber (text extraction)

✗ Output

~2s

Attention(Q, K, V ) = softmax( QKT √dk )V

✓ ScienceStack

~100ms

\mathrm{Attention}(Q,K,V) = \mathrm{softmax}(\frac{QK^T}{\sqrt{d_k}})V

Raw glyph extraction. No structure, no LaTeX — just Unicode characters.

GROBID (ML-based parser)

✗ Output

~5-10s

<formula>Attention(Q,K,V) = softmax(QK^T/sqrt(d_k))V</formula>

✓ ScienceStack

~100ms

\mathrm{Attention}(Q,K,V) = \mathrm{softmax}(\frac{QK^T}{\sqrt{d_k}})V

Structured XML, but math is not reliably recoverable as LaTeX.

Nougat / Marker / LLMs (neural OCR)

✗ Output

~30-60s

\nabla_\theta \mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_i}{\partial \theta_j}

✓ ScienceStack

~100ms

\nabla_\theta \mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_i}{\partial \theta}

Hallucinated subscript: θ_j vs θ. Predicted, non-deterministic, high latency — and costs tokens per call.
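A discrepancy like this can be caught mechanically when the source LaTeX is available. The sketch below tokenizes both strings and reports only the tokens that differ. The tokenizer is deliberately simple and the function names are hypothetical; it is meant as an illustration, not a full LaTeX parser.

```python
import re
from difflib import SequenceMatcher


def latex_tokens(src: str) -> list[str]:
    """Split LaTeX into control words (e.g. \\theta) and single characters."""
    return re.findall(r"\\[A-Za-z]+|\S", src)


def token_diff(source: str, predicted: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (tag, source_tokens, predicted_tokens) for every non-matching span."""
    a, b = latex_tokens(source), latex_tokens(predicted)
    ops = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    return [(tag, a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


source = r"\nabla_\theta \mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_i}{\partial \theta}"
predicted = r"\nabla_\theta \mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \ell_i}{\partial \theta_j}"

for tag, src_tokens, pred_tokens in token_diff(source, predicted):
    print(tag, src_tokens, pred_tokens)
```

For the pair above, the only reported change is the inserted `_j` subscript.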

Try it yourself

Preview responses from "Attention Is All You Need" (1706.03762v7).
Endpoints available in Markdown, LaTeX, and JSON.

Free

Quota (resets monthly)

GET /api/v1/papers/1706.03762v7/nodes?nodeId=sec:3.2.1&format=markdown

Any section, equation, figure, or table — in your preferred format

#### 3.2.1: Scaled Dot-Product Attention

We call our particular attention "Scaled Dot-Product Attention" (Figure [fig:multi-head-att](#fig:multi-head-att)). The input consists of queries and keys of dimension $d_k$, and values of dimension $d_v$.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(\frac{QK^T}{\sqrt{d_k}})V \tag{1}
$$

The two most commonly used attention functions are additive attention [\[bahdanau2014neural\]](#bib-bahdanau2014neural), and dot-product (multiplicative) attention...


API at a glance

Every endpoint you need to build with scientific papers.

Endpoint	Description	Access
`/search`	Find papers by topic, author, arXiv ID	free
`/papers`	Browse papers by category or field, e.g. "machine-learning", "cs.CV"	free
`/papers/{id}/overview`	TOC, figure/equation/table refs, AI summaries	free
`/papers/{id}/figures`	Image URLs for vision models	quota
`/papers/{id}/nodes?types=equation`	Filter by type: equation, table, math_env, algorithm	quota
`/papers/{id}/nodes?nodeId=sec:3.2.1`	Access any node with stable IDs	quota
`/papers/{id}/content`	Full paper as Markdown, LaTeX, or text	quota
`/papers/{id}/references`	Bibliography with Semantic Scholar enrichment	quota

Once you access a paper, you get unlimited requests to that paper for the rest of the month.
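The free/quota split can be captured as a small lookup, which is handy when estimating how many papers a batch of calls would consume. This is a sketch built from the endpoint list on this page; the `ACCESS` mapping and function name are hypothetical, and it assumes every quota endpoint counts against the same per-paper allowance.

```python
# Endpoint templates and their access class, copied from the table on this page.
ACCESS = {
    "/search": "free",
    "/papers": "free",
    "/papers/{id}/overview": "free",
    "/papers/{id}/figures": "quota",
    "/papers/{id}/nodes": "quota",
    "/papers/{id}/content": "quota",
    "/papers/{id}/references": "quota",
}


def papers_charged(calls: list[tuple[str, str]]) -> set[str]:
    """Return the paper IDs that a list of (endpoint, paper_id) calls would unlock.

    Assumption: a paper is counted once, no matter how many quota
    endpoints are called for it within the month.
    """
    return {paper_id for endpoint, paper_id in calls if ACCESS[endpoint] == "quota"}


calls = [
    ("/papers/{id}/overview", "1706.03762v7"),  # free
    ("/papers/{id}/nodes", "1706.03762v7"),     # unlocks the paper
    ("/papers/{id}/figures", "1706.03762v7"),   # same paper, no extra unlock
    ("/papers/{id}/overview", "2510.15511"),    # free
]
print(papers_charged(calls))
```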

Optimized for scientific agent workflows

Browse free, deep dive once, use unlimited. Ideal for fully autonomous agents.

1. Search (free): Find papers by topic or arXiv ID.

2. Overview (free): Get the TOC, figures, equations, and AI summaries via /overview.

3. Deep Dive (quota): Access full content per paper (uses 1 paper of quota).

4. Unlimited (no extra cost): Fetch that paper's equations, figures, and nodes for the rest of the month.
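An agent can mirror this accounting locally to decide whether a deep dive is worth a unit of quota. The class below is a hypothetical client-side sketch, not part of the API: it assumes a hard monthly limit with no overage, which matches the Free tier description but may not match paid plans.

```python
class PaperQuota:
    """Client-side model of per-paper monthly quota (hypothetical helper)."""

    def __init__(self, monthly_limit: int) -> None:
        self.monthly_limit = monthly_limit
        self.unlocked: set[str] = set()

    def access(self, paper_id: str) -> bool:
        """Return True if a deep-dive request for this paper is allowed."""
        if paper_id in self.unlocked:
            return True  # already unlocked this month: no extra cost
        if len(self.unlocked) >= self.monthly_limit:
            return False  # assumption: hard stop once the limit is reached
        self.unlocked.add(paper_id)
        return True

    def remaining(self) -> int:
        """Number of additional papers that can still be unlocked."""
        return self.monthly_limit - len(self.unlocked)

    def reset(self) -> None:
        """Call at the start of each month, when quota resets."""
        self.unlocked.clear()


quota = PaperQuota(monthly_limit=2)
results = [quota.access(p) for p in ["A", "A", "B", "C"]]
print(results, quota.remaining())
```

Repeat requests for paper "A" succeed without consuming more quota, while the third distinct paper is refused.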


How do I know the data is good?

Every paper in our API comes with an interactive reader as proof that our parsing works.

Featured: Language Models are Injective and Hence Invertible (2510.15511)

The reader is a live demonstration of our API data. Hover any citation, equation, or figure — that's the same structured data you get via API.

Hover citations & equations for instant previews
Dependency graphs showing how concepts connect
Annotations that sync across devices
Export to PDF, LaTeX, Markdown, or JSON
Dark mode & mobile-friendly


Build Tools and Copilots with ScienceStack

Query any section, equation, figure, or citation as Markdown, LaTeX, or JSON, with full metadata and parent/child relationships.

150k+

Papers indexed

<100ms

Avg response time

99.9%

Uptime

Stable API

Scientific Copilots

Build AI tools that understand papers like researchers do — cite specific equations, reference exact figures.

"Explain equation 3 from the attention paper"

Citation-Aware RAG

Ground every answer in verifiable sources. Stable node IDs (eq:3, fig:2) enable precise attribution.

Link answers directly to paper sections
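As a rough sketch of citation-aware attribution, a RAG pipeline can carry a (paper ID, node ID) pair alongside each retrieved chunk and append the deduplicated pairs to the answer. The `NodeRef` type, the `attribute` function, and the bracketed label format are all hypothetical, and `eq:1` is used only as an illustrative ID in the same shape as those quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRef:
    """A pointer to one node of one paper (hypothetical client-side type)."""

    paper_id: str
    node_id: str

    def label(self) -> str:
        return f"[{self.paper_id} {self.node_id}]"


def attribute(answer: str, refs: list[NodeRef]) -> str:
    """Append each distinct source once, in first-seen order."""
    unique = list(dict.fromkeys(refs))  # frozen dataclasses are hashable
    return answer + " " + " ".join(ref.label() for ref in unique)


refs = [
    NodeRef("1706.03762v7", "eq:1"),
    NodeRef("1706.03762v7", "sec:3.2.1"),
    NodeRef("1706.03762v7", "eq:1"),  # duplicate retrieval hit
]
print(attribute("Dot products are scaled by 1/sqrt(d_k).", refs))
```

Because node IDs are stable, the same label resolves to the same equation or section on every run.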

Bulk Paper Analysis

Extract all equations from 100 transformer papers in minutes. Compare methods systematically.

No PDFs. No OCR. Just data.

Knowledge Graphs

Build citation networks from structured bibliographies. References linked to arXiv IDs and DOIs.

Map how papers connect


Simple, usage-based pricing

Start free. Pay only when you need more. /search and /overview are always free.


Free

Try it out

Explore the API. Search papers and access structured content.

Start Free

Unlimited /search queries
/overview for any paper
Markdown, LaTeX, text export
10 papers/month deep access


Pro

For researchers & developers

$29/mo

AI summaries, per-section summaries, and node-level AST.

Everything in Free
AI summaries (paper + section)
Node-scoped AST export
200 papers/month, $0.10 overage


Team

For scientific tools & pipelines

$99/mo

Full paper AST for RAG and agent workflows.

Everything in Pro
Full paper AST export
Higher rate limits
1,000 papers/month, $0.05 overage
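To compare the paid plans for a given workload, the arithmetic can be sketched as follows. It assumes overage is charged per paper beyond the included monthly count, which is the natural reading of the plan lines above but is not spelled out on this page. Prices are kept in integer cents to avoid floating-point rounding.

```python
# Plan figures copied from the pricing cards above, converted to cents.
PLANS = {
    "Pro": {"base_cents": 2900, "included": 200, "overage_cents": 10},
    "Team": {"base_cents": 9900, "included": 1000, "overage_cents": 5},
}


def monthly_cost_cents(plan: str, papers: int) -> int:
    """Monthly cost in cents for deep access to `papers` distinct papers.

    Assumption: overage applies per paper above the included count.
    """
    p = PLANS[plan]
    extra = max(0, papers - p["included"])
    return p["base_cents"] + extra * p["overage_cents"]


for n in (150, 300, 900, 1200):
    pro = monthly_cost_cents("Pro", n) / 100
    team = monthly_cost_cents("Team", n) / 100
    print(f"{n} papers: Pro ${pro:.2f}, Team ${team:.2f}")
```

Under that assumption the two plans cost the same at 900 papers per month ($29 + 700 × $0.10 = $99), and Team is cheaper beyond that.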


Ready to build?

Get your API key and create amazing tools with ScienceStack.
