How is this different from arXiv HTML or other HTML viewers?

Other HTML renderers (including arXiv's LaTeXML output) are fine for short papers, but on long or equation-heavy documents they can become sluggish, with pages that lag, freeze, or even crash the browser (and in many cases such papers aren't supported). Interactivity is minimal — no hover math previews, no one-click LaTeX copy, no dependency graphs. You also can't upload your own LaTeX to see it rendered in the same way. Most importantly, you're locked into viewing the paper in their format with no way to export the structured content.

ScienceStack was built to solve those gaps. Everything is compiled into structured JSON first, then rendered through a virtualized reader so even very large math or physics papers stay responsive. All blocks, equations and references are preserved with stable identifiers, which makes hover previews, exact LaTeX copy, fullscreen figures, and automated dependency graphs possible. Every paper also comes with downloadable markdown and JSON exports, so you can take the structured content into your own tools and workflows.
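As a rough sketch of what "structured JSON with stable identifiers" enables (the field names here are illustrative, not the actual ScienceStack export schema):

```python
import json

# Illustrative block structure -- the real export schema may differ.
paper = json.loads("""
{
  "blocks": [
    {"id": "sec-3", "type": "section", "number": "3", "title": "Main Result"},
    {"id": "eq-3-12", "type": "equation", "number": "(3.12)",
     "latex": "E = mc^2"}
  ]
}
""")

# Stable IDs let hover previews and exact-LaTeX copy resolve any block
# directly, without re-parsing the document.
by_id = {block["id"]: block for block in paper["blocks"]}
print(by_id["eq-3-12"]["latex"])  # -> E = mc^2
```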

Is ScienceStack accessible and WCAG compliant?

Yes, ScienceStack is built to be accessible and compliant with WCAG 2.1 AA standards. Mathematical content includes proper alternative text for screen readers, interactive elements are keyboard navigable, and we maintain proper focus management throughout the interface. This ensures that researchers using assistive technologies can fully access and navigate academic papers.

Can I upload my own papers? Are my LaTeX uploads private?

Yes, you can upload your own LaTeX files through the dashboard.

Uploads are private by default, with the option to share or make public if you choose. We do not index or expose private uploads.

Where do you get the existing papers?

For arXiv papers, we fetch the original LaTeX source and transform it into structured JSON using a custom parser. This makes features like equation hover, dependency graphs, and direct LaTeX copy possible.

Why don't I see some content that appears in the arXiv PDF?

We parse the LaTeX source files that authors upload to arXiv, which may differ from the final PDF. Authors sometimes make last-minute edits directly to the PDF or use compilation settings that aren't reflected in the source code. Additionally, some visual elements or formatting may render differently between our JSON parser and arXiv's PDF generation process.

It's also possible that our JSON parser didn't capture certain elements, though we aim to display these as warnings and errors when they occur. If you find content that should be displayed but isn't, please email us at support@sciencestack.ai so we can improve our parser.

How accurate is your platform in preserving equation and section numbers?

Equation, section, caption, and theorem numbering is accurate in most cases. We've tested across hundreds of ML, physics, and math papers. A few edge cases (like \mathtoolsset{showonlyrefs}) aren't yet fully supported.
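For context on why this particular option is hard: showonlyrefs numbers only the equations that are actually referenced somewhere in the document, so the final numbering depends on a whole-document pass rather than the equations alone. A minimal example:

```latex
\usepackage{mathtools}
\mathtoolsset{showonlyrefs}
\begin{align}
  a &= b \label{eq:first} \\
  c &= d \label{eq:second}
\end{align}
% eq:first gets a visible number only because it is cited here;
% eq:second stays unnumbered unless it is referenced too:
As shown in \eqref{eq:first}, ...
```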

What is the coverage of LaTeX macros and packages?

We support many LaTeX packages, though there are still edge cases we're actively working on. Some legacy formats (e.g. AMSTeX) and very new LaTeX features (like expl3) aren't yet supported. Unsupported macros display as \macroname in the viewer.

What does your parser not support?

Our parser doesn't support custom fonts, complex box displays, or commands like \DeclareMathSymbol that define new math symbols from raw font slots.
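For example, a declaration like the following (the command name \myleq is hypothetical, used only for illustration) is currently not interpreted:

```latex
% Declares a new relation symbol from a raw slot in a math font family;
% constructs like this aren't yet handled by the parser:
\DeclareMathSymbol{\myleq}{\mathrel}{symbols}{"14}
```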

Who doesn't this work for?

If your papers are not written in LaTeX, or if you mostly skim shorter papers, you won't see much benefit.

ScienceStack is built for papers in the mathematical sciences, where structure and hovers help you keep context, and where precise, downloadable JSON keeps equations and tables intact for LLMs.



How is performance on long or heavy papers?

We built the reader with virtualization and caching, so even very long papers (200+ pages, hundreds of equations) scroll smoothly instead of overwhelming the browser's DOM. The reader also performs well on mobile devices, maintaining smooth scrolling across different screen sizes.

How does your markdown export compare to pandoc or other LaTeX→Markdown tools?

Most generic LaTeX→Markdown converters (like pandoc) are designed as general-purpose converters, not as faithful scientific parsers. They often produce something readable, but with major tradeoffs:

  • Section numbers: Dropped by default (unless you use --number-sections), and even then auto-generated — which may not match the author’s numbering.
  • Equation numbers: Usually lost unless hardcoded in the LaTeX source. \label{eq:foo} + \ref{eq:foo} pairs don’t become visible (3.12) numbers.
  • Cross-references: Often left as raw \ref{...} text or replaced with plain numbers, breaking “see (Eq. 2.3)” references.
  • Figures/tables: Images may render, but captions lose numbering and links.
  • Theorems/lemmas: Typically flattened into plain text with no numbering or semantic tags.
  • Complex LaTeX: Pandoc can break on custom packages, nested environments, or math-heavy documents — since it is a general-purpose “Swiss army knife,” not a dedicated LaTeX parser.
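A toy sketch (not ScienceStack's actual implementation) of why cross-references need a dedicated numbering pass: resolving \ref requires a label-to-number map built while numbering the whole document, which generic converters never construct.

```python
import re

# Label -> number map, as produced by a full numbering pass over the paper.
labels = {"eq:foo": "3.12", "sec:main": "4.1"}

def resolve_refs(text: str) -> str:
    # Replace \ref{key} with its assigned number; unknown labels are left
    # as raw \ref{...} text, which is the pandoc-style failure mode.
    return re.sub(r"\\ref\{([^}]*)\}",
                  lambda m: labels.get(m.group(1), m.group(0)),
                  text)

print(resolve_refs(r"see Eq. \ref{eq:foo} in Section \ref{sec:main}"))
# -> see Eq. 3.12 in Section 4.1
```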

Our parser was purpose-built for research papers and is much more robust for real-world LaTeX:

  • Every section, equation, table, figure, and theorem keeps its original number.
  • All cross-references are preserved with their numbers and converted into live links.
  • It handles many more packages than pandoc.
  • The output is structured enough that you can reliably ask an LLM to “explain equation (3.12)” or “summarize Section 4.1” — and it will have the right context.

Which download format is best if I want to use this with AI or LLMs?

Both Markdown (.md) and JSON (.json) work great for LLM ingestion — the right choice depends on your workflow.

A) Markdown (.md) — best for direct LLM use and human readability

Our Markdown is AI-optimized and more structured than a generic LaTeX→MD conversion:

  • Every section, equation, table, and figure is numbered.
  • All cross-references are preserved with their numbers (e.g. “see (Eq. 2.3)”).
  • This makes it much easier for LLMs to answer grounded questions like
    “explain equation (3.12)” or “summarize Section 4.1” accurately.
  • It’s lightweight and token-efficient, so you can fit more of the paper into a single model context.

B) JSON (.json) — best for structured pipelines and tooling

Our JSON is fully capable of direct LLM ingestion and is still the format most of our users rely on today. It provides:

  • A clean, machine-readable hierarchy of sections, theorems, proofs, figures, and equations.
  • Stable IDs for every block — ideal for referencing, chunking, or attaching annotations.
  • Fine-grained control for RAG pipelines or filtering (e.g. “just give me the lemmas”).
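As a sketch of that kind of filtering (block fields like "type" and "id" are illustrative, not the exact export schema):

```python
# Hypothetical exported paper, pared down to a few blocks.
paper = {
    "blocks": [
        {"id": "lem-2-1", "type": "lemma",    "number": "2.1",   "latex": "..."},
        {"id": "eq-2-3",  "type": "equation", "number": "(2.3)", "latex": "..."},
        {"id": "thm-3-1", "type": "theorem",  "number": "3.1",   "latex": "..."},
    ]
}

# "Just give me the lemmas" -- select blocks by type for a RAG pipeline.
lemmas = [b for b in paper["blocks"] if b["type"] == "lemma"]
print([b["id"] for b in lemmas])  # -> ['lem-2-1']
```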

See Why JSON, Not PDFs?

Recommendation

Both formats work well with LLMs. Numbers and references are preserved with stable IDs, reducing the risk of LLM hallucination across large paper contexts.

  • Use Markdown if you want a fast, human-readable format for LLM chat, summarization, or embedding in your notes.
  • Use JSON if you want the full structure of the data, and/or are building automation around the paper.

Will this integrate with note-taking or reference managers?

Every paper ships with a markdown (.md) download by default, so you can export clean markdown files for use in your preferred note-taking tools. The export works well with VS Code, GitHub-flavored Markdown, Obsidian, and Notion. Our markdown format also adds semantic info (equation, section, and table/figure/theorem numbers) and auto-generates linkable references across the file.

Direct integration with tools like Obsidian and Notion is on our roadmap.

Is this free?

The reader will always be free.

At the moment, the Pro plan ($5–10/mo) mostly covers additional uploads and larger file sizes. We're transparent about our pricing.