The Pan-STARRS1 Database and Data Products

H. A. Flewelling, E. A. Magnier, K. C. Chambers, J. N. Heasley, C. Holmberg, M. E. Huber, W. Sweeney, C. Z. Waters, A. Calamida, S. Casertano, X. Chen, D. Farrow, G. Hasinger, R. Henderson, K. S. Long, N. Metcalfe, G. Narayan, M. A. Nieto-Santisteban, P. Norberg, A. Rest, R. P. Saglia, A. Szalay, A. R. Thakar, J. L. Tonry, J. Valenti, S. Werner, R. White, L. Denneau, P. W. Draper, K. W. Hodapp, R. Jedicke, N. Kaiser, R. P. Kudritzki, P. A. Price, R. J. Wainscoat, P. S. Builders, S. Chastel, B. McLean, M. Postman, B. Shiao

TL;DR

The paper details the Pan-STARRS1 database, its catalog products, and the architecture that underpins public access to PSPS data. It explains the data flow from IPP processing through the Desktop Virtual Observatory to the PSPS relational schema, highlighting data provenance, calibration (including Gaia-based alignment), and batch-driven ingestion. It documents data releases DR1 and DR2, the PSPS software stack (DXLayer, ODM, WMD, DRL, PSI), and the partitioned, scalable design that supports complex joins across 50+ tables. The work demonstrates the scale and utility of PS1 data for static-sky and time-domain science, and outlines how the system supports future large surveys such as LSST, enabling broad, high-fidelity astrophysical studies.

Abstract

This paper describes the organization of the database and the catalog data products from the Pan-STARRS1 $3π$ Steradian Survey. The catalog data products are available in the form of an SQL-based relational database from MAST, the Mikulski Archive for Space Telescopes at STScI. The database is described in detail, including the construction of the database, the provenance of the data, the schema, and how the database tables are related. Examples of queries for a range of science goals are included.

Paper Structure

This paper contains 59 sections, 15 figures, 59 tables.

Figures (15)

  • Figure 1: An overview of the steps necessary to create publicly accessible Pan-STARRS1 data. Exposures are taken at the summit, processed by the image processing pipeline (IPP), and ingested into the PSPS, which then provides public access to the user. The IPP has many processing steps, not all of which are shown here. The camera, stack, difference image, and forced photometry stages produce binary catalog FITS files, which are the foundation for building the DVO database; that database is then calibrated. The final step of IPP processing is to use IppToPsps to generate small batches of data in the appropriate database schema to be ingested into PSPS. This paper primarily focuses on the PSPS and the database schema. The other steps are explained in enough detail to describe known and potential sources of inconsistencies within the database.
  • Figure 2: This figure shows a flowchart of how data flows from the IPP side into batches for PSPS, using IppToPsps. On the IPP side, the DVO database shows cpt/cpm/cps/cpx/cpy/cpq files, organized and grouped by which IppToPsps batch type uses them. The IPP side also has the smf/cmf files from the camera, forced warp, and stack (skycal) stages; these smf/cmf files are also needed by IppToPsps. IppToPsps has several different batch types, each extracting data from different sources and generating batches for ingest into PSPS. Batches related to diffs are not shown here, but the process is similar: cpt and cpm files from the diff DVO and cmf files from the diff skycells go through IppToPsps to create DF batches (analogous to P2 or ST, but using diff cmfs). DO batches are created using cpt and cps files from the diff DVO (similar to how OB or GO batches are created).
  • Figure 3: This figure shows a flowchart of how data flows from the IPP (via IppToPsps) into the load merge machines and is then copied to the slice machines, where users can query the data (via a modified CasJobs).
  • Figure 4: This figure shows how the data (L1 data/csv files/Image Pipeline) is loaded into L2 data (the load merge machines, which are responsible for loading the data and merging it into the 'cold' part of the database). In this figure there are 8 slice machines, which hold hot and warm copies of the database. At the bottom are the head nodes and the main database. The hot database serves the fast response queue and the warm database serves the slow queue. The fast queue is specifically for queries that take less than one minute to complete. The cold database is never accessible by users.
  • Figure 5: A flowchart of the DXLayer process, showing how batches are loaded into the DXLayer, verified, and submitted to the ODM. The shaded rectangles refer to different systems, and the white boxes and white cylinder refer to different steps for the systems.
  • ...and 10 more figures
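
The hot/warm/cold split described in the Figure 4 caption can be illustrated with a small sketch. It is not taken from the PSPS or CasJobs code: the names `Copy`, `FAST_QUEUE_LIMIT_S`, and `route_query` are hypothetical, and routing by an expected runtime is a simplifying assumption made only to show which database copy serves which queue. The one-minute threshold and the rule that the cold copy is never user-accessible come from the caption.

```python
from enum import Enum


class Copy(Enum):
    """Database copies named in the Figure 4 caption."""
    HOT = "hot"    # serves the fast response queue
    WARM = "warm"  # serves the slow queue
    COLD = "cold"  # load/merge target; never accessible by users


# From the caption: the fast queue is for queries that take under one minute.
FAST_QUEUE_LIMIT_S = 60.0


def route_query(expected_runtime_s: float) -> Copy:
    """Pick the database copy for a user query (hypothetical helper).

    Assumption: the expected runtime is known up front. This sketch
    only encodes the hot/warm split and never returns Copy.COLD.
    """
    if expected_runtime_s < 0:
        raise ValueError("expected_runtime_s must be non-negative")
    if expected_runtime_s < FAST_QUEUE_LIMIT_S:
        return Copy.HOT
    return Copy.WARM


if __name__ == "__main__":
    for runtime in (5.0, 59.9, 60.0, 3600.0):
        print(f"{runtime:>7.1f} s -> {route_query(runtime).value}")
```

Keeping the cold copy out of the routing function's possible return values mirrors the design point in the caption: loading and merging happen on a copy that user queries can never reach.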