An Interactive Metrics Dashboard for the Keck Observatory Archive

G. Bruce Berriman, Min Phone Myat Zaw

TL;DR

KOA's rapid data growth and near-real-time ingestion demanded an integrated metrics solution. The authors implement a Python-based Plotly-Dash dashboard on top of a VO-compliant, nexsciTAP-backed query layer, using SQLAlchemy and Jinja2, to deliver live metrics updates roughly every 5–7 seconds. A key contribution is the dual-data architecture: precomputed static tables hold historical data, dynamic queries cover the current year, and the queries run in parallel. Together these meet the seven-second update target and deliver substantial performance gains (roughly 20x). The dashboard monitors instantaneous ingestion and archive growth, and it provides a scalable foundation for adding metrics such as data volume, query counts, and downloads. Collectively, the work enables real-time observability of KOA and informs hardware and software planning for future instruments and for data-deluge scenarios such as Rubin Observatory alerts.
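The dual-data idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the table contents, the function names `query_current_year_files` and `collect_metrics`, and all counts are hypothetical placeholders, and the live query is simulated with a dictionary lookup rather than a real TAP or SQLAlchemy call. The sketch only shows the pattern of merging precomputed historical totals with current-year values fetched in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical precomputed (static) table: files ingested per completed year.
# The numbers are made-up placeholders, not KOA statistics.
HISTORICAL_FILES_BY_YEAR = {2022: 1_200_000, 2023: 1_350_000}


def query_current_year_files(instrument):
    """Stand-in for a live current-year query against the archive.

    A real dashboard would issue a database or TAP query here; this
    sketch returns made-up counts so that it runs without a server.
    """
    fake_live_counts = {"HIRES": 4200, "NIRSPEC": 3100, "KCWI": 5600}
    return fake_live_counts[instrument]


def collect_metrics(instruments, current_year):
    """Merge static historical totals with live current-year counts."""
    # Run the per-instrument live queries concurrently; only the
    # current year is queried, so each query touches little data.
    with ThreadPoolExecutor(max_workers=max(1, len(instruments))) as pool:
        live_counts = list(pool.map(query_current_year_files, instruments))

    # Copy the static table so the precomputed data is never mutated.
    by_year = dict(HISTORICAL_FILES_BY_YEAR)
    by_year[current_year] = sum(live_counts)

    return {
        "by_year": by_year,
        "current_year_by_instrument": dict(zip(instruments, live_counts)),
        "total_files": sum(by_year.values()),
    }


if __name__ == "__main__":
    metrics = collect_metrics(["HIRES", "NIRSPEC", "KCWI"], current_year=2024)
    print(metrics["by_year"])
    print(metrics["total_files"])
```

In a layout like this, the cost of each refresh depends on the size of the current-year data rather than on the full archive history, which is one plausible reason such a split helps a dashboard stay within a fixed update interval.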

Abstract

Since 2004, the Keck Observatory Archive (KOA) has operated as a NASA-funded collaboration between the NASA Exoplanet Science Institute (NExScI) and the W.M. Keck Observatory. It ingests and serves all data acquired by the twin 10-meter Keck telescopes on Mauna Kea, Hawaii. In the past three years, KOA has begun a modernization program to replace the architecture and systems used since the archive's creation with a modern Python-based infrastructure. This infrastructure will position KOA to respond to the rapid growth of new and complex data sets that will be acquired by new instruments now in development, and enable follow-up to identify the deluge of alerts of transient sources expected from new survey telescopes such as the Vera C. Rubin Observatory. Since 2022, KOA has ingested new data in near-real time, generally within one minute of creation, and has made them immediately accessible to observers through a dedicated web interface. The archive is now deploying a new, scalable, Python-based, VO-compliant query infrastructure built with the Plotly-Dash framework and R-tree indices to speed up queries by a factor of 20. The project described here exploits the new query infrastructure to develop a dashboard that will return live metrics on the performance and growth of the archive. These metrics assess the current health of the archive and guide the planning of future hardware and software upgrades. This single dashboard will enable, for example, monitoring of real-time ingestion as well as study of the long-term growth of the archive. The current methods of gathering metrics, in place since the archive opened, will not support the archive as it continues to scale. These methods suffer from high latency, are not optimized for on-demand metrics, are scattered among various tools, and are cumbersome to use.
