Decades of Transformation: Evolution of the NASA Astrophysics Data System's Infrastructure

Alberto Accomazzi

TL;DR

The paper traces the evolution of the ADS from a single-discipline bibliographic Abstract Service to a scalable, open digital library central to NASA's open science agenda. It details the journey from a bespoke, file-based system with an in-house search capability to a cloud-based microservices architecture built around Apache Solr, a JSON API, and a modern JavaScript UI, complemented by NLP/ML-driven metadata enrichment. Key contributions include documenting architectural inflection points, such as the 2013 Solr migration, the 2015 Bumblebee UI, and the 2019 feature parity, as well as outlining the SciX expansion to five disciplines and the integration of AI/ML for metadata curation and knowledge graphs. The work underscores the significance of open-source tooling, interoperability, and governance in enabling open science across disciplines while addressing the trust and transparency challenges inherent to AI-based information systems.

Abstract

The NASA Astrophysics Data System (ADS) is the primary Digital Library portal for researchers in astronomy and astrophysics. Over the past 30 years, the ADS has gone from being an astronomy-focused bibliographic database to an open digital library system supporting research in space and (soon) earth sciences. This paper describes the evolution of the ADS system, its capabilities, and the technological infrastructure underpinning it. We give an overview of the ADS's original architecture, constructed primarily around simple database models. This bespoke system allowed for the efficient indexing of metadata and citations, the digitization and archival of full-text articles, and the rapid development of discipline-specific capabilities running on commodity hardware. The move towards a cloud-based microservices architecture and an open-source search engine in the late 2010s marked a significant shift, bringing full-text search capabilities, a modern API, higher uptime, more reliable data retrieval, and integration of advanced visualizations and analytics. Another crucial evolution came with the gradual and ongoing incorporation of Machine Learning and Natural Language Processing algorithms in our data pipelines. Originally used for information extraction and classification tasks, NLP and ML techniques are now being developed to improve metadata enrichment, search, notifications, and recommendations. We describe how these computational techniques are being embedded into our software infrastructure, the challenges faced, and the benefits reaped. Finally, we conclude by describing the future prospects of ADS and its ongoing expansion, discussing the challenges of managing an interdisciplinary information system in the era of AI and Open Science, where information is abundant and technology is transformative, but the trustworthiness of both can be elusive.

Paper Structure

This paper contains 7 sections, 3 figures.

Figures (3)

  • Figure 1: The beloved ADS abstract service query form, featuring over 100 search parameter settings in one single web page.
  • Figure 2: The ADS system architecture circa 2007. By this time the ADS system had grown into a complex set of custom-built modules and workflows.
  • Figure 3: The current (and still evolving) ADS architecture consists of cloud-based microservices, a JSON API, and a JavaScript user interface, along with on-premises pipelines that are still integrating some existing legacy services.