Optimizing STAR Aligner for High Throughput Computing in the Cloud

Piotr Kica; Sabina Lichołai; Michał Orzechowski; Maciej Malawski

TL;DR

This work proposes a scalable, cloud-native architecture for the Transcriptomics Atlas Pipeline, which uses the resource-intensive STAR aligner to process tens or hundreds of terabytes of RNA-seq data, and presents performance optimizations together with an experimental evaluation in the cloud.

Abstract

We propose a scalable, cloud-native architecture for the Transcriptomics Atlas Pipeline, which uses the resource-intensive STAR aligner to process tens or hundreds of terabytes of RNA-seq data. We implement the pipeline using AWS cloud services, introduce performance optimizations, and evaluate them experimentally in the cloud. The optimizations yield computational savings through an "early stopping" approach, the selection of right-sized resources, and the use of a newer release of the Ensembl genome.

Paper Structure (6 sections, 4 figures)

Introduction
Pipeline and cloud architecture
Application-specific Optimizations
Ensembl Genome: Release 108 versus Release 111
Early stopping for STAR alignment
Conclusions and Future Work

Figures (4)

Figure 1: Transcriptomics Atlas Pipeline for STAR.
Figure 2: Cloud architecture for Transcriptomics Atlas Pipeline.
Figure 3: STAR execution time with index generated on different genome releases.
Figure 4: Time savings due to the early stopping feature. The yellow bar represents unnecessary compute time.
