Optimizing STAR Aligner for High Throughput Computing in the Cloud
Piotr Kica, Sabina Lichołai, Michał Orzechowski, Maciej Malawski
TL;DR
This work proposes a scalable, cloud-native architecture designed for the Transcriptomics Atlas Pipeline, which uses the resource-intensive STAR aligner to process tens or hundreds of terabytes of RNA-seq data, and introduces performance optimizations that are evaluated experimentally in the cloud.
Abstract
We propose a scalable, cloud-native architecture designed for the Transcriptomics Atlas Pipeline, which uses the resource-intensive STAR aligner and processes tens or hundreds of terabytes of RNA-seq data. We implement the pipeline using AWS cloud services, introduce performance optimizations, and evaluate them experimentally in the cloud. Our optimization techniques yield computational savings through an "early stopping" approach, the selection of right-sized compute resources, and the use of a newer version of the Ensembl genome.
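The abstract does not say how early stopping is decided. As a minimal sketch, assume the decision is made from periodic progress snapshots of a running alignment: once a fixed fraction of the input reads has been processed, the run is aborted if the uniquely mapped rate is too low to be worth finishing. The `Progress` structure, the 10% checkpoint, and the 30% threshold are all hypothetical illustrations, not values or interfaces taken from the paper or from STAR itself.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Progress:
    """One progress snapshot of a running alignment (hypothetical structure)."""
    reads_processed: int
    unique_mapped_pct: float  # percentage of processed reads mapped uniquely


def should_stop_early(
    snapshots: Iterable[Progress],
    total_reads: int,
    check_fraction: float = 0.1,   # hypothetical: decide after 10% of reads
    min_mapped_pct: float = 30.0,  # hypothetical acceptance threshold
) -> Optional[bool]:
    """Return True to abort the alignment, False to let it run to completion,
    or None if too few reads have been processed to decide yet."""
    for snap in snapshots:
        # Use the first snapshot that reaches the checkpoint fraction.
        if snap.reads_processed >= check_fraction * total_reads:
            return snap.unique_mapped_pct < min_mapped_pct
    return None


if __name__ == "__main__":
    snaps = [Progress(50_000, 12.0), Progress(120_000, 14.5)]
    print(should_stop_early(snaps, total_reads=1_000_000))
```

Stopping on a partial sample trades a small risk of discarding a usable sample for the compute that would otherwise be spent aligning the remaining reads of a low-quality one; the checkpoint and threshold would need to be tuned against real data.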
