DeLIAP and DeLIAJ: Dependability Library Interfaces for Python and Julia

Marcos Irigoyen, Carla Santana, Ramon C. F Araújo, Samuel Xavier-de-Souza

TL;DR

This work addresses the scarcity of fault-tolerance options for Python and Julia in high-performance applications. It proposes the wrappers DeLIAP and DeLIAJ, which extend the DeLIA library to these languages while preserving its detection, recovery, and checkpointing capabilities under a BSP model. Effectiveness is validated with a Julia case study applied to 4D full waveform inversion, which shows an overhead of approximately $1.4\%$ and discusses limitations in local-scope checkpointing and atomicity. The work offers a practical path for adopting fault tolerance in HPC environments while retaining high performance, with open-source code and ongoing maintenance.

Abstract

The ever-growing computational complexity of High Performance Computing applications is often met with horizontal scaling of computing systems. As a side effect, each added node risks becoming a single point of failure for parallel programs, increasing the demand for fault-tolerance techniques, especially at the software level. Under such conditions, the fault tolerance library DeLIA was developed in C/C++ with error detection and recovery features. We propose to extend the library's capabilities to Python and Julia through the wrappers DeLIAP and DeLIAJ, in order to lower the barrier to entry for implementing fault-tolerant solutions in these languages, both of which lack alternatives to the library. To validate the efficiency of the wrappers, we analyzed an application of the Julia wrapper to the 4D full waveform inversion method, quantitatively assessing the introduced overhead through runtime comparisons, and we provide an implementation report to address applicability. The added computational cost amounted to a median overhead of 1.4%, while limitations in the original parallel computing module used in the application rendered local-scope data checkpointing infeasible.
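As a rough illustration of how a median overhead can be derived from runtime comparisons, the sketch below compares the median runtime of a baseline run against a run with fault tolerance enabled. The function name `median_overhead_percent`, the formula (relative increase of the median), and the sample runtimes are all assumptions made for illustration; the abstract does not state the exact definition used, and the numbers are not the paper's measurements.

```python
from statistics import median


def median_overhead_percent(baseline_runtimes, fault_tolerant_runtimes):
    """Relative increase of the median runtime, in percent.

    Assumed definition: 100 * (median_ft - median_base) / median_base.
    """
    base = median(baseline_runtimes)
    with_ft = median(fault_tolerant_runtimes)
    return 100.0 * (with_ft - base) / base


# Hypothetical runtimes in seconds, invented purely for illustration.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
with_fault_tolerance = [101.0, 103.0, 100.0, 102.0, 104.0]

overhead = median_overhead_percent(baseline, with_fault_tolerance)
print(f"median overhead: {overhead:.1f}%")
```

Using the median rather than the mean keeps a single unusually slow run from dominating the comparison, which matters on shared HPC systems where runtimes can vary between executions.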

Paper Structure

This paper contains 8 sections, 3 equations, 2 figures.

Figures (2)

  • Figure 1: Box plots of the collected samples.
  • Figure 2: Scatter plot of the collected samples.