Popping Bubbles in Pangenome Graphs

Njagi Mwaniki, Erik Garrison, Nadia Pisanti

TL;DR

povu finds flubbles and outputs the flubble tree while being as fast as (or faster than) well-established bubble-finding tools such as vg and BubbleGun.

Abstract

In this paper, we introduce flubbles, a new definition of "bubbles" corresponding to variants in a (pan)genome graph $G$. We then show a characterization of flubbles in terms of equivalence classes regarding cycles in an intermediate data structure that we build from a spanning tree of $G$, which leads us to a linear time and space solution for finding all flubbles. Furthermore, we show how a related characterization also allows us to efficiently detect what we define as hairpin inversions: a cycle preceded and followed by the same path in the graph; since the latter is necessarily traversed in both directions, this structure corresponds to an inversion. Finally, inspired by the concept of the Program Structure Tree, introduced fifty years ago to represent the hierarchy of the control structure of a program, we define a tree representing the structure of $G$ in terms of flubbles, the flubble tree, which we also find in linear time. The hierarchy of variants introduced by the flubble tree paves the way for new investigations of (pan)genomic structures and their decomposition for practical analyses. We have implemented our methods in a prototype tool named povu, which we tested on human and yeast data. We show that povu can find flubbles and also output the flubble tree while being as fast as (or faster than) well-established tools that find bubbles, such as vg and BubbleGun. Moreover, we show how, within the same time, povu can find hairpin inversions that, to the best of our knowledge, no other tool is able to find. Our tool is freely available at https://github.com/urbanslug/povu/ under the MIT License.
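To make the idea of "equivalence classes regarding cycles" concrete, here is a minimal brute-force sketch in Python. It assumes the standard notion of edge cycle equivalence from the Program Structure Tree literature (two edges are equivalent when every cycle containing one also contains the other); the paper's own definitions on biedged graphs may differ in detail. The function names `simple_cycles` and `cycle_equivalence_classes` are hypothetical, and the enumeration is exponential, so this is only an illustration on a toy graph, not the linear-time method the abstract describes or anything taken from povu.

```python
from collections import defaultdict


def simple_cycles(edges):
    """Enumerate simple cycles of a small undirected simple graph.

    Each cycle is returned as a frozenset of edges, where an edge is a
    frozenset of its two endpoints. Brute force; toy inputs only.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    cycles = set()

    def extend(start, path, visited):
        last = path[-1]
        for nxt in adj[last]:
            if nxt == start and len(path) >= 3:
                # Close the cycle; the set removes the reversed duplicate.
                n = len(path)
                cycles.add(frozenset(
                    frozenset((path[i], path[(i + 1) % n])) for i in range(n)
                ))
            elif nxt > start and nxt not in visited:
                # Only visit vertices larger than the start, so each cycle
                # is rooted at its smallest vertex.
                extend(start, path + [nxt], visited | {nxt})

    for s in sorted(adj):
        extend(s, [s], {s})
    return cycles


def cycle_equivalence_classes(edges):
    """Group edges that lie on exactly the same set of simple cycles.

    Edges on no cycle at all (bridges) share the empty signature and so
    end up in one class, which matches the definition vacuously.
    """
    cycles = simple_cycles(edges)
    by_signature = defaultdict(list)
    for u, v in edges:
        e = frozenset((u, v))
        signature = frozenset(c for c in cycles if e in c)
        by_signature[signature].append((u, v))
    return sorted(sorted(cls) for cls in by_signature.values())


# Toy graph: a backbone 1-2 ... 5-6 closed by the edge 6-1, with a
# two-branch "bubble" between vertices 2 and 5 (via 3 or via 4).
edges = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 6), (6, 1)]
classes = cycle_equivalence_classes(edges)
print(classes)
# [[(1, 2), (5, 6), (6, 1)], [(2, 3), (3, 5)], [(2, 4), (4, 5)]]
```

In this toy graph the backbone edges on either side of the bubble fall into one class, and each branch of the bubble forms its own class. That grouping is the kind of signal a cycle-equivalence characterization can use to delimit bubble-like regions.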

Paper Structure

This paper contains 9 sections, 2 theorems, 4 figures, 2 tables.

Key Result

Theorem

In a biedged graph $G=(V,E)$, there are at most $|E|$ flubbles.

Figures (4)

  • Figure 1: An example of hairpin in a biedged graph with boundaries $x$ and $y$.
  • Figure 2: A biedged graph (a), its augmented spanning tree (b), and its flubble tree (c). A hairpin with boundaries $\langle 14,15 \rangle$ would also be reported.
  • Figure 3: Workflow of povu Deconstruct.
  • Figure 4: A Bandage visualization of a hairpin $\langle 462565,461860 \rangle$ detected by povu in S. cerevisiae, with the two boundaries highlighted (zoomed in).

Theorems & Definitions (4)

  • Definition: Edge Cycle Equivalence and Node Cycle Equivalence
  • Definition: k-edge-disconnectable and k-vertex-disconnectable
  • Theorem
  • Corollary