
qPMS Sigma -- An Efficient and Exact Parallel Algorithm for the Planted $(l, d)$ Motif Search Problem

Saurav Dhar, Amlan Saha, Dhiman Goswami, Md. Abul Kashem Mia

TL;DR

The algorithm performs slightly better than the existing exact algorithms for the qPMS problem on DNA sequences and is mainly based on the algorithms qPMSPrune, qPMS7, TraverStringRef and PMS8.

Abstract

Motif finding is an important step in detecting rare events that occur in a set of DNA or protein sequences. Extracting information about these rare events can lead to new biological discoveries. Motifs are important patterns with numerous applications, including the identification of transcription factors and their binding sites, composite regulatory patterns, and similarity between families of proteins. Although several flavors of motif searching algorithms have been studied in the literature, we study the version known as $ (l, d) $-motif search or Planted Motif Search (PMS). In PMS, given two integers $ l $ and $ d $ and $ n $ input sequences, we try to find all the patterns of length $ l $ that appear in each of the $ n $ input sequences with at most $ d $ mismatches. We also discuss the quorum version of PMS, which finds motifs that are planted not necessarily in all the input sequences but in at least $ q $ of them. Our algorithm is mainly based on the algorithms qPMSPrune, qPMS7, TraverStringRef and PMS8. We introduce some techniques to compress the input strings and to speed up string comparison using bitwise operations. Our algorithm performs slightly better than the existing exact algorithms for the qPMS problem on DNA sequences. We also propose an idea for a parallel implementation of our algorithm.
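To make the problem statement concrete, the following is a minimal Python sketch of the quorum $ (l, d) $ definition together with one possible way to compress DNA strings and compare them with bitwise operations. It is not the qPMS Sigma algorithm: the 2-bit encoding, the function names (`pack`, `hamming_packed`, `occurs_within`, `quorum_motifs`) and the exhaustive enumeration of all $ 4^l $ candidates are assumptions made for illustration only, and the brute-force search is practical only for very small $ l $.

```python
from itertools import product

# Assumed 2-bit encoding of the DNA alphabet (illustrative choice).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(s):
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for ch in s:
        value = (value << 2) | CODE[ch]
    return value


def hamming_packed(x, y, l):
    """Hamming distance between two packed l-mers (l >= 1)."""
    diff = x ^ y                        # non-zero 2-bit group = mismatch
    low_mask = int("01" * l, 2)         # selects the low bit of each group
    mismatches = (diff | (diff >> 1)) & low_mask
    return bin(mismatches).count("1")   # one set bit per mismatching base


def occurs_within(motif, seq, l, d):
    """True if some l-mer of seq is within d mismatches of the packed motif."""
    mask = (1 << (2 * l)) - 1
    window = 0
    for i, ch in enumerate(seq):
        # Slide the packed window one base to the right.
        window = ((window << 2) | CODE[ch]) & mask
        if i >= l - 1 and hamming_packed(motif, window, l) <= d:
            return True
    return False


def quorum_motifs(seqs, l, d, q):
    """Brute force: all l-mers occurring (up to d mismatches) in >= q sequences."""
    found = []
    for cand in product("ACGT", repeat=l):
        motif = "".join(cand)
        packed = pack(motif)
        hits = sum(occurs_within(packed, s, l, d) for s in seqs)
        if hits >= q:
            found.append(motif)
    return found


if __name__ == "__main__":
    seqs = ["ACGTAC", "TTACGA", "GGGGGG"]
    print(quorum_motifs(seqs, l=3, d=0, q=2))
```

Setting `q` equal to the number of sequences recovers the plain PMS problem. The XOR-and-popcount comparison handles a whole $ l $-mer in a few integer operations instead of a character-by-character loop, which is the general kind of saving that bitwise comparison is meant to provide.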

Paper Structure (78 sections, 23 figures, 4 tables, 9 algorithms)


Figures (23)

  • Figure 1: Starting index of motifs in a DNA sequence.
  • Figure 2: Life cycle of a cell.
  • Figure 3: Prokaryotic and Eukaryotic cells.
  • Figure 4: DNA double helix formed by base pairs attached to a sugar-phosphate backbone.
  • Figure 5: Hydrogen bonds between A-T and C-G.
  • ...and 18 more figures