
On Counting Subsequences and Higher-Order Fibonacci Numbers

Hsin-Po Wang, Chi-Wei Chin

TL;DR

To maximize the amount of information that can be synthesized into DNA within a finite amount of time, this paper studies the number of unordered sets of $n$ strands of DNA that have a common supersequence of length at most $t$.

Abstract

In array-based DNA synthesis, multiple strands of DNA are synthesized in parallel to reduce the time cost from the sum of their lengths to the length of their shortest common supersequence. To maximize the amount of information that can be synthesized into DNA within a finite amount of time, we study the number of unordered sets of $n$ strands of DNA that have a common supersequence whose length is at most $t$. Our analysis stems from the following connection: the number of subsequences of A C G T A C G T A C G T ... is the partial sum (prefix sum) of the fourth-order Fibonacci numbers.

Paper Structure

This paper contains 18 sections, 13 theorems, 19 equations, 3 figures.

Key Result

Proposition 8

The number of subsequences of $\overline{a_1 \dotsm a_q}^t$ that are not subsequences of $\overline{a_1 \dotsm a_q}^{t-1}$ is $F_q(t)$. Consequently, the total number of subsequences of $\overline{a_1 \dotsm a_q}^t$ is the partial sum $F_q(0) + \dotsb + F_q(t)$.
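The proposition can be checked numerically for small cases. The sketch below rests on two assumptions that this summary does not spell out: that $\overline{a_1 \dotsm a_q}^t$ denotes the first $t$ letters of the periodic word $a_1 \dotsm a_q a_1 \dotsm a_q \dotsm$, and that Definition 7 sets $F_q(0) = 1$ with each later term equal to the sum of the previous $q$ terms (terms with negative index counting as zero). The function names are illustrative and do not come from the paper.

```python
def periodic_prefix(alphabet, t):
    """First t letters of `alphabet` repeated forever (assumed meaning of the overline notation)."""
    return "".join(alphabet[i % len(alphabet)] for i in range(t))


def distinct_subsequences(word):
    """Brute-force set of all distinct subsequences of `word`, including the empty one."""
    subs = {""}
    for ch in word:
        # Every known subsequence may or may not be extended by the new letter.
        subs |= {s + ch for s in subs}
    return subs


def higher_order_fibonacci(q, t_max):
    """Assumed definition: F_q(0) = 1 and F_q(t) = sum of the previous q terms."""
    f = [1]
    for t in range(1, t_max + 1):
        f.append(sum(f[max(0, t - q):t]))
    return f


if __name__ == "__main__":
    q, t_max = 4, 8
    fib = higher_order_fibonacci(q, t_max)
    for t in range(t_max + 1):
        total = len(distinct_subsequences(periodic_prefix("ACGT", t)))
        print(t, fib[t], total, sum(fib[: t + 1]))
```

Under these assumptions the last two columns printed for each $t$ agree, and for $q = 2$ the sequence reduces to the ordinary Fibonacci numbers 1, 1, 2, 3, 5. The brute-force enumeration is only practical for small $t$, since the number of subsequences grows exponentially.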

Figures (3)

  • Figure 1: Array-based DNA synthesis: At time $s = 1$, we ask $x^1, x^2, x^3, x^4$ whether they want $M_1 = \mathsf A$; only $x^2$ says yes. At time $s = 2$, we ask whether they want $M_2 = \mathsf C$, and $x^3, x^4$ say yes. The same process repeats until we ask whether they want $M_{12} = \mathsf T$, and $x^1$ says yes.
  • Figure 2: Details of array-based DNA synthesis. Each (a)--(d) cycle consumes one letter from the master lineup. Cf. HVS23.
  • Figure 3: Left: CaH69 counts the subsequences of $\overline{\mathsf A\mathsf C}^t$ by length. Right: we count by $\tau$ (i.e., synthesis time).
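The process described in the Figure 1 caption can be sketched as a greedy scan over the master lineup. The sketch assumes the master lineup is the periodic word A C G T A C G T ..., which is consistent with $M_1 = \mathsf A$, $M_2 = \mathsf C$, and $M_{12} = \mathsf T$ in the caption. The strands used below are hypothetical and are not the $x^1, \dotsc, x^4$ of the figure; `synthesis_time` and `batch_time` are illustrative names.

```python
from itertools import cycle


def synthesis_time(strand, period="ACGT"):
    """Number of master-lineup letters offered until `strand` is complete.

    At each time step the strand accepts the offered letter exactly when
    it matches the next letter the strand still needs.
    """
    if not strand:
        return 0
    if any(ch not in period for ch in strand):
        raise ValueError("strand uses a letter that the master lineup never offers")
    pos = 0
    for step, letter in enumerate(cycle(period), start=1):
        if strand[pos] == letter:
            pos += 1
            if pos == len(strand):
                return step


def batch_time(strands, period="ACGT"):
    """Strands grow in parallel, so the batch finishes with its slowest strand."""
    return max((synthesis_time(s, period) for s in strands), default=0)


if __name__ == "__main__":
    strands = ["TTT", "AT", "CA", "CGT"]  # hypothetical example strands
    for s in strands:
        print(s, synthesis_time(s))
    print("batch:", batch_time(strands))
```

In this sketch a set of strands can be synthesized within time $t$ exactly when every strand is a subsequence of the first $t$ letters of the master lineup, which is why counting subsequences of that prefix matters.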

Theorems & Definitions (14)

  • Definition 7: Higher-order Fibonacci numbers
  • Proposition 8
  • Proposition 9
  • Lemma 10
  • Lemma 11
  • Proposition 12
  • Proposition 13
  • Proposition 14
  • Proposition 15
  • Proposition 16
  • ...and 4 more