

Here we describe an algorithmic method which accurately reconstructs the pair of allelic sequences from the observed complex pattern of calls. Unlike most existing computational approaches to the problem, our method does not require knowledge of one of the involved sequences to use as a reference, nor any other additional information. Therefore, it can facilitate sequencing of indel-rich regions of genomes and speed up discovery and characterization of indel mutations, including those causing diseases in humans.

Direct fluorescent sequencing of two dissimilar templates produces a mixed trace, which appears as if the traces obtained for each template separately were superimposed onto each other. Simultaneous sequencing of completely unrelated templates occurs during sequencing of RT-PCR products containing alternative splicing sites and during screening of random insertional mutagenesis libraries. More often, mixed traces occur as a result of direct sequencing of diploid alleles containing heterozygous insertions/deletions. In this case, the mixed trace downstream of the indel is formed by two allelic traces superimposed onto each other with a phase shift (Figure 1). Mixed traces are often discarded as uninterpretable. New sequencing technologies, such as pyrosequencing, avoid the problem by working from single DNA molecules, but these emerging methods still have limited application.

In DNA, information is encoded as a sequence of four types of building blocks, the nucleotides. The most common technique for determining such sequences, the Sanger method, outputs a single consensus for a pool of DNA molecules in the analyzed sample. Yet, samples from organisms with two sets of chromosomes generally contain two types of DNA molecules (alleles), each derived from one parent. When these are identical, each site in the output contains a single nucleotide call. If, due to insertion or deletion (indel) mutations, one allele contains extra nucleotides, most sites in the sequencing output beyond the mutation site will contain pairs of nucleotide calls. While signaling the presence of a potentially important mutation, such output cannot be read directly and often gets discarded.
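To make the effect of a heterozygous indel on the base calls concrete, the following minimal sketch superimposes two toy allelic sequences site by site. It assumes that each pair of calls is written as a single IUPAC ambiguity letter (for example, `R` for A and G); the sequences and the helper names `superimpose` and `insert_at` are invented for illustration and do not come from the paper.

```python
# IUPAC letter for the set of bases called at one site (assumed notation).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def superimpose(allele_a, allele_b):
    """Merge two allelic reads site by site, as an idealized mixed trace would."""
    n = min(len(allele_a), len(allele_b))
    return "".join(IUPAC[frozenset((allele_a[i], allele_b[i]))] for i in range(n))


def insert_at(seq, pos, extra):
    """Return seq with the string `extra` inserted before index `pos`."""
    return seq[:pos] + extra + seq[pos:]


# Toy data: the second allele carries a 2-nucleotide insertion at position 6.
reference = "ACGTTGCAAGCTTAGC"
with_insertion = insert_at(reference, 6, "GA")

mixed = superimpose(reference, with_insertion)
print(mixed)  # sites before the insertion are single calls, most later sites are pairs
```

Upstream of the insertion both alleles agree, so the output repeats the reference. Downstream, the second allele lags by two sites, and a site remains a single call only where the two shifted alleles happen to carry the same base.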
Direct Sanger sequencing of a diploid template containing a heterozygous insertion or deletion results in a difficult-to-interpret mixed trace formed by two allelic traces superimposed onto each other. Existing computational methods for deconvolution of such traces require knowledge of a reference sequence or the availability of both direct and reverse mixed sequences of the same template. We describe a simple yet accurate method, which uses dynamic programming optimization to predict superimposed allelic sequences solely from a string of letters representing peaks within an individual mixed trace. We used the method to decode 104 human traces (mean length 294 bp) containing heterozygous indels of 5 to 30 bp, with a mean of 99.1% of bases per allelic sequence reconstructed correctly and unambiguously. Simulations with artificial sequences have demonstrated that the method yields accurate reconstructions when (1) the allelic sequences forming the mixed trace are sufficiently similar, (2) the analyzed fragment is significantly longer than the indel, and (3) multiple indels, if present, are well-spaced. Because these conditions occur in most encountered DNA sequences, the method is widely applicable. It is available as a free Web application Indelligent at.
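The idea that a string of mixed calls alone can determine both alleles can be illustrated with a deliberately simplified sketch. This is not the dynamic programming method described above: it is a greedy propagation that assumes a single indel, error-free calls written as IUPAC ambiguity letters, and a candidate phase shift supplied by the caller. The names `CALLS`, `other`, and `resolve`, and the toy input string, are hypothetical.

```python
# IUPAC letter -> bases called at one site of the mixed trace (assumed notation).
CALLS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def other(site, base):
    """Second base called at a site, or the same base if only one was called."""
    rest = site - {base}
    return rest.pop() if rest else base


def resolve(mixed, shift):
    """Split mixed calls into two alleles, assuming one indel of size `shift`.

    Allele B is assumed to lag allele A by `shift` sites downstream of the
    first ambiguous call. Returns (allele_a, allele_b) with 'N' at sites that
    stay unresolved, or None if the shift contradicts the observed calls.
    """
    sites = [set(CALLS[c]) for c in mixed]
    start = next((i for i, s in enumerate(sites) if len(s) == 2), len(sites))
    a, b = list(mixed[:start]), list(mixed[:start])
    for i in range(start, len(sites)):
        if i < start + shift:
            # Just after the indel: A's base here must reappear `shift` sites later.
            ahead = sites[i + shift] if i + shift < len(sites) else sites[i]
            options = sites[i] & ahead
            if not options:
                return None
            base_a = options.pop() if len(options) == 1 else "N"
            base_b = other(sites[i], base_a) if base_a != "N" else "N"
        else:
            # Further downstream: B repeats what A showed `shift` sites earlier.
            base_b = a[i - shift]
            if base_b == "N":
                base_a = next(iter(sites[i])) if len(sites[i]) == 1 else "N"
                base_b = base_a
            elif base_b not in sites[i]:
                return None
            else:
                base_a = other(sites[i], base_b)
        a.append(base_a)
        b.append(base_b)
    return "".join(a), "".join(b)


# Toy input: try several candidate shifts and keep those that fit the calls.
mixed = "ACGTTGSAMRMKYWKM"
for shift in range(1, 6):
    alleles = resolve(mixed, shift)
    if alleles:
        print(shift, *alleles)
```

On this toy input only one candidate shift survives, which hints at why sufficiently long fragments make the reconstruction well determined. Real traces contain miscalled and missing peaks, which a contradiction-based scheme like this cannot tolerate; handling them is where an optimization-based approach becomes necessary.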
