Network Flow Decomposition for RNA Sequencing
- Monday, April 1, 2019 from 3:00pm to 4:00pm
- Barnard Hall, 347 - view map
Next-generation sequencing technology has enabled high-throughput, low-cost measurement of biological sequence data such as DNA and RNA. A critical step in the process is the assembly of overlapping short sequence reads, only 75-500 characters long, into full genes, genomes, or RNA transcripts. Because the accuracy of the assembly directly determines the accuracy of the reported sequence, this difficult algorithmic problem is an active area of bioinformatic research. In 2013, researchers proposed to model the unassembled reads using a flow network, a fundamental and well-studied concept in computer science. Then, the true sampled genes or RNA transcripts are represented as paths through the network. According to the principle of parsimony, we assume that the smallest number of paths that can explain the flow on the network is the best solution. Thus, to solve the assembly problem, we must solve the minimal path decomposition problem.
In this survey, I give an overview of the research on the minimal path decomposition problem, where we decompose a flow on a network into a minimum set of weighted paths which explains the flow on every edge. This problem was first studied in relation to networking, where minimizing the paths needed to describe a flow kept router table sizes small. It is known to be NP-hard. However, researchers have developed heuristic, approximation, and fixed-parameter tractable algorithms for this problem. Additionally, I describe our current work here at MSU to extend the flow network model and give heuristics to accommodate errors in short read measurement. We argue that this change better describes the true experimental condition and may lead to more accurate assemblies.
- Gianforte School of Computing