Lucy Williams' Comprehensive Exam: Flow Decomposition Algorithms for Multiassembly Problems
- Wednesday, October 6, 2021 from 10:00am to 11:00am
Current genetic sequencing technologies allow for fast and cheap measurement of short substrings of genetic sequence called reads, which must be assembled to recover the full unknown sequence. In some cases, such as when assembling RNA transcripts or the genomes of a mixture of species taken in a single sample, the reads come from multiple sequences. In this case, we would like to recover all of the distinct unknown sequences and their relative abundances, and we call this multiassembly. A common model underlying many multiassembly approaches is flow decomposition, which decomposes a flow network into a set of paths and weights that parsimoniously explains the flow. In previous work, we formalized two new variations on flow decomposition to better model the information available in multiassembly tasks. The first, inexact flow decomposition, allows for some uncertainty in the flow measurements. The second, flow decomposition with subpath constraints, incorporates additional information that may be provided by longer reads. We give heuristic and FPT algorithms to solve these problems, and demonstrate their usefulness for RNA assembly on a simulated RNA-Seq dataset.
We propose to extend our existing work in two ways. First, we will apply the flow decomposition with subpath constraints model to real data from RNA transcript assembly and mixed-genome assembly experiments, which will require us to compute flows and subpath constraints from real data and allow us to validate the model on more realistic inputs. This process may also yield new, biologically-inspired variants on the problem. Second, we will use a currently-being-developed integer linear program (ILP) for flow decomposition to further validate the method on larger instances. The ILP appears to be much faster than our existing exact solvers. We will also give extended formulations of the ILP that solve additional problem variants.