Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing
A basic task in bioinformatics is the analysis of many DNA sequences. Most analyses index the sequences by their k-long substrings (k-mers). Most algorithms for DNA sequence analysis suffer from excessive memory usage and runtime. Today, with the technology advances of read ...
Project status: Under Development
Overview / Usage
Improving k-mer counting through universal hitting sets.
A basic task in bioinformatics is the analysis of many DNA sequences. Most analyses, such as genome and transcriptome assembly, error correction, multiple sequence alignment, and repeat detection, index the sequences by their k-long substrings (k-mers).
We propose to use universal hitting sets (UHSs) as an alternative to minimizers, the most common method for indexing sequences by k-mers. The main goal is to reduce runtime and memory usage by using UHSs instead of minimizers.
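As a point of reference for what is being replaced, the following is a minimal sketch of a lexicographic minimizer scheme. The function name `minimizers` and the window parameter `w` (the number of consecutive k-mers per window) are illustrative assumptions, not part of any specific k-mer counting tool.

```python
def minimizers(seq, k, w):
    """Return the sorted (position, k-mer) pairs selected as window minimizers.

    Assumption: each window holds w consecutive k-mers and the minimizer is
    the lexicographically smallest one, with ties broken by leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Smallest k-mer in the window; the index breaks ties to the left.
        best = min(window, key=lambda i: (kmers[i], i))
        chosen.add((best, kmers[best]))
    return sorted(chosen)


if __name__ == "__main__":
    print(minimizers("ACGTACGT", 3, 3))
```

Adjacent windows often share the same minimizer, which is why only a subset of positions ends up being selected and indexed.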
Methodology / Approach
In our work, we will use an existing k-mer counting algorithm, which serves as a pattern detector in bioinformatics studies. We will create a new algorithm based on the work of our faculty advisor.
The input will be a set of DNA strings, where each k-mer x ∈ {A,C,G,T}^k.
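To make the input and output concrete, here is a minimal in-memory sketch of k-mer counting. It is a naive baseline, not the minimizer-based algorithm the project will build on; the function name `count_kmers` and the choice to skip k-mers containing characters outside {A,C,G,T} are assumptions made for illustration.

```python
from collections import Counter

DNA = set("ACGT")


def count_kmers(reads, k):
    """Count every k-mer x in {A,C,G,T}^k that occurs in the given reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption: k-mers with ambiguous bases (e.g. N) are ignored.
            if set(kmer) <= DNA:
                counts[kmer] += 1
    return counts


if __name__ == "__main__":
    print(count_kmers(["ACGTAC", "CGTN"], 3))
```

A counter like this keeps every distinct k-mer in memory at once, which is the cost that minimizer-based and UHS-based schemes aim to reduce.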
Our workflow consists of the following steps:
- Finding the best algorithm for the project: first, we must find a k-mer counting algorithm that fits our goal and satisfies two critical requirements: it must use minimizers, and it must target k < 13.
- Understanding how to swap minimizers for UHSs (the new integration) without changing any other part of the algorithm. This may require learning a new programming language.
- Comparing the two algorithms, the one using the Docks UHSs and the one using minimizers, and demonstrating the efficiency gained by using UHSs.
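The swap described in the steps above can be sketched as a change to the k-mer ordering only. The sketch assumes that k-mers belonging to the UHS are ranked ahead of all others and that ties are broken lexicographically; the names `is_universal` and `uhs_selected`, and the toy set used in the usage example, are hypothetical. The brute-force check is feasible only for very small k and L.

```python
from itertools import product


def is_universal(uhs, k, L):
    """Brute-force check that every L-long DNA string contains a k-mer from uhs."""
    for letters in product("ACGT", repeat=L):
        s = "".join(letters)
        if not any(s[i:i + k] in uhs for i in range(L - k + 1)):
            return False
    return True


def uhs_selected(seq, k, w, uhs):
    """Select one k-mer per window of w consecutive k-mers, preferring UHS members.

    Assumption: the ordering is (not in UHS, lexicographic, leftmost), so the
    rest of a minimizer-based pipeline can stay unchanged.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    chosen = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i] not in uhs, kmers[i], i))
        chosen.add((best, kmers[best]))
    return sorted(chosen)


if __name__ == "__main__":
    # Toy set: all 2-mers except AC, AG and AT.
    all_2mers = {"".join(p) for p in product("ACGT", repeat=2)}
    toy_uhs = all_2mers - {"AC", "AG", "AT"}
    print(is_universal(toy_uhs, 2, 3))
    print(uhs_selected("ACGTACGT", 2, 2, toy_uhs))
```

A window of w consecutive k-mers spans L = w + k - 1 bases, so a set that is universal for that L guarantees that every window contains at least one UHS member. Comparing the number of distinct selected k-mers under each ordering is one simple way to start the comparison in the last step.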
Minimizers and UHSs are each one component of a larger algorithm. Our advisor has already shown the advantages of UHSs over minimizers, but using UHSs throughout an algorithm's entire procedure has never been done before. We expect our outcome to strengthen the advisor's result.