kmer counter


Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing ...

Project status: Under Development

PC Skills

Intel Technologies
Other


Overview / Usage

A basic task in bioinformatics is the analysis of many DNA sequences. Most analyses index the sequences by their k-long subsequences (k-mers). Most algorithms for DNA sequence analysis suffer from excessive memory usage and runtime. Today, with advances in DNA sequencing technology, improving the efficiency of this kind of algorithm is an important goal.

Methodology / Approach

We will use an existing k-mer counting algorithm of the kind used as a pattern detector in bioinformatics studies. We will create a new algorithm based on the work of our faculty advisor.

The input will be a set of DNA strings, where each k-mer is a string x ∈ {A,C,G,T}^k.
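
To make the input and the counting task concrete, here is a minimal sketch of plain k-mer counting in Python. It is only an illustration, not the algorithm the project will use: the function name `count_kmers` and the example sequences are hypothetical, and skipping windows that contain characters outside {A,C,G,T} is an assumption.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-long substring over the alphabet {A, C, G, T}."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: windows with ambiguous bases (e.g. N) are skipped.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


# Hypothetical example input: two short DNA strings.
counts = count_kmers(["ACGTACGT", "TTACG"], k=3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A dictionary-based counter like this keeps every distinct k-mer in memory, which is the kind of cost that minimizer-based and UHS-based schemes aim to reduce.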

Our workflow will take place in the following steps:

  1. Finding the best algorithm for the project: first, we must find a k-mer counting algorithm that matches our goal and satisfies two critical assumptions: it must use minimizers, and it must target k < 13.
  2. Understanding how to swap the minimizers for UHSs (universal hitting sets) without changing any other part of the algorithm. This new integration may require learning a new programming language.
  3. Comparing the two algorithms, the one with DOCKS and the one with minimizers, and demonstrating the efficiency gained by using UHSs.
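
The swap in steps 2 and 3 can be sketched as follows, under stated assumptions. A minimizer scheme picks, in every window of w consecutive k-mers, the smallest k-mer under some ordering. One plausible way to integrate a UHS is to change only that ordering, so that k-mers in the set rank ahead of all others. The names `select_positions`, `make_uhs_key`, and `HYPOTHETICAL_UHS` are invented for illustration, and the placeholder set is not a verified universal hitting set; in the project it would come from the UHS construction step.

```python
def select_positions(seq, k, w, key):
    """Return the start positions of the k-mer chosen in each window of w k-mers."""
    chosen = set()
    n_kmers = len(seq) - k + 1
    for start in range(n_kmers - w + 1):
        window = range(start, start + w)
        # The k-mer with the smallest key wins; ties go to the leftmost position.
        best = min(window, key=lambda i: (key(seq[i:i + k]), i))
        chosen.add(best)
    return sorted(chosen)


def lexicographic_key(kmer):
    """Classic minimizer ordering: plain lexicographic order."""
    return (0, kmer)


def make_uhs_key(uhs):
    """Ordering in which k-mers from the given set rank before all others."""
    def key(kmer):
        return (0 if kmer in uhs else 1, kmer)
    return key


# Placeholder only: NOT a verified universal hitting set.
HYPOTHETICAL_UHS = {"ACG", "GTA", "TTT"}

seq = "ACGTACGTTTACG"
k, w = 3, 4
lex_positions = select_positions(seq, k, w, lexicographic_key)
uhs_positions = select_positions(seq, k, w, make_uhs_key(HYPOTHETICAL_UHS))

# Fewer selected positions means fewer k-mers to store or use for binning.
n_kmers = len(seq) - k + 1
print("lexicographic density:", len(lex_positions) / n_kmers)
print("UHS-priority density:", len(uhs_positions) / n_kmers)
```

Because only the `key` function changes between the two runs, the rest of the counting pipeline stays untouched, which matches the requirement in step 2. Comparing the fraction of selected positions is one simple measure for step 3; memory usage and runtime on real data would need to be measured separately.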

Technologies Used

Java, Linux, Python
