Gene Sequence De-redundancy

zhen ju




A novel Greedy Incremental Alignment-based algorithm, nGIA, was proposed for sequence clustering with high efficiency and precision. nGIA consists of a pre-filter, a modified short word filter, a new data packing strategy, and a modified greedy incremental method, and it is parallelized on the GPU.
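The greedy incremental method named above follows the general scheme used by CD-HIT-style tools: sequences are processed from longest to shortest, and each one either joins the first existing cluster whose representative is similar enough or becomes a new representative. The Python sketch below shows only that generic loop, under stated assumptions. `greedy_incremental_cluster`, the toy `toy_identity` measure, and the 0.8 threshold are hypothetical, and nGIA's filters, data packing, and GPU parallelization are deliberately left out.

```python
from typing import Callable, Dict, List


def toy_identity(a: str, b: str) -> float:
    """Hypothetical stand-in for an alignment score: positional matches
    divided by the length of the shorter sequence (no gaps allowed)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def greedy_incremental_cluster(
    seqs: List[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Dict[int, List[int]]:
    """Map each representative's index to the indices in its cluster."""
    # Longest sequences first, so every representative is at least as
    # long as the sequences later compared against it.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    clusters: Dict[int, List[int]] = {}
    for i in order:
        for rep in clusters:
            if similarity(seqs[i], seqs[rep]) >= threshold:
                clusters[rep].append(i)  # redundant: join the first match
                break
        else:
            clusters[i] = [i]  # no match: i becomes a new representative
    return clusters


seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG"]
print(greedy_incremental_cluster(seqs, toy_identity, threshold=0.8))
# {0: [0, 1], 2: [2]}
```

In practice the `similarity` callback is by far the most expensive part of this loop, which is why filters are applied before any alignment is computed.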

Project status: Published/In Market

oneAPI, HPC

Intel Technologies
DevCloud, oneAPI, DPC++, Intel VTune


Overview / Usage

Non-redundant sequence datasets are of great importance in bioinformatics. Redundant sequences add no new information, yet they substantially increase the cost of analysis. Various de-redundancy tools have therefore been developed, such as CD-HIT, Usearch, and Vsearch. However, these tools all run on CPUs, and to keep running times acceptable they rely on approximate algorithms. As a result, their results are not exact.

We implemented a new tool. By taking advantage of the GPU, it produces accurate results and runs faster than the existing tools on hardware of the same price. It supports both CUDA and oneAPI.

Methodology / Approach

The core algorithm of sequence de-redundancy can be summarized in two steps. First, an algorithm with low time complexity filters out obviously dissimilar sequence pairs. Second, a dynamic programming algorithm calculates the similarity of each remaining pair of sequences.
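The two-step scheme can be illustrated with a minimal Python sketch. It assumes edit (Levenshtein) distance as the similarity measure and a k-mer counting filter as the cheap first step; these are common textbook choices, not necessarily nGIA's, and every function name here is hypothetical. The filter rests on the q-gram lemma: if two sequences are within edit distance e and the longer one has length n, they share at least n - k + 1 - k*e k-mers, so any pair sharing fewer can be discarded without running the dynamic program.

```python
from collections import Counter


def shared_kmers(a: str, b: str, k: int = 4) -> int:
    """Number of k-mers shared by a and b, counted as a multiset intersection."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum((ca & cb).values())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[-1]


def is_redundant(a: str, b: str, max_edits: int, k: int = 4) -> bool:
    """True if a and b are within max_edits edits of each other."""
    # Step 1: cheap filter based on the q-gram lemma.
    n = max(len(a), len(b))
    if shared_kmers(a, b, k) < n - k + 1 - k * max_edits:
        return False  # the pair cannot be within max_edits edits
    # Step 2: exact dynamic programming, only for pairs that passed the filter.
    return edit_distance(a, b) <= max_edits


print(is_redundant("ACGTACGTAC", "ACGTACGTAA", max_edits=1))  # True
print(is_redundant("AAAAAAAAAA", "CCCCCCCCCC", max_edits=1))  # False
```

Because the filter only discards pairs that provably exceed the edit budget, it never changes the final answer; it only avoids running the quadratic-time dynamic program on hopeless pairs.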

We improved the filter algorithm using the pigeonhole principle and improved the performance of the dynamic programming algorithm by compressing the data. All of these algorithms are accelerated on heterogeneous hardware with CUDA and oneAPI.
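The two ideas in this paragraph can be sketched as follows, again in Python and under stated assumptions rather than as nGIA's actual code. The pigeonhole filter splits one sequence into e + 1 pieces: each edit can damage at most one piece, so if two sequences are within edit distance e, at least one piece must occur verbatim in the other sequence. For data compression, the sketch assumes a common 2-bit encoding (A, C, G, T only, 32 bases per 64-bit word), which lets a single XOR plus a bit count compare 32 positions at once. The function names, the encoding table, and the word layout are all hypothetical.

```python
from typing import List

# Hypothetical 2-bit code; sequences are assumed to contain only A, C, G, T.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
LOW_BITS = 0x5555555555555555  # binary 0101...: low bit of every 2-bit field


def pigeonhole_pass(a: str, b: str, max_edits: int) -> bool:
    """Return False only when edit_distance(a, b) > max_edits is guaranteed."""
    pieces = max_edits + 1
    bounds = [len(a) * i // pieces for i in range(pieces + 1)]
    segments = [a[bounds[i]:bounds[i + 1]] for i in range(pieces)]
    # With at most max_edits edits, at least one segment survives intact in b.
    return any(seg in b for seg in segments)


def pack_2bit(seq: str) -> List[int]:
    """Pack bases 2 bits each, 32 per 64-bit word, first base in the low bits."""
    words = []
    for start in range(0, len(seq), 32):
        word = 0
        for offset, base in enumerate(seq[start:start + 32]):
            word |= CODE[base] << (2 * offset)
        words.append(word)
    return words


def word_mismatches(x: int, y: int) -> int:
    """Count the base positions at which two packed words differ."""
    diff = x ^ y
    # A position differs if either bit of its field is set; fold the
    # high bit onto the low bit, mask, and count the set bits.
    return bin((diff | (diff >> 1)) & LOW_BITS).count("1")


print(pigeonhole_pass("ACGTACGTAC", "ACGTACGTAA", max_edits=1))  # True
print(word_mismatches(pack_2bit("ACGT")[0], pack_2bit("ACGA")[0]))  # 1
```

The word comparison is meant for equal-length, aligned stretches: a partial last word is zero-padded, and that padding reads as A bases.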

Our application was originally developed in CUDA and then migrated to oneAPI with the dpct tool (Intel DPC++ Compatibility Tool). The automatically migrated code contained some errors, which we fixed by manual debugging.

Technologies Used

We developed the software on the Intel DevCloud platform using the Intel oneAPI Base Toolkit and the Intel oneAPI HPC Toolkit. The software runs on Xeon CPUs and GPUs, and we used VTune to improve its performance.


Repository

https://gitee.com/ju_zhen/nGIA
