Coronavirus Sequence Prediction with Transformer Models

Tri Songz

Leveraging natural language Transformer models (BERT, XLNet, etc.) to develop methods for identifying, classifying, and predicting protein-coding DNA sequences using the publicly available NCBI Virus database.

Project status: Concept

Artificial Intelligence

Intel Technologies
DevCloud

Overview / Usage

Developing a natural language model to identify and predict protein-coding DNA sequences of the coronavirus. A DNA sequence is a string over the four-letter alphabet A, T, G, C, so it can be tokenized like text and used to train Transformer-based language models for task-specific functions and predictions such as classification and recognition.
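To make the tokenization idea concrete, here is a minimal sketch in Python. It assumes overlapping k-mers (fixed-length substrings) as tokens, which is one common choice and not necessarily the scheme this project will settle on. The function names, the `[PAD]`/`[UNK]` special tokens, and the toy sequence are all hypothetical and used only for illustration.

```python
from typing import Dict, List


def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a nucleotide string into k-mers of length k, stepping by stride."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign an integer id to each distinct token, in order of first appearance."""
    # Hypothetical special tokens, mirroring what language-model tokenizers usually reserve.
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, falling back to [UNK] for tokens outside the vocabulary."""
    return [vocab.get(token, vocab["[UNK]"]) for token in tokens]


if __name__ == "__main__":
    toy_sequence = "ATGCGTAC"  # made-up example, not real viral data
    tokens = kmer_tokenize(toy_sequence, k=3)
    vocab = build_vocab(tokens)
    print(tokens)
    print(encode(tokens, vocab))
```

The resulting id lists are the kind of input a Transformer model consumes. The choice of `k` and `stride` trades vocabulary size against sequence length and would need to be tuned on the actual data.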

If the approach proves valid, it could help predict key parts of the sequence, supporting further research toward vaccines and treatments.

Current Progress: Analyzing the dataset compiled from NCBI to identify commonalities across sequences that can inform tokenization.
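One way such a commonality analysis might look is sketched below in Python. It assumes that "commonalities" means k-mers (fixed-length substrings) that appear in every sequence of the dataset. That is an interpretation, not a description of the project's actual analysis. The function names and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import List, Set


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a single nucleotide string."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmers(sequences: List[str], k: int) -> Set[str]:
    """Return the k-mers that occur at least once in every sequence."""
    if not sequences:
        return set()
    kmer_sets = [set(kmer_counts(seq, k)) for seq in sequences]
    return set.intersection(*kmer_sets)


if __name__ == "__main__":
    # Made-up toy sequences; real input would be records from the NCBI dataset.
    toy_sequences = ["ATGCGT", "TTGCGA", "GGCGTT"]
    print(shared_kmers(toy_sequences, k=3))
```

K-mers shared across many sequences are candidates for vocabulary entries, while rare ones could be left to an unknown-token fallback.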
