Implementation of oneAPI Analytics Toolkit in Medical Science

Abhishek Nandy

Kolkata, WB

Project status: Published/In Market

oneAPI, HPC

Intel Technologies
oneAPI, Intel Python, DevCloud

Overview / Usage

We will be exploring single-cell data, such as single-cell RNA sequencing (scRNA-seq) data.

We will be porting Clustergrammer2 to the Intel oneAPI AI Analytics Toolkit.

Clustergrammer2 produces highly interactive visualizations that enable intuitive exploration of high-dimensional data, and it has several optional biology-specific features (e.g., enrichment analysis) to facilitate the exploration of gene-level biological data.

It is a web-based tool for visualizing and analyzing high-dimensional data (e.g., single-cell RNA sequencing data) as interactive, shareable heatmaps.

Methodology / Approach

Intel DevCloud was used for the project.

We will be exploring gene expression data, which has proven valuable for studying diseases such as cancer.

The patterns revealed by exploring these data as heatmaps are useful for studying where gene expression has changed, for example as a result of gene mutations.

Porting Clustergrammer2 to the AI Analytics Toolkit lets us interactively explore data from 2,700 PBMCs (peripheral blood mononuclear cells) in a dataset obtained from 10x Genomics.

We will be using Intel Optimized Python from the AI Analytics Toolkit and running the programs on Intel DevCloud.

We will also use an external dataset known as CIBERSORT for exploration. It provides estimates of the abundances of different cell types in a mixed population, derived from gene expression data.
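CIBERSORT estimates these abundances by modelling a bulk expression profile as a mixture of reference cell-type profiles (a "signature matrix") and solving for the mixing proportions with ν-support vector regression. As a simplified illustration of the same mixture idea, the sketch below swaps in non-negative least squares via `scipy.optimize.nnls`; it is not the CIBERSORT algorithm, and the helper `estimate_fractions` and the toy signature matrix are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_fractions(signature: np.ndarray, mixture: np.ndarray) -> np.ndarray:
    """Estimate cell-type fractions in a bulk mixture (NNLS stand-in, not CIBERSORT).

    signature: genes x cell_types matrix of reference expression profiles
    mixture:   vector of bulk expression, one value per gene
    """
    coefs, _residual_norm = nnls(signature, mixture)  # coefficients are >= 0
    total = coefs.sum()
    # Rescale so the estimated fractions sum to 1
    return coefs / total if total > 0 else coefs


# Hypothetical toy signature: 4 genes x 2 cell types (not real CIBERSORT data)
signature = np.array([
    [10.0, 0.0],
    [0.0, 8.0],
    [5.0, 5.0],
    [1.0, 3.0],
])
true_fractions = np.array([0.7, 0.3])
mixture = signature @ true_fractions  # simulated bulk profile

print(estimate_fractions(signature, mixture))  # should be close to true_fractions
```

Because the toy mixture is built exactly from the signature, the recovered fractions match the true ones; real bulk data adds noise and genes missing from the signature.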

We will load the data in a sparse matrix format.
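Most entries in a single-cell count matrix are zero, so storing only the non-zero values saves a great deal of memory. A minimal sketch, assuming SciPy and pandas are available: it builds a small genes × cells matrix from (gene, cell, count) triplets in compressed sparse column (CSC) form and wraps it in a pandas DataFrame with sparse columns. The gene and cell names are hypothetical; for the real 10x Genomics files the triplets would instead come from the Matrix Market file, for example via `scipy.io.mmread`.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical toy data standing in for the 10x Genomics count triplets
genes = ["GENE_A", "GENE_B", "GENE_C"]
cells = ["cell_1", "cell_2", "cell_3", "cell_4"]
gene_idx = np.array([0, 2, 1, 0, 2])
cell_idx = np.array([0, 0, 1, 3, 3])
counts = np.array([3, 1, 5, 2, 4], dtype=np.float64)

# Build a genes x cells matrix; only the non-zero entries are stored
mat = sparse.coo_matrix(
    (counts, (gene_idx, cell_idx)), shape=(len(genes), len(cells))
).tocsc()
print(mat.nnz)  # 5 stored values instead of 12

# Wrap it in a DataFrame with sparse columns, keeping gene/cell labels
df_sparse = pd.DataFrame.sparse.from_spmatrix(mat, index=genes, columns=cells)
print(df_sparse.sparse.density)  # fraction of entries actually stored
```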

The dataset consists of about 32,000 genes and 2,700 single cells.
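A quick calculation shows why the sparse format matters at this size. The sketch uses the rounded figures above, assumes 8-byte float values and 4-byte integer indices, and assumes a hypothetical 3% non-zero density purely for illustration; the real density of this dataset is not stated here.

```python
# Back-of-the-envelope memory estimate for a 32,000-gene x 2,700-cell matrix
n_genes, n_cells = 32_000, 2_700
entries = n_genes * n_cells      # 86,400,000 entries
dense_bytes = entries * 8        # float64 dense storage: 691,200,000 bytes

# Hypothetical fraction of non-zero counts; real data will differ
density = 0.03
nnz = round(entries * density)

# CSC storage: one float64 value + one int32 row index per non-zero,
# plus (n_cells + 1) int32 column pointers
sparse_bytes = nnz * (8 + 4) + (n_cells + 1) * 4

print(f"dense:  {dense_bytes / 1e6:.1f} MB")
print(f"sparse: {sparse_bytes / 1e6:.1f} MB")
```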

Using Intel Optimized Python, we will normalize the dataset (i.e., the gene expression, or GEX, data) and find the top expressed genes.
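The exact normalization used is not specified above. One common choice, shown here as an assumption, is library-size normalization: each cell's counts are scaled so that they sum to a fixed total (10,000 in this sketch), after which genes are ranked by mean normalized expression. The helper names and the toy matrix are hypothetical.

```python
import numpy as np
import pandas as pd


def normalize_library_size(df: pd.DataFrame, target_sum: float = 1e4) -> pd.DataFrame:
    """Scale each cell (column) so that its counts sum to target_sum."""
    totals = df.sum(axis=0).replace(0, 1)  # avoid dividing by zero for empty cells
    return df.div(totals, axis=1) * target_sum


def top_expressed_genes(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Keep the n genes (rows) with the highest mean expression across cells."""
    keep = df.mean(axis=1).sort_values(ascending=False).index[:n]
    return df.loc[keep]


# Hypothetical toy GEX matrix: genes x cells raw counts
gex = pd.DataFrame(
    [[10, 0, 0], [20, 10, 10], [70, 90, 90]],
    index=["GENE_A", "GENE_B", "GENE_C"],
    columns=["cell_1", "cell_2", "cell_3"],
)
norm = normalize_library_size(gex)
print(top_expressed_genes(norm, n=2).index.tolist())
```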

Then we will apply an arcsinh transform and compute Z-scores.
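A minimal sketch of the two transforms with pandas and NumPy. The arcsinh transform compresses large values while staying defined at zero; the cofactor of 5 used here is an assumed value, not one taken from this project. The Z-score is computed per gene (row) across cells, so each non-constant gene ends up with mean 0 and standard deviation 1. The helper names and toy matrix are hypothetical.

```python
import numpy as np
import pandas as pd


def arcsinh_transform(df: pd.DataFrame, cofactor: float = 5.0) -> pd.DataFrame:
    """Apply arcsinh(x / cofactor) element-wise; the cofactor is an assumed value."""
    return np.arcsinh(df / cofactor)


def zscore_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score each gene (row) across cells: (x - row mean) / row std."""
    mean = df.mean(axis=1)
    std = df.std(axis=1, ddof=0)
    # Avoid division by ~0: (near-)constant rows end up at ~0 instead of NaN/inf
    std = std.where(std > 1e-12, 1.0)
    return df.sub(mean, axis=0).div(std, axis=0)


# Hypothetical toy matrix: genes x cells
gex = pd.DataFrame(
    [[0, 5, 10], [3, 3, 3]],
    index=["GENE_A", "GENE_B"],
    columns=["cell_1", "cell_2", "cell_3"],
)
ash = arcsinh_transform(gex)
z = zscore_rows(ash)
print(z.round(3))
```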

After that, we load the data into Clustergrammer2, which we ported to the AI Analytics Toolkit, and explore the resulting interactive heatmaps.

Here are the features of Clustergrammer2:

-Zooming and Panning

Allows users to zoom into and pan across their heatmap by scrolling and dragging.

-Mouseover Interactions

Mousing over elements in the heatmap brings up additional information using tooltips.

-Row and Column Reordering

-Interactive Dimensionality Reduction

Dimensionality reduction is a useful data analysis technique that is often used to reduce high-dimensional datasets down to a number of dimensions that can be visualized.

-Interactive Dendrogram

Clustergrams typically have dendrogram trees (for both rows and columns) to depict the hierarchy of row and column clusters produced by hierarchical clustering. The height of the branches in a dendrogram depicts the distance between clusters. Clustergrammer depicts this hierarchical tree one slice at a time using trapezoids.
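The row and column reordering and the dendrograms both come from hierarchical clustering. A minimal sketch with SciPy, assuming average linkage and Euclidean distance (Clustergrammer2's own defaults may differ): `linkage` builds the cluster tree, the third column of its output holds the merge distances that a dendrogram draws as branch heights, and `leaves_list` gives the leaf order used to reorder rows and columns. The `cluster_order` helper and the toy matrix are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage


def cluster_order(df: pd.DataFrame, method: str = "average", metric: str = "euclidean"):
    """Reorder rows and columns by hierarchical clustering.

    Returns the reordered DataFrame and the row linkage matrix.
    """
    values = df.to_numpy()
    row_link = linkage(values, method=method, metric=metric)    # cluster rows
    col_link = linkage(values.T, method=method, metric=metric)  # cluster columns
    reordered = df.iloc[leaves_list(row_link), leaves_list(col_link)]
    return reordered, row_link


# Hypothetical toy matrix with two obvious row groups (r1/r3 and r2/r4)
toy = pd.DataFrame(
    [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [1.0, 1.0, 0.1, 0.0],
     [0.0, 0.1, 1.0, 1.0]],
    index=["r1", "r2", "r3", "r4"],
    columns=["c1", "c2", "c3", "c4"],
)
reordered, row_link = cluster_order(toy)
print(reordered.index.tolist())  # similar rows end up next to each other
print(row_link[:, 2])            # merge distances = dendrogram branch heights
```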

Sample Code on Intel DevCloud

```python
import warnings

import numpy as np
import pandas as pd
from clustergrammer2 import net, Network, CGM2

warnings.filterwarnings('ignore')

# Load the expression matrix, its column metadata, and the category file
df = {}
df['clean'] = pd.read_csv('../data/rc_two_cat_clean.csv', index_col=0)
df['meta_col'] = pd.read_csv('../data/meta_col.csv', index_col=0)
df['meta_cat'] = pd.read_csv('../data/meta_cat_col.csv', index_col=0)

# Widget viewer: load the data into Clustergrammer2,
# set up the manual column category, and display the interactive heatmap
net.load_df(df['clean'], meta_col=df['meta_col'])
net.set_manual_category(col='Category', preferred_cats=df['meta_cat'])
net.widget()
```

Technologies Used

Intel oneAPI

Intel oneAPI AI Analytics Toolkit

Intel Optimized Python

Intel Optimized scikit-learn
