New approaches for the identification of housekeeping genes through clustering and machine learning algorithms
Housekeeping genes, also called reference genes, are required to maintain the basal cellular functions that are essential for sustaining a cell. They are therefore expected to be expressed in all cells of an organism, regardless of tissue type, state, or the condition to which the cell is exposed. Diverse approaches are used to study this type of gene. One of the most widely used in next-generation sequencing is RNA sequencing (RNA-seq), a high-throughput technique that measures the gene expression profile of a target tissue or isolated cell. The analysis is performed by sequencing complementary DNA (cDNA) to determine the transcriptional activity present in the target tissue or cell. Machine learning (ML) methods are applied in different areas of genetics and genomics, allowing the interpretation of large datasets such as those related to gene expression. Among the most widely used techniques are clustering algorithms, which define groups of genes with similar expression profiles and thereby support the study of gene function and interaction. To identify housekeeping gene candidates with ML techniques, Corynebacterium pseudotuberculosis, an intracellular pathogen, was used as a model organism. This organism mainly infects sheep, goats, and horses, among other animals, causing caseous lymphadenitis. For this study, RNA-seq expression datasets from strains 258 and 1002 of this bacterium were used.
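To make the idea of grouping genes by expression profile concrete, the following is a minimal sketch and not the pipeline used in this work. It assumes that expression stability can be summarized by the coefficient of variation (CV) of each gene across samples, and it separates genes into a stable and a variable group with a plain one-dimensional k-means. The gene names, expression values, and the function names `coefficient_of_variation`, `kmeans_1d`, and `housekeeping_candidates` are all hypothetical and exist only for illustration.

```python
from statistics import mean, pstdev

# Toy expression matrix (hypothetical genes and values, for illustration only).
# Each key is a gene; each list holds its expression across samples/conditions.
expression = {
    "geneA": [100.0, 102.0, 98.0, 101.0],
    "geneB": [50.0, 49.0, 51.0, 50.0],
    "geneC": [10.0, 200.0, 35.0, 400.0],
    "geneD": [5.0, 80.0, 300.0, 20.0],
}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean.

    Low values indicate stable expression across samples.
    Assumes the mean is strictly positive.
    """
    return pstdev(values) / mean(values)


def kmeans_1d(values, k=2, iterations=20):
    """Plain 1-D k-means (k >= 2), centroids initialised evenly from min to max."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


def housekeeping_candidates(expr, k=2):
    """Return the genes that fall in the cluster with the lowest CV centroid."""
    # Genes with zero mean expression are skipped: they are not expressed.
    genes = [g for g in expr if mean(expr[g]) > 0]
    cvs = [coefficient_of_variation(expr[g]) for g in genes]
    labels, centroids = kmeans_1d(cvs, k=k)
    stable = min(range(k), key=lambda c: centroids[c])
    return sorted(g for g, lab in zip(genes, labels) if lab == stable)


candidates = housekeeping_candidates(expression)
print(candidates)
```

On this toy input the two genes with nearly constant expression (`geneA`, `geneB`) land in the low-CV cluster. Real RNA-seq data would first need normalization, and the choice of stability measure, clustering algorithm, and number of clusters are all assumptions of this sketch.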