COVID-19, Ebola, Malaria. Can bioinformatics help us fight them?

Microorganisms, or microbes, are tiny organisms (measured in microns) that exist as single cells or as colonies of cells. Microbes can be bacteria, archaea, protists, or fungi. Viruses and prions are usually counted as microbes too, although whether they are living is still debated. Many microbes are useful: they are used to produce enzymes and foods such as alcohol and cheese, and they aid digestion in the human body.

Nevertheless, some microbes cause disease and are called pathogens. Pathogens infect other living organisms (hosts) and can use the host’s replication machinery. They often evade the host’s immune response and cause severe disease. Diseases caused by pathogens that spread directly or indirectly from one organism to another are termed infectious diseases. Infectious diseases can cause death and disability on a large scale. Pandemics such as the bubonic plague, the global HIV/AIDS pandemic, and the current COVID-19 pandemic have had devastating effects throughout human history.

Why is Bioinformatics a good tool for studying diseases?

Studying microbes and the host’s immune response can help in fighting these diseases. Microbes require specific conditions for growth, and only a few of them can be cultured in the laboratory. Bioinformatics and next-generation sequencing (NGS) have made it possible to determine the genome sequences of many microorganisms, including ones that cannot be cultured. The sequences of many known microbes are available in public databases, which enables faster analysis and helps address various questions about infectious diseases. Supercomputing allows massive amounts of data to be analyzed quickly, and the algorithms and statistical methods developed over the years make these analyses more accurate. Hence, bioinformatics can bridge many gaps in our current understanding of diseases. Some of the major topics that can be addressed using bioinformatics are:

  • Understanding the origin and identity of any outbreak
  • Potential mutations and Pathogenesis 
  • Drug and Vaccine development for a pathogen
  • Disease Surveillance
  • Relationship with other pathogens 
  • Biomarkers for detection, prognosis, or treatment of the diseases
  • Drug Resistance 

Figure - Bioinformatics is a tool that can help biologists answer numerous questions.

Identity of a disease:

Publicly available databases of microbial genomes provide reference sequences, and multiple sequence alignment tools compare the sequence of an unidentified microbe against them to establish its identity. Phylogenetic analysis can then show whether the microbe is a new species or a different strain of a known one.

In late December 2019 in Wuhan, China, several patients were diagnosed with acute respiratory distress syndrome (ARDS) caused by an unidentified microbial infection. The microbe was sequenced to identify the pathogen, and multiple sequence alignment against the previously known SARS and MERS viruses showed it to be a new coronavirus. The new virus, named SARS-CoV-2, had 88% sequence identity to two bat-derived SARS-like coronaviruses and about 50% identity to MERS-CoV. Phylogenetic analysis shows how genetically different these viruses are from one another.
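Percent identity figures like the ones above are computed from an alignment. Here is a minimal sketch of that calculation in Python, assuming the two sequences have already been aligned by a separate tool. The function name `percent_identity` and the short sequences are made up for illustration and are not real viral data. Published figures may also define identity slightly differently, for example in how gaps are counted.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (assumption: alignment was done beforehand)
query = "ATGGC-TACGTTAGC"
reference = "ATGGCATACGATAGC"
print(f"{percent_identity(query, reference):.1f}% identity")
```

In this toy case 13 of the 14 gap-free columns match. On real genomes the same idea is applied across tens of thousands of aligned positions.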


Figure - Using phylogenetics, scientists are able to trace the origin of SARS-CoV-2.

Origin of the pathogen:

In outbreaks, epidemics, and pandemics, the pathogen’s genome often accumulates mutations that help it adapt to, evade, and survive the host’s response. These mutations can be analyzed, and the resulting genomic variation can be used to trace the origin of a particular disease.

Although the origin of the current pandemic is still unknown, phylogenetic analysis shows SARS-CoV-2 to be 96% similar to bat SARS-CoV-like viruses. Differences from a related virus found in pangolins suggest that SARS-CoV-2 could have arisen through recombination. Either way, the evidence suggests that SARS-CoV-2 originated from a zoonotic strain.
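To give a feel for how a phylogeny groups related genomes, here is a small sketch that builds a tree topology from pairwise differences using UPGMA, a simple clustering method. This is a swapped-in simplification: published SARS-CoV-2 phylogenies use more sophisticated, model-based methods. The sequence names and sequences are invented for illustration, and the function names `p_distance` and `upgma` are assumptions.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict) -> str:
    """Return a Newick-style topology (no branch lengths) built by UPGMA."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for other in sizes:
            if other in pair:
                continue
            d = (
                dist[frozenset((a, other))] * sizes[a]
                + dist[frozenset((b, other))] * sizes[b]
            ) / (sizes[a] + sizes[b])
            dist[frozenset((merged, other))] = d
        # Drop all distances that involve the two clusters just merged.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Invented toy sequences (not real viral genomes)
toy_seqs = {
    "new_isolate": "ATGCATGCATGC",
    "bat_like": "ATGCATGCATGA",
    "sars_like": "ATGCTTGCTTGA",
    "mers_like": "TTGGTTCCTAGA",
}
print(upgma(toy_seqs))
```

The sequences with the fewest differences are joined first, so the closest relatives of the hypothetical new isolate sit next to it in the tree.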


Identifying the sequence, structure, and function of the genome helps in understanding how the pathogen evades the host and transmits disease. In SARS-CoV-2, the open reading frames (ORFs) of the viral genome reveal the spike, envelope, and other proteins that the virus makes. By comparing the genome with those of other coronaviruses, it was concluded that the spike protein binds to the human ACE2 receptor, just as in SARS-CoV (1).
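As a rough illustration of what finding an open reading frame means, the sketch below scans the three forward reading frames of a DNA string for a start codon (ATG) followed in-frame by a stop codon. It is a simplified sketch: real annotation tools also scan the reverse strand and handle many special cases. The function name `find_orfs`, the `min_codons` parameter, and the toy sequence are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return sorted (start, end, sequence) tuples for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    Coordinates are 0-based, end-exclusive; min_codons counts the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first ATG seen
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close the ORF and look for the next ATG
    return sorted(orfs)


# Made-up toy sequence, not taken from any real genome
toy_genome = "CCATGAAATTTTAGGGATGCCCTGA"
for start, end, seq in find_orfs(toy_genome):
    print(start, end, seq)
```

Each ORF found this way is a candidate protein-coding region. Translating it and comparing it with known proteins is what suggests a function such as spike or envelope.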

Once a person is infected, the immune response can be studied through human gene expression. Some genes are upregulated, that is, expressed at higher levels in the infected state than in the non-infected state, and a differential gene expression analysis can reveal them. In SARS-CoV-2 infection, cytokines are expressed at higher levels in reaction to viral entry. This cytokine storm affects the body and can result in further complications.
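The core idea of a differential expression comparison can be sketched with a log2 fold change between infected and non-infected samples. This is only a simplified illustration: dedicated tools such as DESeq2 or edgeR also normalize the data and run statistical tests. The gene names and counts below are invented, and the threshold of 1 (a two-fold increase) is an assumed convention.

```python
import math
from statistics import mean


def log2_fold_change(infected, control, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids division by zero."""
    return math.log2((mean(infected) + pseudocount) / (mean(control) + pseudocount))


# Invented read counts for three hypothetical genes, three samples per group
counts = {
    "GENE_A": {"infected": [310, 290, 350], "control": [40, 35, 45]},
    "GENE_B": {"infected": [100, 110, 90], "control": [95, 105, 100]},
    "GENE_C": {"infected": [20, 25, 15], "control": [80, 85, 75]},
}

fold_changes = {
    gene: log2_fold_change(groups["infected"], groups["control"])
    for gene, groups in counts.items()
}

# Call a gene upregulated if expression at least doubles (log2 fold change >= 1)
upregulated = [gene for gene, lfc in fold_changes.items() if lfc >= 1.0]

for gene, lfc in fold_changes.items():
    print(f"{gene}: log2FC = {lfc:+.2f}")
print("Upregulated:", upregulated)
```

In this toy data only the first gene passes the threshold, the second is unchanged, and the third is downregulated.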

To learn more and get hands-on experience, check out the COVID-19 project on the OmicsLogic website.

Drug and Vaccine Development:

Once the pathogenesis is known, drugs and vaccines can be developed against the pathogen to prevent and treat the disease. Drug structure and protein structure databases can be used to identify potential drugs. Once the target protein is known, it can be paired computationally with many drugs and the likely efficacy of each can be assessed. Candidate drugs can also be slightly modified to check whether their efficacy improves. This way, many older drugs can be repurposed, and the trial-and-error phase of drug development is shortened drastically.

Vaccines can be designed from the pathogen’s RNA or DNA sequence instead of using whole inactivated pathogens. This is useful because pathogens keep evolving: the vaccine can be updated over the years by changing the sequence to match new mutations.

Biomarkers and other gene discovery:

Important genes and biomarkers can be identified using bioinformatics. Malaria is a parasitic disease that is endemic in multiple regions. A genome-wide association study (GWAS) that analyzed samples from many people found that severe malaria was associated with variants of the ATP2B4 gene on chromosome 1 (2). Such a variant can serve as a biomarker for people at high risk of severe infection and can help in disease prevention.
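At its simplest, a GWAS repeats one small test for every variant in the genome: does an allele appear more often in cases than in controls? The sketch below shows that test for a single variant, using an odds ratio and a chi-square test on a 2x2 table of allele counts. The counts are invented and do not come from the malaria study, and the function name `allelic_association` is an assumption. Real studies also correct for population structure and for the very large number of variants tested.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Odds ratio, chi-square statistic (1 d.f.) and p-value for a 2x2 allele table."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Invented allele counts: alternative vs reference allele in cases and controls
odds_ratio, chi2, p_value = allelic_association(
    case_alt=300, case_ref=700, control_alt=200, control_ref=800
)
print(f"odds ratio = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

An odds ratio above 1 means the alternative allele is more common in cases. A small p-value means a difference that large is unlikely to arise by chance alone.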


Figure - Traits such as drug resistance can be attributed to mutations, which can be identified and studied using bioinformatics.

Drug Resistance:

Although several drugs have been discovered and are used in treatment, pathogens develop resistance to them. Increasing drug resistance leads to treatment failure and sometimes to death. Drug-resistant pathogens, or superbugs, can be sequenced and studied to identify how they developed resistance, and the findings can inform treatment regimens and further studies.

Drug-resistant tuberculosis is one of the most common superbug infections and causes many deaths. Whole-genome sequencing and analysis of Mycobacterium tuberculosis (Mtb) helped identify the genomic sites that cause drug resistance, and the results have been compiled in databases such as PATRIC and ReSeqTB (3). Using this genomic information, rapid tests can now check whether the isolate from a patient is resistant to any of the drugs used to treat tuberculosis, which leads to better treatment of the patient.
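Genome-based resistance prediction can be pictured as a lookup: the variants found in a patient's isolate are compared with a catalogue of mutations known to confer resistance. The sketch below assumes a tiny catalogue with two widely reported Mtb mutations, included purely for illustration. It is not taken from PATRIC or ReSeqTB and is not suitable for clinical use. The function name `predict_resistance` and the isolate's variant calls are hypothetical.

```python
# Illustrative catalogue only: two widely reported resistance mutations.
# NOT taken from PATRIC or ReSeqTB, and not suitable for clinical use.
RESISTANCE_CATALOGUE = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
}


def predict_resistance(isolate_variants):
    """Map each drug to the catalogued variants found in the isolate."""
    hits = {}
    for gene, change in isolate_variants:
        drug = RESISTANCE_CATALOGUE.get((gene, change))
        if drug is not None:
            hits.setdefault(drug, []).append(f"{gene} {change}")
    return hits


# Hypothetical variant calls for one patient isolate; the second entry is a
# made-up variant that is absent from the catalogue.
isolate = [("katG", "S315T"), ("hypothetical_gene", "A10V")]
report = predict_resistance(isolate)

for drug in sorted(set(RESISTANCE_CATALOGUE.values())):
    if drug in report:
        print(f"{drug}: resistance predicted ({', '.join(report[drug])})")
    else:
        print(f"{drug}: no catalogued mutation found")
```

A variant that is missing from the catalogue is simply not reported, which is one reason the completeness of such databases matters.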


Bioinformatics has contributed a great deal to infectious disease research, yet a few challenges remain. Sequencing errors, noise, and bias affect the analysis, and better techniques and algorithms are needed to combat them. More data needs to be made available for all diseases, and it should be organized and studied so that we can understand more. More data-driven studies are needed. These challenges can be overcome when all of us work together. You can now easily join us through the spring program on “Bioinformatics for Infectious Diseases”. It is a 1-month mentor-guided program with hands-on training that runs from February 14, 2022 to March 21, 2022.


  1. Xiaowei Li, Manman Geng, Yizhao Peng, Liesu Meng, Shemin Lu, Molecular immune pathogenesis and diagnosis of COVID-19, Journal of Pharmaceutical Analysis, Volume 10, Issue 2, 2020, Pages 102-108, ISSN 2095-1779.
  2. Malaria Genomic Epidemiology Network. Insights into malaria susceptibility using genome-wide data on 17,000 individuals from Africa, Asia, and Oceania. Nat Commun 10, 5732 (2019).
  3. Ngo, TM., Teo, YY. Genomic prediction of tuberculosis drug-resistance: benchmarking existing databases and prediction algorithms, BMC Bioinformatics 20, 68 (2019).