Intro to Big Data Bioinformatics

On this page, you will find the training topics, the syllabus, and ways to register for the program.

Introduction to Bioinformatics is a training program designed for everyone, including students without a background in bioinformatics as well as life science researchers. The objective is to introduce topics and examples that help participants understand omics data and the use of bioinformatics in life science research. In this training, you will learn about Next Generation Sequencing (NGS) data analysis, including processing and preparing data for applications in Genomics, Transcriptomics, and Metagenomics. You will also get an overview of downstream analysis and interpretation of various types of omics data, including commonly used annotation databases, statistical analysis, and machine learning techniques.

Why Next Generation Sequencing?

With the decreasing cost of Next Generation Sequencing (NGS) and an increasingly broad range of applications, this technology has transformed biomedical research and the biotechnology industry, and is now becoming increasingly popular in clinical use. Analysis of NGS data can help identify pathogenic, germline, and somatic DNA variants; measure gene expression; detect methylation patterns; and even study microbial communities on human skin and in the gut, lungs, and other organs. That is why this program can help anyone who is getting started with life science research and bioinformatics to understand these techniques and their applications, and to get a broad overview of the methods needed to begin working in bioinformatics.

To learn more, we welcome you to view the video below and register for a free orientation session:

Key Topics Covered:

Big Data in Biology
 
Introduction to Omics Data in Life Science Research, including various types: Genomics, Transcriptomics and Metagenomics. Learn how Next Generation Sequencing can be used to study biological variation and understand genes, mutations and microorganisms responsible for experimental conditions or clinical factors in disease.

Statistical Analysis of Omics Data

Learn about statistical analysis for Big Data, including how to use appropriate analysis techniques to measure differences between groups of samples. See examples of advanced data analysis methods and ways to perform visualization, annotation and interpretation of analysis results.

Machine Learning for Biomedical Data Science

Understand analytical methods for processing, visualizing, and analyzing complex biomedical data, and learn the terminology of machine learning and artificial intelligence in biomedical discovery.

Bioinformatics Project Examples: Machine Learning for Biomedical Data

Case study: Modelling Cancer Precision Medicine. Learn to analyze various omics data types, integrate them, and associate them with a phenotype (response to treatment) using sophisticated machine learning algorithms.

Program Syllabus: Introduction to Big Data Bioinformatics

SESSION TITLE

DESCRIPTION

Associated online resources:

Introduction & Orientation 

  • Why is Bioinformatics needed?
  • Omics Next Generation Sequencing
  • Publicly Available Data Repositories
  • Introduction and commencement of the program
  • Introduction to the Faculty/mentors & Trainers
  • Demonstration of the slack channel and program page
  • Omics Logic Learn courses, projects, and profile demonstration
  • T-BioInfo Analytics Platform: logging in with email and password, and accessing the demo pipeline
  • Introduction of the participants & Q&A discussion 
  • Expectations and schedule review for the training program, description of the structure of the course, and important deadlines


 

Introduction to Genomics: DNA Variants & Mutations 

  • All about DNA, Chromosomes, Base Pairs & Nucleotides, Codons, and Amino Acids
  • Genome variations: A detailed understanding
  • Targeted Sequencing, Whole Exome Sequencing, and Whole Genome Sequencing
  • Data Formats for Sequencing Data
  • Analysis of Genomic Data using T-Bioinfo Server
  • Logical steps for Genomic Data Analysis and associated Algorithms
  • Analysis with Integrative Genomics Viewer
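
As an illustration of the sequencing data formats listed above, here is a minimal sketch that reads FASTQ records and computes the mean base quality of each read. It assumes the common four-line record layout and Phred+33 quality encoding. The function names and the two example reads are hypothetical and are not part of the T-BioInfo platform.

```python
from io import StringIO


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of input
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line
        quality = handle.readline().strip()
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a read, assuming Phred+33 ASCII encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Hypothetical two-read example (real files would be opened with open(...))
example = StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nGGCA\n+\n!!II\n"
)
records = list(parse_fastq(example))
for read_id, seq, qual in records:
    print(read_id, seq, round(mean_phred(qual), 1))
```

A per-read quality summary like this is one simple way to decide which reads to trim or discard before mapping.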

 


Associated online course/resource:

Introduction to Metagenomics: 16S Metagenomics Sequencing


Session Topics: 

  • Introduction to NGS for Metagenomic Sequencing
  • Whole Genome, 16S rRNA, and amplicon data types
  • Overview of the entire course
  • Overview of the tools and resources for this program (edu.t-bio.info courses, projects and datasets, and the T-BioInfo Analytics Platform)
  • Expectations and schedule review for the training program, important deadlines


Associated online course:


 

Microbiome - functional bacterial communities

Session Topics: 

  • Metagenomic Sequencing & Applications of Metagenomics
  • Bacterial Function: from the lab to computational analysis, Expanding Directions and Research Priorities
  • Whole Metagenome sequencing
  • Functional Metagenomics & Phylogenetic Survey
  • Similarity & Homology, Gene Sequence, Phylogenetic Tree & Ribosomal RNA (rRNA)
  • Practical: Tree of Life
  • Use of small subunit (16S) rRNA for taxonomic classification
  • Data formats for sequencing data
  • Mapping to reference databases (SILVA, Greengenes, NCBI)
  • Hypervariable regions of ribosomal RNA
  • Flow of Metagenomics Pipeline
  • Practical hands-on: Mutation Variant Analysis on T-Bioinfo Server
  • Data Visualization on JBrowse
  • The American Gut Project Data Example


Associated online resources:

Introduction to Transcriptomics: Gene Expression Data Analysis 

  • Review of Next Generation Sequencing 
  • Visualization & Somatic Variation
  • RNA-Seq: Gene Expression Data Analysis
  • Overview of Transcriptomics
  • RNA Transcription
  • Mapping reads: Genes & Isoforms
  • RNA-Seq: Sequencing Data
  • Analysis logic: From raw reads to a table of expression (RNA-seq example)
  • Common sources of unwanted technical variation 
  • Pre-processing steps, filtering and cleaning the table of expression
  • Loading processed data for analysis
  • Gene Expression Examples

 

Associated online resources:

 

Transcriptomics in Research: Differential Gene Expression & Pathway annotation  

  • Measuring mRNA abundance 
  • Data formats for sequence data
  • Processing short reads
  • Introduction to Differential gene expression
  • MS Excel: Loading data, setting up a data frame, using numerical matrices & plotting data
  • Log-Scale & Log-normal transformation
  • Quantile normalization
  • RNA-Seq Data after Quantile Normalization
  • Replicates & Signal - FPKM or Transformed
  • R: Loading data, setting up a dataframe, using numerical matrices & plotting data
  • Python: Loading data, setting up a dataframe, using numerical matrices & plotting data

Introduction to Single Cell Transcriptomics

  • Single cell genomics and clinical paradigms: An overview of the current scRNAseq technologies
  • Basic overview of pipelines for processing raw reads into expression values
  • Quality control of scRNAseq data
  • Dimensionality reduction and clustering techniques

 

Associated online resources:


Biomedical Data Science: Introduction to Machine Learning for NGS Data

  • Review of Transcriptomics in Research: Differential Gene Expression & Pathway annotation 
  • DESeq2 Outputs
  • Introduction to Machine Learning
  • Types of Machine Learning methods 
  • Overview of unsupervised machine learning methods
  • Dimensionality Reduction 
  • PCA Visualization: Scatter Plot
  • Hands-on analysis on T-Bioinfo Server
  • Finding patterns and similarities in data
  • Principal Component Analysis (PCA), Hierarchical and K-means clustering
  • Biological Interpretation
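
To make the clustering topic above more concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration under simplifying assumptions: centroids start at the first k points so the result is deterministic, whereas real tools usually use random or k-means++ initialization. The function name and the six two-dimensional points (which could stand for samples plotted on two principal components) are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means on a list of equal-length numeric tuples.

    Centroids start at the first k points, a simplification made so the
    sketch is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical samples described by two coordinates each
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The cluster labels can then be compared with known sample annotations (for example, treatment group) as a first step toward biological interpretation.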

Project Examples:

Overview of Machine Learning Projects

  • Overview of Supervised machine learning methods
  • Classification: Decision Trees, Random Forest (RF), Support Vector Machine (SVM)
  • Preparing Training and test datasets
  • Random Forest: Many Decision Trees
  • Bagging & Bootstrapping
  • Supervised Machine Learning: Performance Measures
  • Feature Selection
  • Feature Reduction: Feature Extraction Vs Feature Selection 
  • Demonstration on Demo Dataset: Breast Cancer Cell line Data
  • Bioinformatics Project examples & Omicslogic Research Fellowship
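
The performance measures mentioned above can be sketched for a two-class problem as follows. This is a minimal illustration that assumes predictions have already been made on a held-out test set. The function name and the labels (1 for responder, 0 for non-responder) are hypothetical and do not come from the breast cancer cell line dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    def ratio(numerator, denominator):
        # Return NaN instead of failing when a class is absent
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical test-set labels: 1 = responder, 0 = non-responder
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting several measures together matters because accuracy alone can look good on imbalanced data even when one class is rarely predicted correctly.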

Associated online resources:

Case Studies: Project examples and publicly available RNA-Seq datasets on NCBI GEO and SRA, with examples in oncology, infectious diseases, and agriculture

  • Planning your project
  • Case Studies & Publications, Datasets
  • Examples from oncology, neuroscience, and infectious diseases
  • Components of bioinformatics analysis:
  • Literature review, compiling a primary dataset, developing an analysis plan, performing exploratory analysis, data processing and preparation, statistical analysis, biological interpretation, and validation

Register for the Upcoming Webinar Session:

"This lesson gave a good explanation and example for how normal citizens can participate in complex biomedical work without the extensive background many scientists have".
- Lane Yutzy, PhD Fellow
"It literally helped me to clear out my basic concepts. Also the study material provided in reference was extremely helpful. It simply explained the terms involved".
- Jeevanjot Kaur, Graduate Student
"It helped me understand the overview of the research fellowship and how it guides newcomers to become experts in the field of bioinformatics, through the use of courses, literature, examinations, applications etc".
- Kakshil Patil, Graduate Student