Data Science

The rapid growth of high-throughput data, including data from -omics technologies, has created significant demand for data science skills and experience with bioinformatics methods of analysis. This online training program covers practical and conceptual aspects of data science, including data wrangling, statistical analysis, and machine learning, applied to high-throughput biomedical omics data using big data analysis tools on the T-BioInfo Platform, R, and Python. Throughout the course, students will gain an understanding of the opportunities and limitations of machine learning in the context of basic, pre-clinical, and clinical research.

The training combines online resources, practical assignments, and live workshops conducted via Zoom. Throughout the course, we will review several project examples that demonstrate the successes and limitations of conventional machine learning (ML) and deep learning (DL) methods using data from public repositories. To learn more, please review the information on this page and register for an upcoming webinar, where our team will introduce the program topics and answer questions about the training.

Insights from the Webinar

Overview of Data Science for Biomedical Data Analysis Training Program 

Dr. Raghavendran sheds light on building machine learning models for biomedical data

"Types of Data Analysis" integrated in the "Data Science for Biomedical Data" Program

Pre-Register for the Program

Key Topics Covered:

  Big data, HPC and cloud computing

Many types of omics data require step-by-step preparation, exploration, annotation, and visualization to understand. The T-BioInfo platform was designed for big multi-omics data analysis. Its user-friendly, intuitive interface hides the complexity of the data, eliminates the need for coding, and provides advanced machine learning algorithms for data integration and mining.

 NGS: omics data types and use cases

A program that embeds data-driven concepts into biological projects, spanning the student learning journey from observer to participant in research. Project-based learning for big data bioinformatics goes beyond theory with real datasets, projects, and expert mentors. Work with curated datasets from publicly available repositories using easy-to-follow tutorials.

 Computational pipelines for data processing

Data processing for Next Generation Sequencing, mass spectrometry, structural, and phenotypic data. Build and adapt pipelines that share common approaches to data mapping, quantification, and annotation. These pipelines prepare data for downstream statistical analysis, train machine learning models, and annotate features.

 Introduction to Programming: R and Python
Online bioinformatics coding exercises to learn and explore R and Python scripting and understand how to analyze and visualize -omics data to extract meaningful insights from large biological datasets. Learn, practice, and achieve bioinformatics greatness with concise exercises and interesting challenges right in the comfort of your browser!
A Complete Data Science Crash Course with Hands-on Training and curated Case Study Datasets
Principal Component Analysis - Exploratory Data Analysis


Data Wrangling and Exploratory Data Analysis

Machine Learning and Statistical Analysis


Statistical and Machine Learning Approaches

RStudio - Coding Environment for Code Development


Coding Environments and Best Practices in R and Python

Research Projects and Case Studies with curated Data Sets


Introduction to Case Studies using Research Omics Data


Program Syllabus: Data Science for Biomedical Data

Data Science for Biomedical Data

Introduction to the program

Overview of key topics that will be covered, including:
  • Data loading and preparation
  • Data visualization in R and Python
  • Statistical Concepts and Tests
  • Supervised and Unsupervised Machine Learning
  • Network Analysis and Deep Learning
The need for Machine Learning and Statistics: Processing high-throughput data

Processing high-throughput (big) data

  • Data complexity and need for preparation
  • Availability and variability of data
  • Unprecedented detail and volume
  • Data heterogeneity, complexity, and noise
  • Need for interpretability and reproducibility
  • Limitations of statistical analysis
Machine Learning Methods

Major Types of Machine Learning Methods

  • Unsupervised and supervised types of analysis
  • Dimensionality Reduction, clustering and classification
  • Hierarchical and K-means clustering
  • Big Data Clustering (CLARA, PAM, fuzzy)
  • Conventional Regression-based methods
  • Random Forest and SVM
  • Deep Learning
Data Visualization

Machine Learning for Data Visualization
  • Dimensionality Reduction Objectives
  • Continuous and Ordinal Data 
  • Multi-variate statistical analysis approaches
  • Variance and co-variance
  • Singular Value Decomposition (SVD)
  • Principal Component Analysis (PCA), Principal Coordinate Analysis (PCoA), and NMDS
  • t-SNE and UMAP
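
To make the link between variance, covariance, and PCA concrete, here is a minimal sketch in plain Python. It is not the T-BioInfo implementation: the toy data are hypothetical, and the leading component is found by power iteration on the covariance matrix rather than by a full SVD, which is what most libraries use in practice.

```python
import math

def covariance_matrix(rows):
    """Sample covariance of features (columns) across samples (rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered

def first_principal_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov by power iteration.

    Assumes the uniform start vector is not orthogonal to the leading
    eigenvector and that cov is not all zeros.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: variance captured along v
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue

# Hypothetical toy data: 5 samples x 2 strongly correlated features
samples = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8], [10.0, 10.0]]
cov, centered = covariance_matrix(samples)
pc1, var1 = first_principal_component(cov)

# Fraction of total variance explained by the first component
total_var = sum(cov[i][i] for i in range(len(cov)))
explained = var1 / total_var

# Project each centered sample onto PC1 to get 1-D coordinates
scores = [sum(c[j] * pc1[j] for j in range(len(pc1))) for c in centered]
```

Because the two toy features move together, almost all of the variance falls on the first component, which is the situation where a 1-D or 2-D PCA plot summarizes the data well.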


Unsupervised Learning: Clustering
  • Patterns and Learning
  • Clustering for data mining
  • PCA and clustering
  • K-means and Hierarchical clustering
  • Big Data Clustering (CLARA, PAM, fuzzy)
  • Clustering objects and features
  • Feature Engineering
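
As an illustration of the K-means topic above, the following is a minimal sketch of Lloyd's algorithm in plain Python. The expression values are hypothetical, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic; real analyses typically use randomized or k-means++ initialization with several restarts.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        for idx, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[idx] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical data: 6 samples x 2 features forming two obvious groups
expression = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],
              [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]]
labels, centroids = kmeans(expression, k=2)
```

The cluster numbers themselves are arbitrary; what matters is which samples share a label.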
Supervised Learning: Classification
  • Overview of supervised learning
  • Preparing Training and test datasets
  • Binary Decision Trees
  • Random Forest (RF)
  • Support Vector Machine (SVM)
  • Discriminant Analysis: LDA and QDA
  • Model Accuracy and Specificity
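
The training/test split and the accuracy and specificity metrics listed above can be sketched in a few lines of plain Python. The function names and the example labels are hypothetical, chosen only for illustration.

```python
import random

def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of items and hold out a fraction for testing."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    return tp / (tp + fn)   # true positive rate

def specificity(tn, fp):
    return tn / (tn + fp)   # true negative rate

# Hypothetical sample IDs split into training and held-out test sets
sample_ids = list(range(8))
train_ids, test_ids = train_test_split(sample_ids, test_fraction=0.25)

# Hypothetical true labels and model predictions for 10 test samples
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, tn, fp, fn)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```

Reporting specificity and sensitivity alongside accuracy matters when classes are imbalanced, since accuracy alone can look good for a model that rarely predicts the minority class.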


Feature Selection and Gene Signature Construction
  • Need for feature selection
  • Feature selection for t-SNE
  • Feature selection for biomarker discovery
  • Methods for feature significance
  • Algorithms and approaches (greedy)
  • Technical accuracy (ROC curve and AUC)
  • Logical or biological relevance
  • Automating the process
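
One simple way to score feature significance with ROC/AUC, as listed above, is to rank each feature by how well it alone separates the two classes. The sketch below assumes binary labels and uses hypothetical gene names and values; it computes AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = disease, 0 = control) and per-gene expression
labels = [1, 1, 1, 0, 0, 0]
features = {
    "gene_A": [5.1, 4.8, 5.5, 1.2, 0.9, 1.5],  # higher in disease
    "gene_B": [2.0, 3.1, 1.0, 2.5, 0.5, 3.0],  # little separation
    "gene_C": [1.0, 1.1, 0.9, 3.0, 3.2, 2.9],  # lower in disease
}

# Rank by distance from 0.5 so that both up- and down-regulated
# features count as informative
auc_by_gene = {g: roc_auc(v, labels) for g, v in features.items()}
ranked = sorted(auc_by_gene, key=lambda g: abs(auc_by_gene[g] - 0.5),
                reverse=True)
```

A univariate ranking like this is only a starting point: it ignores redundancy between features, which is where greedy selection and a check of biological relevance come in.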
Regression, Generalized Linear Models

Regression and generalized linear models (GLM)

    • Factor Regression Analysis and interpretation
    • Using regression for missing data
    • Data distributions (non-normal)
    • Generalized Linear Models (GLM)
    • Logistic Regression
    • Interpretation
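
Logistic regression is a GLM with a binomial response and a logit link. As a minimal sketch of fitting and interpreting one, the code below uses plain gradient descent on a single predictor. The dose/response values, learning rate, and iteration count are hypothetical choices for illustration; statistical packages normally fit GLMs by iteratively reweighted least squares instead.

```python
import math

def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, iterations=5000):
    """Fit intercept and slope by gradient descent on the mean log-loss."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [sigmoid(intercept + slope * x) - y for x, y in zip(xs, ys)]
        intercept -= learning_rate * sum(errors) / n
        slope -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return intercept, slope

# Hypothetical data: response (0/1) tends to switch on as dose increases
dose = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
response = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_logistic(dose, response)

# Interpretation: exp(slope) is the multiplicative change in the odds
# of response per one-unit increase in dose
odds_ratio = math.exp(slope)
p_low = sigmoid(intercept + slope * 1.0)
p_high = sigmoid(intercept + slope * 3.5)
```

The coefficient is read on the log-odds scale, so exponentiating it gives an odds ratio, which is usually the quantity reported.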


Network Analysis
  • Objectives of Network Analysis
  • Signaling and Metabolic Pathways
  • Gene Expression Networks
  • Time-series data
  • Overview of Bayesian Networks
  • Examples of Network analysis


Combining machine learning methods for real-world applications

Combining/selecting ML methods

Objectives of real-world ML projects:
  • Exploratory Analysis
  • Data Visualization
  • Statistical/Data Mining
  • Predictive Analysis
  • Feature Selection


Deep Learning: Types and Application
  • Introduction to Deep Learning
  • Models and Approaches
  • Multi-Layer Perceptrons (MLP)
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)
  • Dimensionality Reduction with Deep Learning
  • Model Optimization


Selecting a data science project for analysis

Selecting a project for ML analysis
  • Designing a Bioinformatics Research Project
  • Gene Expression (Transcriptomics)
  • Variant Analysis (Genomics)
  • Microbiome Diversity (Metagenomics)
  • Basic and Clinical Research


Use cases in clinical applications
  • Patient Derived Xenografts (PDX)
  • Modeling Precision Medicine for Breast Cancer
  • Biomarker Selection in Alzheimer's Disease
  • Relationship between Geography and Genomic Variation


Data Science use cases in industry applications

 Use-cases in industry applications

  • TCGA Liver Cancer Project
  • Microbiome of Skin and Gut
  • Treatment Selection
  • Target identification

A Complete Overview

Browser-based, hands-on experience with big data

Learn, practice and gain experience independent of your technological limitations, including:

  • R and Python browser-based data analysis and code review
  • Process and Analyze large-scale Omics data
  • Create reproducible workflows for data processing, analysis and integration
  • Apply methods to curated datasets from peer-reviewed journals
"It was very informative and easy to learn. The content was concise and the provided articles can be used for deeper understanding. Overall this lesson did a great job at introducing bioinformatics".
- Abubakar Abdulkadir, Postgraduate Student
LSU student
"I enjoyed the lessons and look forward to learning more. It is great documentation for beginners. For anyone starting afresh, I’d highly recommend these courses. Examples and resources are really useful".
- Wellesley Dittmar, Graduate Student
"The modules were quite a good opportunity to work with different supervised ML models. As I am not from the statistics or machine learning background but was able to grasp an overview of the same".
- Chinmay Dalvi, Senior Research Associate