Data science for Biomedical Research – 1

The rapid growth of high-throughput data, including -omics technologies, has created significant demand for data science skills and experience with bioinformatics methods of analysis. This online training program covers practical and conceptual aspects of data science, including data wrangling, statistical analysis, and machine learning, applied to high-throughput biomedical omics data using big data analysis tools on the T-BioInfo Platform, R, and Python. Throughout the course, students will gain an understanding of the opportunities and limitations of machine learning in the context of basic, pre-clinical, and clinical research.

The training is designed as a combination of online resources, practical assignments, and live workshops conducted via Zoom. Throughout the course, we will review several project examples that demonstrate the successes and limitations of conventional machine learning (ML) methods and deep learning (DL) using data from public repositories. To learn more, review the information on this page and register for an upcoming webinar, where our team will introduce the program topics and answer questions about the training.

Pre-Register for the Program

Data science for Biomedical Research - Flyer

Insights from Webinar Conducted 

Overview of Data Science for Biomedical Data Analysis Training Program 

Dr. Raghavendran sheds light on building machine learning models for biomedical data

"Types of Data Analysis" integrated in the "Data Science for Biomedical Data" Program

Register for the Upcoming Webinar on August 02, 2022

Key Topics Covered:

  Big data, HPC and cloud computing

Many types of omics data require step-by-step preparation, exploration, annotation, and visualization before they can be understood. The T-BioInfo platform was designed for big multi-omics data analysis. It hides the complexity of the data behind a user-friendly, intuitive interface that eliminates the need for coding, and it provides advanced machine learning algorithms for data integration and mining.

 NGS: omics data types and use cases

The program embeds data-driven concepts into biological projects and spans the student's learning journey from observer to participant in research. Project-based learning for big data bioinformatics goes beyond theory with real datasets, projects, and expert mentors. Students work with curated datasets from publicly available repositories, guided by easy-to-follow tutorials.

 Computational pipelines for data processing

Data processing for Next Generation Sequencing, mass spectrometry, structural, and phenotypic data. Build and adapt pipelines using similar approaches to data mapping, quantification, and annotation. These steps prepare data for downstream statistical analysis, for training machine learning models, and for annotating features.

 Introduction to Programming: R and Python

Online bioinformatics coding exercises to learn and explore R and Python scripting and to understand how to analyze and visualize -omics data to extract meaningful insights from large biological datasets. Learn, practice, and achieve bioinformatics greatness with concise exercises and interesting challenges right in the comfort of your browser!

What will you learn to do in Python?

Coding challenges:

  • Loading data from CSV, TXT, or XLSX sources and converting it to various data structures (data frames, matrices, lists, and vectors)
  • Summarizing categorical and continuous datasets
  • Data preparation using log transformation and quantile normalization
  • Statistical tests and outputs (p-value, t-value, standard error, FDR, logFC)
  • Popular packages like pandas, numpy, and sklearn
  • Visualization using matplotlib and seaborn
  • Reading, understanding, and loading code examples

  • Organizing your scripts with comments and functions
  • Setting up a development environment (IDE)
  • Dealing with errors and troubleshooting code (debugging)
  • Preparing data summaries and submitting curated data and meta-data tables to sharing repositories (FAIR principles)
  • Sharing your analysis in Jupyter notebooks, on GitHub, or in Google Colab
  • Creating interactive visualizations in Plotly
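
Several of the coding challenges listed above can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the gene names, sample names, and counts are made up, the table is read from an in-memory string instead of a real file, and `quantile_normalize` and `benjamini_hochberg` are hypothetical helper names, not functions from the course materials.

```python
import io

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy table: 6 genes x 6 samples (3 control, 3 treated).
# A real file would be loaded the same way, e.g. pd.read_csv(path, index_col=0).
csv_text = """gene,ctrl_1,ctrl_2,ctrl_3,trt_1,trt_2,trt_3
G1,100,110,95,400,420,390
G2,50,55,48,52,49,51
G3,10,12,9,11,10,13
G4,800,790,820,200,210,190
G5,30,28,33,31,29,30
G6,5,6,4,20,22,19
"""
counts = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Summarize each sample (count, mean, std, quartiles).
summary = counts.describe()

# Log transformation with a pseudocount of 1 so zeros stay defined.
log_expr = np.log2(counts + 1)


def quantile_normalize(df):
    """Give every column (sample) the same distribution of values."""
    # Mean of the sorted values across samples is the reference distribution.
    reference = np.sort(df.to_numpy(), axis=0).mean(axis=1)
    # Rank each value within its column, then map ranks onto the reference.
    ranks = df.rank(method="first", axis=0).astype(int) - 1
    return pd.DataFrame(reference[ranks.to_numpy()],
                        index=df.index, columns=df.columns)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0, 1)
    return out


norm = quantile_normalize(log_expr)

# Two-sample t-test per gene. The test uses the log-transformed values:
# with only six genes, quantile-normalized values collapse onto a handful
# of shared numbers, which makes per-gene variances degenerate.
ctrl = log_expr.filter(like="ctrl")
trt = log_expr.filter(like="trt")
t_stat, p_val = stats.ttest_ind(trt.to_numpy(), ctrl.to_numpy(), axis=1)

results = pd.DataFrame({
    "logFC": trt.mean(axis=1) - ctrl.mean(axis=1),  # difference of log2 means
    "t": t_stat,
    "p_value": p_val,
    "FDR": benjamini_hochberg(p_val),
}, index=counts.index)

print(results.round(4))
```

With real data, the same steps would normally run on thousands of genes, where quantile normalization and FDR correction are far more meaningful than on this six-gene toy table.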

Intended Outcomes and Learning Objectives:

  • Understanding of analytical methods for processing, visualization, and analysis of complex biomedical data
  • Learning terminology for machine learning and artificial intelligence in biomedical discovery
  • Becoming familiar with project examples where ML was used effectively to achieve meaningful results
  • Hands-on practice in the application of standard unsupervised and supervised learning methods to various types of data, such as genomic, transcriptomic, metagenomics, imaging, and clinical
  • Understand the ML taxonomy and the commonly used machine learning algorithms for analyzing “omics” data
  • Understand the differences between categories of ML algorithms and the kinds of problems each can be applied to
  • Understand how ML is applied across different -omics studies and project design objectives
  • Use popular Python packages for data visualization, analysis, and ML
  • Interpret and visualize the results obtained from ML analyses on omics datasets
  • Apply ML techniques to analyze public-domain datasets or your own datasets

A Complete Data Science Crash Course with Hands-on Training and Curated Case Study Datasets

Principal Component Analysis - Exploratory Data Analysis

Data Wrangling and Exploratory Data Analysis

Machine Learning and Statistical Analysis

Statistical and Machine Learning Approaches

RStudio - Coding Environment for Code Development

Coding Environments and Best Practices in R and Python

Research Projects and Case Studies with Curated Data Sets

Introduction to Case Studies using Research Omics Data

Program Syllabus: Data Science for Biomedical Data

SESSION | TITLE | DESCRIPTION
Data Science for Biomedical Data

Introduction to the program

Overview of key topics that will be covered, including:

  • Data loading and preparation
  • Data visualization in R and Python
  • Statistical Concepts and Tests
  • Supervised and Unsupervised Machine Learning
  • Network Analysis and Deep Learning

Associated Online Resources:

The need for Machine Learning and Statistics: Processing high-throughput data

Processing high-throughput (BIG) data

  • Data complexity and need for preparation
  • Availability and variability of data
  • Unprecedented detail and volume
  • Data heterogeneity, complexity, and noise
  • Need for interpretability and reproducibility
  • Limitations of statistical analysis

Associated Online Resources:

Machine Learning Methods

Major Types of Machine Learning Methods

  • Unsupervised and supervised types of analysis
  • Dimensionality Reduction, clustering and classification
  • Hierarchical and K-means clustering
  • Big Data Clustering (CLARA, PAM, fuzzy)
  • Conventional Regression-based methods
  • Random Forest and SVM
  • Deep Learning
Associated Online Resources:

Data Visualization

Machine Learning for Data Visualization

  • Dimensionality Reduction Objectives
  • Continuous and Ordinal Data 
  • Multi-variate statistical analysis approaches
  • Variance and co-variance
  • Singular Value Decomposition (SVD)
  • Principal Component Analysis, Coordinate Analysis, NMDS, t-SNE, and UMAP
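
The link between covariance, SVD, and PCA in the topics above can be illustrated with a minimal sketch. The data here are simulated (20 hypothetical samples, 50 features, two groups that differ in the first five features), and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 20 samples x 50 features; the second group of 10
# samples is shifted in the first 5 features.
X = rng.normal(size=(20, 50))
X[10:, :5] += 4.0

# PCA step 1: center each feature.
Xc = X - X.mean(axis=0)

# PCA step 2: singular value decomposition of the centered matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Sample coordinates on the principal components (the "scores").
scores = U * S

# Fraction of total variance captured by each component.
explained = S**2 / np.sum(S**2)

# The same variances are the eigenvalues of the feature covariance matrix.
cov = Xc.T @ Xc / (X.shape[0] - 1)
eigenvalues = np.sort(np.linalg.eigvalsh(cov))[::-1]

print("Variance explained by PC1 and PC2:", explained[:2].round(3))
print("Mean PC1 score per group:",
      scores[:10, 0].mean().round(2), scores[10:, 0].mean().round(2))
```

Methods such as NMDS, t-SNE, and UMAP are nonlinear and are not covered by this sketch; in practice they are often run on the leading principal components rather than on the raw features.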
Associated Online Resources:

Unsupervised Learning

Unsupervised Learning: Clustering

  • Patterns and Learning
  • Clustering for data mining
  • PCA and clustering
  • K-means and Hierarchical clustering
  • Big Data Clustering (CLARA, PAM, fuzzy)
  • Clustering objects and features
  • Feature Engineering
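
As a rough illustration of the K-means and hierarchical clustering topics above, the sketch below clusters simulated samples with both methods and compares the result to the known grouping. The data, the choice of three clusters, and Ward linkage are assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)

# Hypothetical data: 3 well-separated groups of 15 samples in 4 dimensions.
centers = np.array([[0, 0, 0, 0], [6, 6, 0, 0], [0, 6, 6, 0]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(15, 4)) for c in centers])
true_labels = np.repeat([0, 1, 2], 15)

# K-means: partitions samples around k centroids.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Hierarchical clustering: build a tree with Ward linkage, then cut it
# into 3 clusters.
tree = linkage(X, method="ward")
hier_labels = fcluster(tree, t=3, criterion="maxclust")

# The adjusted Rand index compares two labelings regardless of label names
# (1.0 means identical groupings).
ari_kmeans = adjusted_rand_score(true_labels, kmeans.labels_)
ari_hier = adjusted_rand_score(true_labels, hier_labels)
print("ARI k-means:", ari_kmeans, "ARI hierarchical:", ari_hier)
```

To cluster features instead of objects, the same calls can be applied to the transposed matrix `X.T`.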
Associated Online Resources:

Supervised Learning

Supervised Learning: Classification

  • Overview of supervised learning
  • Preparing Training and test datasets
  • Binary Decision Trees
  • Random Forest (RF)
  • Support Vector Machine (SVM)
  • Discriminant Analysis: LDA and QDA
  • Model Accuracy and Specificity
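
The workflow outlined in this session (split the data, train a classifier, report accuracy and specificity) might look like the following sketch. The dataset is synthetic, generated with scikit-learn's `make_classification`, and the parameter values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labeled omics table: 300 samples, 20 features.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           class_sep=2.0, random_state=0)

# Hold out 30% of samples for testing; stratify to keep class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Train a Random Forest on the training set only.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Confusion matrix entries for a binary problem.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate

print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} "
      f"specificity={specificity:.3f}")
```

Swapping `RandomForestClassifier` for `sklearn.svm.SVC` or `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` leaves the rest of the workflow unchanged.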

Associated Online Resources:

Feature Selection and Gene Signature Construction

Feature selection and gene signature construction

  • Need for feature selection
  • Feature selection for tSNE
  • Feature selection for biomarker discovery
  • Methods for feature significance
  • Algorithms and approaches (greedy)
  • Technical accuracy (ROC curve and AUC)
  • Logical or biological relevance
  • Automating the process
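
One way to picture a greedy approach scored by AUC is forward selection: start with no features and repeatedly add the one that most improves cross-validated ROC AUC. The sketch below is a simplified illustration on synthetic data; `greedy_forward_selection`, its stopping rule, and the logistic regression scorer are assumptions made for the example, not the method used in the program.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: 30 features, of which only the first 4 are informative
# (shuffle=False keeps the informative features in columns 0-3).
X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           n_redundant=0, n_clusters_per_class=1,
                           class_sep=1.5, shuffle=False, random_state=0)


def greedy_forward_selection(X, y, max_features=5, min_gain=0.005):
    """Add one feature at a time, keeping the one with the best CV AUC."""
    selected, best_auc = [], 0.5   # 0.5 is the AUC of random guessing
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            model = LogisticRegression(max_iter=1000)
            scores[j] = cross_val_score(model, X[:, cols], y, cv=5,
                                        scoring="roc_auc").mean()
        j_best = max(scores, key=scores.get)
        # Stop when the best candidate no longer improves the AUC enough.
        if scores[j_best] - best_auc < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_auc = scores[j_best]
    return selected, best_auc


selected, best_auc = greedy_forward_selection(X, y)
print("Selected feature indices:", selected)
print("Cross-validated AUC:", round(best_auc, 3))
```

Because the features are chosen using the same cross-validation folds that score them, the reported AUC is optimistic; an independent test set or nested cross-validation would give a fairer estimate for a candidate gene signature.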
Associated Online Resources:

Regression, Generalized Linear Models

Regression and generalized linear models (GLM)

  • Factor Regression Analysis and interpretation
  • Using regression for missing data
  • Data distributions (non-normal)
  • Generalized Linear Models (GLM)
  • Logistic Regression
  • Interpretation
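
Logistic regression is a GLM with a binomial response and a logit link, and its coefficients are read as log-odds. The sketch below fits one by Newton's method (iteratively reweighted least squares) on simulated data so that the fitted slope can be compared with a known value. The "biomarker" variable, the true coefficients, and the helper name `fit_logistic_irls` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: one hypothetical biomarker and a binary outcome whose
# log-odds are -0.5 + 1.2 * biomarker.
n = 500
biomarker = rng.normal(size=n)
true_beta = np.array([-0.5, 1.2])            # intercept, slope
X = np.column_stack([np.ones(n), biomarker])
prob = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = rng.binomial(1, prob)


def fit_logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton/IRLS updates."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        weights = mu * (1.0 - mu)              # binomial variance function
        gradient = X.T @ (y - mu)
        hessian = X.T @ (X * weights[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


beta_hat = fit_logistic_irls(X, y)

# Interpretation: exp(slope) is the odds ratio per one-unit increase
# in the biomarker.
odds_ratio = np.exp(beta_hat[1])
print("Estimated coefficients:", beta_hat.round(3))
print("Odds ratio per unit of biomarker:", round(float(odds_ratio), 2))
```

Other GLMs, such as Poisson regression for counts, follow the same pattern with a different link and variance function.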

Associated Online Resources:

Network Analysis

Network analysis

  • Objectives of Network Analysis
  • Signaling and Metabolic Pathways
  • Gene Expression Networks
  • Time-series data
  • Overview of Bayesian Networks
  • Examples of Network analysis
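
A gene expression network is often built by correlating expression profiles and keeping only the strong correlations as edges. The sketch below shows that idea on simulated data; the six genes, the 0.8 correlation cutoff, and the helper `connected_components` are assumptions for the example, and real co-expression analyses typically use more careful thresholds.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 40

# Simulated expression: genes 0-2 follow one latent signal, genes 3-4
# follow another, and gene 5 is pure noise.
s1, s2 = rng.normal(size=(2, n_samples))
signal = np.vstack([s1, s1, s1, s2, s2, np.zeros(n_samples)])
expr = signal + 0.2 * rng.normal(size=signal.shape)

# Gene-by-gene Pearson correlation matrix.
corr = np.corrcoef(expr)

# Adjacency matrix: connect genes with |correlation| >= 0.8.
adjacency = np.abs(corr) >= 0.8
np.fill_diagonal(adjacency, False)

# Degree = number of neighbors of each gene.
degree = adjacency.sum(axis=1)


def connected_components(adj):
    """Group nodes into connected components with a depth-first search."""
    seen, components = set(), []
    for start in range(adj.shape[0]):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in np.flatnonzero(adj[node]):
                neighbor = int(neighbor)
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


modules = connected_components(adjacency)
print("Degree per gene:", degree.tolist())
print("Co-expression modules:", modules)
```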

Associated Online Resources:

Combining machine learning methods for real-world applications

Combining/selecting ML methods

Objectives of real-world ML projects:

  • Exploratory Analysis
  • Data Visualization
  • Statistical/Data Mining
  • Predictive Analysis
  • Feature Selection

Associated Online Resources:

Deep Learning: Types and Application

Deep Learning: Types & Application

  • Introduction to Deep Learning
  • Models and Approaches
  • Multi-Layer Perceptrons (MLP)
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)
  • Dimensionality Reduction with Deep Learning
  • Model Optimization
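
To make the Multi-Layer Perceptron item concrete, the sketch below runs a forward pass through a network with one hidden layer. It is untrained: the weights are random, the layer sizes are arbitrary, and the input is simulated, so it only shows how data flow through the layers, not how a model is optimized.

```python
import numpy as np


def relu(z):
    """Rectified linear unit, a common hidden-layer activation."""
    return np.maximum(z, 0.0)


def softmax(z):
    """Turn each row of scores into class probabilities."""
    z = z - z.max(axis=1, keepdims=True)   # stabilizes the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)

# Arbitrary sizes for illustration: 100 input features (e.g. genes),
# 16 hidden units, 3 output classes.
n_features, n_hidden, n_classes = 100, 16, 3

# Random (untrained) weights and zero biases.
W1 = rng.normal(scale=np.sqrt(2.0 / n_features), size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=np.sqrt(2.0 / n_hidden), size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)


def mlp_forward(X):
    """Input -> hidden layer (ReLU) -> output layer (softmax)."""
    hidden = relu(X @ W1 + b1)
    return softmax(hidden @ W2 + b2)


# Five simulated samples.
X = rng.normal(size=(5, n_features))
probabilities = mlp_forward(X)
print(probabilities.round(3))
```

Training would adjust `W1`, `b1`, `W2`, and `b2` by gradient descent on a loss such as cross-entropy, which is where model optimization comes in.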

Associated Online Resources:

Selecting a data science project for analysis

Selecting a project for ML analysis

  • Designing a Bioinformatics Research Project
  • Gene Expression (Transcriptomics)
  • Variant Analysis (Genomics)
  • Microbiome Diversity (Metagenomics)
  • Basic Research, Clinical

Associated Online Resources:

Use cases in clinical applications

Use-cases in clinical applications

  • Patient Derived Xenografts (PDX)
  • Modeling Precision Medicine for Breast Cancer
  • Biomarker Selection in Alzheimer's Disease
  • Relationship between Geography and Genomic Variation

Associated Online Resources: 

Data Science use cases in industry applications

 Use-cases in industry applications

  • TCGA Liver Cancer Project
  • Microbiome of Skin and gut
  • Treatment Selection
  • Target identification
Associated Online Resources:
 

A Complete Overview

Browser-based, hands-on experience with big data

Learn, practice and gain experience independent of your technological limitations, including:

  • R and Python browser-based data analysis and code review
  • Process and analyze large-scale omics data
  • Create reproducible workflows for data processing, analysis and integration
  • Apply methods to curated datasets from peer-reviewed journals

Customer Reviews on Data Science for Biomedical Data Program

"Multi-Omics Data Analysis is like a 'Black Hole': no one knows exactly what's going on, and it's very hard to find an integrated platform to learn various omics types and streamline the pipelines. The great resources provided by the OMICSLOGIC & Tauber Bioinformatics Research Platform are amazing and provide a great platform to brainstorm with the mentors and get your questions answered."

Goutham Vasam
Postdoctoral Fellow, University of Ottawa, Canada

"I am very glad to join the program. I was surprised by the quality of the content and to find something so specific to this area of Bioinformatics & Data Science.
It was a great experience. Along with the course content, I cannot describe how important Dr. Raghavendran's teachings have now become for my career and the project ahead."

Davi Ludvig
Postdoctoral Fellow, University of Texas-Austin, U.S.A

"Thank you for the opportunity! The OMICSLOGIC platform & the T-Bioinfo Server are a great boon even for low-income countries, as we get to learn about research. I enjoy learning the courses on the OMICSLOGIC Learn platform. The language is pretty easy to interpret. I am an undergraduate student and feel privileged to be in a community of PhDs and postdocs, to learn about their research and gain knowledge from them."

Kenitimi Bikikoro
Undergraduate, Nigeria Delta University Amassama, Nigeria

"I didn't have much knowledge about programming languages before, especially R and Python, but after going through the resources I have a better understanding now. I think that, for a biologist, everything on the OMICSLOGIC Learn portal about Multi-Omics data analysis using Data Science techniques has been put together very nicely."

Sneha Pathak
Ph.D. Research Scholar NMIMS University, Mumbai, India