Biomedical Data Science

The rapid growth of high-throughput data, including -omics technologies, has created significant demand for data science skills and experience with bioinformatics analysis methods. This online training program covers practical and conceptual aspects of data science, including data wrangling, statistical analysis, and machine learning, applied to high-throughput biomedical omics data using big data analysis tools on the T-BioInfo Platform, R and Python. Throughout the course, students will gain an understanding of the opportunities and limitations of machine learning in the context of basic, pre-clinical and clinical research.

The training is designed as a combination of online resources, practical assignments and live workshops conducted via Zoom. Throughout the course, we will review several project examples that demonstrate the successes and limitations of conventional machine learning (ML) methods and deep learning (DL) using data from public repositories. To learn more, we welcome you to review the information on this page and register for an upcoming webinar, where our team will introduce the program topics and answer questions about the training.

Key Topics Covered:

  Big data, HPC and cloud computing

Many types of omics data require step-by-step preparation, exploration, annotation, and visualization before they can be understood. The T-BioInfo platform was designed for big multi-omics data analysis. It hides the complexity of the data behind a user-friendly, intuitive interface that eliminates the need for coding, and it provides advanced machine learning algorithms for data integration and mining.

 NGS: omics data types and use cases

This program embeds data-driven concepts into biological projects, spanning the student learning journey from observer to participant in research. Project-based learning for big data bioinformatics goes beyond theory with real datasets, projects, and expert mentors. Work with curated datasets from publicly available repositories, supported by easy-to-follow tutorials.

 Computational pipelines for data processing

Data processing for Next Generation Sequencing, mass spectrometry, structural and phenotypic data. Build and adapt pipelines that share common approaches to data mapping, quantification, and annotation, and use them to prepare data for downstream statistical analysis, machine learning model training, and feature annotation.

 Introduction to Programming: R and Python

Online bioinformatics coding exercises to learn and explore R and Python scripting and understand how to analyze and visualize -omics data to extract meaningful insights from large biological datasets. Learn, practice, and achieve bioinformatics greatness with concise exercises and interesting challenges right in the comfort of your browser!

A Complete Data Science Crash Course with Hands-on Training and Curated Case Study Datasets

Principal Component Analysis - Exploratory Data Analysis

 

Data Wrangling and Exploratory Data Analysis

Machine Learning and Statistical Analysis

 

Statistical and Machine Learning Approaches

RStudio - Coding Environment for Code Development

 

Coding Environments and Best Practices in R and Python

Research Projects and Case Studies with curated Data Sets

 

Introduction to Case Studies using Research Omics Data

 

Program Syllabus: Data Science for Biomedical Data

Session Title Description

Introduction to the Data Science program

FEBRUARY 04, 2022

Introduction to the program

  • Overview of key topics that will be covered, including:
  • Data loading and preparation
  • Data visualization in R and Python
  • Statistical Concepts and Tests
  • Supervised and Unsupervised Machine Learning
  • Network Analysis and Deep Learning
Associated Online Resources 

Processing high throughput (BIG) data

FEBRUARY 11, 2022

 

Processing high throughput (BIG) data

  • Data complexity and need for preparation:
  • Availability and variability of data
  • Unprecedented detail and volume
  • Data heterogeneity, complexity, and noise
  • Need for interpretability and reproducibility
  • Limitations of statistical analysis
 Associated Online Resources 

Major Types of Machine Learning Methods

FEBRUARY 18, 2022

Major Types of Machine Learning Methods
  • Unsupervised and supervised types of analysis:
  • Dimensionality Reduction, clustering and classification
  • Hierarchical and K-means clustering
  • Big Data Clustering (CLARA, PAM, fuzzy)
  • Conventional Regression-based methods
  • Random Forest and SVM
  • Deep Learning
 Associated Online Resources 

Machine Learning for Data visualization

FEBRUARY 25, 2022

Machine Learning for Data visualization
  • Dimensionality Reduction Objectives
  • Continuous and Ordinal Data 
  • Multi-variate statistical analysis approaches
  • Variance and co-variance
  • Singular Value Decomposition (SVD)
  • Principal Component Analysis (PCA), Principal Coordinate Analysis (PCoA), NMDS, t-SNE and UMAP
Associated Online Resources 
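
As a rough illustration of how the variance, covariance, SVD and PCA topics in this session fit together, the sketch below computes principal components from the singular value decomposition of a centered data matrix. It assumes NumPy is available; the function name `pca_via_svd` and the toy matrix are hypothetical and are not taken from the course materials.

```python
import numpy as np

def pca_via_svd(X, n_components=2):
    """Return (scores, components, explained_variance_ratio) for a samples x features matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S ** 2 / (X.shape[0] - 1)        # eigenvalues of the sample covariance matrix
    ratio = variances / variances.sum()          # share of total variance per component
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the top components
    return scores, Vt[:n_components], ratio

# Hypothetical toy matrix: 6 samples (rows) x 3 features (columns)
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 0.6],
    [2.2, 2.9, 0.4],
    [1.9, 2.2, 0.5],
    [3.1, 3.0, 0.6],
    [2.3, 2.7, 0.5],
])

scores, components, ratio = pca_via_svd(X)
print("explained variance ratio:", ratio.round(3))
print("first two PC scores per sample:")
print(scores.round(3))
```

Because the first two features in this toy matrix rise and fall together, most of the variance lands on the first component, which is the usual reason PCA is used as a first look at high-dimensional data.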

 

FEBRUARY 25, 2022 at 1 PM CST

LIVE QnA Session

Unsupervised Learning: Clustering

MARCH 04, 2022

Unsupervised Learning: Clustering
  • Patterns and Learning
  • Clustering for data mining
  • PCA and clustering
  • K-means and Hierarchical clustering
  • Big Data Clustering (CLARA, PAM, fuzzy)
  • Clustering objects and features
  • Feature Engineering
Associated Online Resources
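
To make the K-means topic in this session more concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It makes simplifying assumptions that real tools do not: centroids are seeded from the first `k` points, and the toy 2-D points are hypothetical. It is not the implementation used on the T-BioInfo platform.

```python
import math

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; seeds centroids with the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups of 2-D points
points = [(1.0, 1.1), (5.0, 5.2), (1.2, 0.9), (0.8, 1.0), (5.1, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The result depends on the seeding, which is one reason production implementations typically restart from several initializations.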

Supervised Learning: Classification

MARCH 11, 2022

Supervised Learning: Classification
  • Overview of supervised learning
  • Preparing Training and test datasets
  • Binary Decision Trees
  • Random Forest (RF)
  • Support Vector Machine (SVM)
  • Discriminant Analysis: LDA and QDA
  • Model Accuracy and Specificity

Associated Online Resources 
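
The "Model Accuracy and Specificity" item can be illustrated with a small confusion-matrix calculation. The sketch below is plain Python; the label vectors are hypothetical, and the helper names `confusion_counts` and `summarize` are made up for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def summarize(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Hypothetical held-out test labels and model predictions (1 = case, 0 = control)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = summarize(y_true, y_pred)
print(metrics)
```

Reporting specificity and sensitivity alongside accuracy matters when classes are imbalanced, since a model can reach high accuracy by always predicting the majority class.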

Feature selection and gene signature construction

MARCH 18, 2022

Feature selection and gene signature construction
  • Need for feature selection
  • Feature selection for t-SNE
  • Feature selection for biomarker discovery
  • Methods for feature significance
  • Algorithms and approaches (greedy)
  • Technical accuracy (ROC curve and AUC)
  • Logical or biological relevance
  • Automating the process
Associated Online Resources 
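
One simple way to connect "methods for feature significance" with "ROC curve and AUC" is to score each feature by its single-feature AUC. The sketch below does this in plain Python using the pairwise definition of AUC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one). The gene names and expression values are hypothetical, and this univariate ranking is only one of many possible approaches, not necessarily the one taught in the session.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical expression values for three genes across six samples
labels = [1, 1, 1, 0, 0, 0]          # 1 = case, 0 = control
features = {
    "gene_a": [5.1, 4.8, 5.5, 2.0, 2.3, 1.9],   # higher in cases
    "gene_b": [3.0, 2.1, 3.6, 2.9, 3.3, 2.5],   # little separation
    "gene_c": [1.0, 1.2, 0.9, 3.1, 2.8, 3.0],   # lower in cases
}

auc = {name: roc_auc(labels, values) for name, values in features.items()}
# An AUC far from 0.5 in either direction indicates a discriminating feature.
ranked = sorted(auc, key=lambda name: abs(auc[name] - 0.5), reverse=True)
print(auc)
print(ranked)
```

A ranking like this should be computed on training data only; evaluating selected features on the same samples used to select them inflates the apparent accuracy.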

Regression and generalized linear models (GLM)

MARCH 21, 2022

Regression and generalized linear models (GLM)

  • Factor Regression Analysis and interpretation
  • Using regression for missing data
  • Data distributions (non-normal)
  • Generalized Linear Models (GLM)
  • Logistic Regression
  • Interpretation

Associated Online Resources
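
As a hedged illustration of logistic regression and its interpretation, the sketch below fits a one-predictor model by gradient descent in plain Python and reports the odds ratio. The data are hypothetical, and the learning rate and iteration count are arbitrary choices for this toy example. In practice one would normally use an established routine such as R's `glm` rather than hand-written optimization.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        preds = [sigmoid(b0 + b1 * xi) for xi in x]
        grad0 = sum(p - yi for p, yi in zip(preds, y)) / n
        grad1 = sum((p - yi) * xi for p, yi, xi in zip(preds, y, x)) / n
        b0 -= lr * grad0
        b1 -= lr * grad1
    return b0, b1

# Hypothetical data: a biomarker level (x) and a binary outcome (y).
# The classes overlap on purpose so that the fitted coefficients stay finite.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)   # multiplicative change in the odds of y=1 per unit increase in x
print(f"intercept={b0:.3f}, slope={b1:.3f}, odds ratio={odds_ratio:.3f}")
print(f"P(y=1 | x=4.0) = {sigmoid(b0 + b1 * 4.0):.3f}")
```

The slope is on the log-odds scale, which is why the interpretation step exponentiates it: an odds ratio above 1 means higher values of the predictor are associated with higher odds of the outcome.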

MARCH 21, 2022 at 1 PM CST

LIVE QnA Session

Network analysis

MARCH 25, 2022

Network analysis
  • Objectives of Network Analysis
  • Signaling and Metabolic Pathways
  • Gene Expression Networks
  • Time-series data
  • Overview of Bayesian Networks
  • Examples of Network analysis

Associated Online Resources 

Combining/selecting ML methods

MARCH 28, 2022

Combining/selecting Machine Learning methods
  • Objectives of real-world ML projects:
  • Exploratory Analysis
  • Data Visualization
  • Statistical/Data Mining
  • Predictive Analysis
  • Feature Selection

Associated Online Resources  

Deep Learning: Types & Application

APRIL 01, 2022

Deep Learning: Types & Application
  • Introduction to Deep Learning
  • Models and Approaches
  • Multi-Layer Perceptrons (MLP)
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)
  • Dimensionality Reduction with Deep Learning
  • Model Optimization

Associated Online Resources 

Selecting a project for ML analysis

APRIL 08, 2022

Selecting a project for ML analysis
  • Designing a Bioinformatics Research Project
  • Gene Expression (Transcriptomics)
  • Variant Analysis (Genomics)
  • Microbiome Diversity (Metagenomics)
  • Basic Research, Clinical

Associated Online Resources

APRIL 08, 2022 at 1 PM CST

LIVE QnA Session
 

If you need help finalizing registration, contact Farhana Musarrat (email: fmusar1@lsu.edu). You must register for the orientation using the form below. LSU or LBRN members can complete their registration via the BioMMED iLab link below.

Register for the Online Mentor Guided Training Program

LBRN certificate

Training Certificate from the Louisiana Biomedical Research Network or LSU BioMMED:

  • Certification of Training Requirement Completion
  • Recognition within the Network and Other IDeA States
  • Advancement for Research, Faculty and Student participants within the LBRN Network
  • Training certification for LSU through the Center for Biotechnology and Biomolecular Medicine at Louisiana State University

A Complete Overview

Browser-based, hands-on experience with big data

Learn, practice and gain experience independent of your technological limitations, including:

  • R and Python browser-based data analysis and code review
  • Process and analyze large-scale omics data
  • Create reproducible workflows for data processing, analysis and integration
  • Apply methods to curated datasets from peer-reviewed journals
"It was very informative and easy to learn. The content was concise and the provided articles can be used for deeper understanding. Overall this lesson did a great job at introducing bioinformatics."
- Abubakar Abdulkadir, Postgraduate Student
"I enjoyed the lessons and look forward to learning more. It is great documentation for beginners. For anyone starting afresh, I’d highly recommend these courses. Examples and resources are really useful."
- Wellesley Dittmar, Graduate Student
"The modules were quite a good opportunity to work with different supervised ML models. I am not from a statistics or machine learning background, but I was able to grasp an overview of the same."
- Chinmay Dalvi, Senior Research Associate