Big Data for Health Informatics Course outline

Rezana Dowra
Aug 4, 2022


The topics of concern are health care analytics and data mining: health care applications and health care data, intersected with data science and big data analytics, and the algorithms used to process big data.

This article includes summaries from the course Big Data for Health Informatics at GaTech (this course is part of the Machine Learning Specialisation).
It is intended that you follow the series of articles. By the end of the series you should achieve the following learning goals:

  1. understanding health care data
  2. understanding different analytic algorithms
  3. understanding big data systems

These learning goals will allow you to build models on health care data, for example models that predict individual disease risk, recommend treatments, cluster patients into groups with common characteristics, and find similar patients.

Introduction

Background on the health care industry in the US

The healthcare industry is huge: overall spending is 3.8 trillion USD.
This includes massive waste, estimated at 764 billion USD. Apart from the financial loss, there are serious problems with the quality of health care that result in loss of life.

The four Vs of big data for healthcare systems

  1. Volume
  2. Variety
  3. Velocity: data arrives in real time
  4. Veracity: a lot of noise, errors, missing data and false alarms

Big Data in healthcare

Healthcare generates huge amounts of data. For example, each human genome requires 200 GB of raw data, and a single fMRI is 300 GB. Medical data was estimated at 100 petabytes, and this continues to grow.
There is also a lot of clinical administration data, as well as data from checkups and on-body sensors such as smart devices.

The huge variety of data makes it difficult for data scientists to find patterns in the data and help patients.

The Data Scientist

What skills do data scientists need?

  1. Maths and statistics
  2. Domain knowledge and skills
  3. Programming and Databases
  4. Communication and Visualisation

Course Overview

Topics include big data applications, the algorithms that are used, and the software systems that are built.

This is the full course structure.

Healthcare applications

  1. Predictive modelling: using historical data to predict future outcomes
  2. Computational phenotyping: turning messy electronic health records into meaningful clinical concepts
  3. Patient similarity: using health data to cluster and group patients

Predictive modelling: the challenges faced

  1. We have data on millions of patients, each with diagnosis information, medication information, …
  2. There are many models to build. Predictive modelling is not a single algorithm but a sequence of computational tasks: a pipeline with many options at each step, which spawn many other pipelines to be compared
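
To see how quickly the number of pipelines grows, here is a minimal Python sketch. The stage names and the options inside each stage are hypothetical placeholders, not the course's actual pipeline; the point is only that one choice per stage multiplies into many candidate pipelines to compare.

```python
from itertools import product

# Hypothetical options for each pipeline stage (illustrative names only).
PIPELINE_OPTIONS = {
    "feature_selection": ["all_features", "top_k_by_variance"],
    "classifier": ["logistic_regression", "random_forest", "gradient_boosting"],
    "validation": ["holdout", "5_fold_cross_validation"],
}


def enumerate_pipelines(options):
    """Return every combination of one choice per stage, as a list of dicts."""
    stages = list(options)
    choices_per_stage = [options[stage] for stage in stages]
    return [dict(zip(stages, combo)) for combo in product(*choices_per_stage)]


pipelines = enumerate_pipelines(PIPELINE_OPTIONS)
print(len(pipelines))  # 2 * 3 * 2 = 12 candidate pipelines
print(pipelines[0])
```

Adding one more stage with three options would already triple the number of pipelines, which is why comparing them becomes a big data problem of its own.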

Computational phenotyping starts from raw patient data, which consists of:

  1. Demographic information
  2. Diagnosis
  3. Medication
  4. Clinical notes
  5. Procedures
  6. Lab tests
  7. …. patient medical history

Phenotyping is the conversion of the above raw patient data into medical concepts (phenotypes).

An example of how this is done is a phenotyping algorithm for type 2 diabetes.

EHR: electronic health record of a patient

Logical workflow for diagnosing a patient with type 2 diabetes

When you follow the above flow, you may wonder why there are so many checks on the patient record. Why can't we just query the record to see whether the patient has type 2 diabetes? The reason for this extensive and complicated workflow is the poor quality of the data in the patient record: these checks cater for errors in the data.
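
As a rough illustration of why several checks are combined, here is a small Python sketch of a rule-based phenotype. It is not the workflow described above and it is not clinical guidance: the `PatientRecord` fields, the medication list, the "two out of three sources" rule and the HbA1c cut-off are all assumptions made for this example. The idea is simply that no single field is trusted on its own.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified EHR record used only for this sketch."""
    diagnosis_codes: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    hba1c_values: list = field(default_factory=list)


# Illustrative constants (assumptions for this example).
T2DM_CODE_PREFIX = "E11"   # ICD-10 category for type 2 diabetes
T1DM_CODE_PREFIX = "E10"   # ICD-10 category for type 1 diabetes
T2DM_MEDICATIONS = {"metformin"}
HBA1C_THRESHOLD = 6.5      # percent


def has_code(record, prefix):
    """True if any diagnosis code in the record starts with the prefix."""
    return any(code.startswith(prefix) for code in record.diagnosis_codes)


def is_t2dm_case(record):
    """Flag a patient only when independent sources of evidence agree."""
    # A type 1 code contradicts the phenotype, so exclude the patient.
    if has_code(record, T1DM_CODE_PREFIX):
        return False
    evidence = [
        has_code(record, T2DM_CODE_PREFIX),
        bool(record.medications & T2DM_MEDICATIONS),
        any(value >= HBA1C_THRESHOLD for value in record.hba1c_values),
    ]
    # Require at least two of the three sources, since any one may be wrong.
    return sum(evidence) >= 2


coded_and_treated = PatientRecord({"E11.9"}, {"metformin"}, [])
coded_only = PatientRecord({"E11.9"}, set(), [])
print(is_t2dm_case(coded_and_treated), is_t2dm_case(coded_only))
```

A patient with only a diagnosis code is not flagged here, because a single code could be a data entry error; that is the kind of noise the extra checks are meant to absorb.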

Patient similarity: to recap, this is grouping patients with similar characteristics.

This is case-based reasoning, where the doctor looks at previous patients and groups them accordingly.
If doctors do this manually, each doctor only has a view of their own patients. It would be better to add the patient to a global database and expand the group to patients seen by any doctor.
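
One simple way to picture a similarity search over a shared database is the sketch below. It assumes each patient is reduced to a set of condition labels and uses Jaccard similarity (size of the overlap divided by size of the union) as the measure. Both the representation and the measure are assumptions for illustration; the course may use different ones.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query, patients, k=2):
    """Return the k patients most similar to the query as (id, score) pairs."""
    scored = [(jaccard(query, conditions), pid) for pid, conditions in patients.items()]
    # Highest score first; ties are broken by patient id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(pid, score) for score, pid in scored[:k]]


# Hypothetical shared database: patient id -> set of recorded conditions.
PATIENTS = {
    "p1": {"diabetes", "hypertension"},
    "p2": {"asthma"},
    "p3": {"diabetes", "hypertension", "obesity"},
}

new_patient = {"diabetes", "hypertension", "obesity"}
print(most_similar(new_patient, PATIENTS))
```

Because the database is shared, the search covers patients seen by any doctor rather than only the ones a single doctor remembers.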

Big Data Algorithms

  1. Classification: labelling data based on its features
  2. Clustering: grouping data with similar features
  3. Dimensionality reduction: reducing the feature set to the features that are important for the predictions
  4. Graph analysis: creating a network of patients and diseases and how they relate to each other
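
To make the graph analysis idea a little more concrete, here is a minimal sketch that treats (patient, disease) pairs as edges of a network and counts how often two diseases share a patient. The edge list is made-up example data, and counting co-occurrences is just one simple question you could ask of such a graph.

```python
from collections import defaultdict
from itertools import combinations

# Made-up edges of a patient-disease network: (patient id, disease).
EDGES = [
    ("p1", "diabetes"),
    ("p1", "hypertension"),
    ("p2", "diabetes"),
    ("p2", "hypertension"),
    ("p3", "asthma"),
]


def disease_cooccurrence(edges):
    """Count, for each pair of diseases, how many patients have both."""
    diseases_by_patient = defaultdict(set)
    for patient, disease in edges:
        diseases_by_patient[patient].add(disease)

    counts = defaultdict(int)
    for diseases in diseases_by_patient.values():
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(diseases), 2):
            counts[pair] += 1
    return dict(counts)


print(disease_cooccurrence(EDGES))
```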

Big Data Systems

We need big data systems to handle big data:

  1. Hadoop: a distributed, disk-based big data system
  2. Spark: a distributed, in-memory big data system
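
MapReduce, the programming model associated with Hadoop and one of the topics listed in the summaries, can be pictured with a small in-process sketch. This is plain Python running on one machine, not the Hadoop or Spark API, and the record layout is an assumption for this example. It only shows the map, shuffle and reduce steps that those systems distribute across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (diagnosis, 1) pair for each diagnosis in one patient record."""
    return [(diagnosis, 1) for diagnosis in record["diagnoses"]]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def count_diagnoses(records):
    """Run map, shuffle and reduce in sequence on a list of records."""
    mapped = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Made-up patient records for illustration.
RECORDS = [
    {"patient": "p1", "diagnoses": ["diabetes", "asthma"]},
    {"patient": "p2", "diagnoses": ["diabetes"]},
]
print(count_diagnoses(RECORDS))
```

In a real cluster the map and reduce functions run on different machines, with intermediate results written to disk in Hadoop or kept in memory in Spark.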

Course note summaries for each topic covered in the lessons:

  1. Predictive Modelling
  2. MapReduce
  3. Classification and Regression Metrics for Predictive modelling analysis
  4. Ensemble methods
  5. Computational Phenotyping
  6. Gradient Descent
  7. Clustering
  8. Spark
  9. Graph Analysis
  10. Deep Neural Networks

Hope you learned something.

-R
