Classification and Regression Metrics for Predictive Modelling Analysis

Rezana Dowra
9 min read · Aug 14, 2022


The topics of concern are health care analytics and data mining: health care applications and health care data intersecting with data science and big data analytics, and understanding the algorithms used to process big data.

This article forms part of a series of articles under the topic Big Data for Health Informatics Course.
You can visit the above link to understand this topic in the context of the full course. This article references knowledge from the Predictive Modelling article; however, the metrics for classification methods are covered in a way that can be understood independently of the full course.

Note: This article requires knowledge of machine learning concepts.

Introduction and Recap

In a previous article, Predictive Modelling, we discussed how to evaluate the performance of a predictive model. One fundamental concern is the quality of the models developed. To address this we use evaluation metrics such as:

  1. Accuracy: measures how often the classifier predicts correctly. We can define accuracy as the ratio of the number of correct predictions to the total number of predictions.
  2. Sensitivity
  3. Specificity
  4. etc.

A key part of working with big data is to develop multiple candidate models and compare them with evaluation metrics.

To recap, we looked at a predictive model pipeline and covered common predictive modelling algorithms: regression and classification.

Predictive modelling pipeline

We will focus on the sixth step, Performance Evaluation, and understand the details of the metrics. A predictive model is a function that maps input features x to a predicted target y, i.e. y = f(x).

Prediction model formula

Classification algorithms have target y as either binary or a set of categories. The evaluation metrics include:

  1. True/False positive rate
  2. Positive predictive value
  3. F1
  4. Area under ROC curve

Regression algorithms have a continuous number as the target. Evaluation metrics for regression include:

  1. Mean Absolute Error
  2. Mean Squared Error
  3. R squared

Outcomes

We will go through each performance metric mentioned earlier, provide a definition for each, and explain how they relate to each other.

Performance Metrics for Binary Classification Problems

Binary classification is predicting one of two classes, for example whether a patient will get heart failure or not.

Binary classification confusion matrix

The prediction or the outcome of a binary predictive model can be either positive or negative. Similarly the ground truth values (what actually occurs in reality) can be either positive or negative. Depending on the combination of the prediction outcome and ground truth a contingency table or confusion matrix is formed with the following outcomes:

  • True positive: This result is seen when the prediction outcome is positive and the ground truth condition is also positive.
  • False positive: This result is experienced when the prediction is positive but in reality the ground truth condition is negative. This is also known as a type 1 error.
  • False negative: This is also known as a type 2 error. This occurs when the prediction is negative yet the ground truth condition is positive.
  • True negative: This is when both the ground truth and the prediction are negative.

Analysing the diagram above helps us understand the relationships. Each row and column of the matrix sums up to a marginal. What does that mean?
When we add prediction outcome positive to prediction outcome negative, we get the total population. Similarly, when we sum the true positives with the false negatives, we get the total for ground truth condition positive, and so on.
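
For instance, with hypothetical counts of true positive = 3, false positive = 1, false negative = 1 and true negative = 5, the marginals are: prediction outcome positive = TP + FP = 4, prediction outcome negative = FN + TN = 6, condition positive = TP + FN = 4, condition negative = FP + TN = 6, and each pair of marginals sums to the total population of 10.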

These four values are the core of binary classification evaluation. Based on them, we can derive more metrics by taking ratios of different values:

  • True positive rate = True positive / Condition positive
  • False positive rate = False positive / Condition negative
  • False negative rate = False negative / Condition positive
  • True negative rate = True negative / Condition negative
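
To make these definitions concrete, here is a minimal sketch using NumPy with made-up labels and predictions; it counts the four confusion-matrix cells and derives the four rates:

```python
import numpy as np

# Hypothetical ground truth and model predictions (1 = positive, 0 = negative)
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])

# Confusion-matrix counts
tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives (type 1 errors)
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives (type 2 errors)
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives

# Rates normalised by the ground truth (condition) marginals
tpr = tp / (tp + fn)  # true positive rate (sensitivity)
fpr = fp / (fp + tn)  # false positive rate
fnr = fn / (tp + fn)  # false negative rate = 1 - tpr
tnr = tn / (fp + tn)  # true negative rate (specificity) = 1 - fpr

print(tp, fp, fn, tn)      # 3 1 1 5
print(tpr, fpr, fnr, tnr)  # 0.75 0.166... 0.25 0.833...
```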

Accuracy

Accuracy is the most basic and intuitive metric. It is calculated by:
Accuracy = ( True positive + True negative ) / Total population

Note: accuracy is a normalisation over the ground truth values, since the total population is the sum of condition positive and condition negative.

However, this is not the best metric if the class labels are imbalanced. Assume that 1% of the total population has heart failure: with a trivial model we can simply predict that nobody has heart failure, and the accuracy will be 99%.
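
As a quick illustration, here is a hedged sketch with simulated, hypothetical labels showing how a trivial always-negative classifier looks accurate on an imbalanced population while finding none of the positive cases:

```python
import numpy as np

# Simulate an imbalanced population: roughly 1% of 10,000 patients are positive
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)

# Trivial model: predict "no heart failure" for everyone
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
sensitivity = np.sum((y_pred == 1) & (y_true == 1)) / max(np.sum(y_true == 1), 1)

print(f"accuracy    = {accuracy:.3f}")     # roughly 0.99, looks deceptively good
print(f"sensitivity = {sensitivity:.3f}")  # 0.000, no sick patient is identified
```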

Sensitivity

This metric goes by many names: True Positive Rate, Sensitivity, or Recall. To compute it:
Sensitivity = True positive / Condition positive.

Let us assume positive means heart failure and negative means no heart failure. Sensitivity measures, among all people in the population with heart failure (condition positive), what percentage is correctly identified by the model.

We desire a high rate here, as it indicates that the model is close to correct.

False Negative Rate

A related metric that can be derived in an alternative way, namely:
False negative rate = 1 - True positive rate

This metric tells us, of all the patients that have the condition according to the ground truth, how many of them the predictor misclassified as not having the condition.

We want this metric to be low, as this indicates fewer misclassifications.

False Positive Rate

The intuition behind this metric is to measure, among all patients without heart failure, what percentage of them are incorrectly predicted by the model to have heart failure.

This should be low, as we want more correct predictions.

Specificity or True Negative Rate

This can also be calculated by subtracting the false positive rate from 1. This metric tells us the proportion of patients without heart failure that the model predicted correctly. We want this metric to be high.

Prevalence

Accuracy and the rates above are normalisations over the ground truth values, whereas the metrics listed below are each defined as some quantity divided by a prediction outcome.

Prevalence is the ratio between condition positive and the total population. It measures how likely the disease is to occur in the total population, and this value can differ across disease conditions. For example, heart failure might be more prevalent among an older population than a younger one.

Prevalence metrics
  • Positive predictive value (precision) = True positive / Prediction outcome positive
  • False omission rate = False negative / Prediction outcome negative
  • False discovery rate = False positive / Prediction outcome positive
  • Negative predictive value = True negative / Prediction outcome negative
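
A minimal sketch of these prediction-outcome normalised metrics, reusing the hypothetical confusion-matrix counts from the earlier sketch:

```python
# Hypothetical confusion-matrix counts (same values as the earlier sketch)
tp, fp, fn, tn = 3, 1, 1, 5

ppv = tp / (tp + fp)   # positive predictive value (precision)
fdr = fp / (tp + fp)   # false discovery rate = 1 - precision
npv = tn / (tn + fn)   # negative predictive value
fomr = fn / (tn + fn)  # false omission rate = 1 - npv

# Prevalence is normalised by the total population rather than a prediction outcome
prevalence = (tp + fn) / (tp + fp + fn + tn)

print(ppv, fdr, npv, fomr, prevalence)  # 0.75 0.25 0.833... 0.166... 0.4
```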

Positive Predictive Value or Precision

This is also referred to as Precision. Positive predictive value looks at, among all patients predicted to have heart failure, what percentage of them actually have heart failure.

In other words, it is the number of patients correctly classified as positive over all the patients in the population who received a positive prediction. This should be a high value, since we aim for the positive predictions to match the ground truth positive conditions.

False Discovery Rate

This metric is related to precision: false discovery rate + precision = 1.

It asks what fraction of the patients the predictor labelled positive were falsely identified as positive cases, despite being negative according to the ground truth.
This rate indicates that the predictor discovered them as positive, however they are false discoveries.

If this value is low, this means the predictor is good at predicting positive outcomes.

False Omission Rate

Omission means to leave out or exclude. This metric gives us insight into the percentage of patients predicted as negative who will nevertheless develop heart failure.

This metric is desirable when low, meaning the predictor omitted few positive cases.

Negative Predictive Value

This is the proportion of patients predicted as negative who truly have a negative outcome. A higher value is desirable.

A good predictive model will have a high positive predictive value and a high negative predictive value. These should be close or equal to the ground truth positive and negative values respectively.

F1 Score

It combines the:

  • True Positive Rate = True Positive / Condition positive
  • and the Positive Predictive Value = True Positive / Prediction outcome positive

The F1 score is the harmonic mean of these two quantities; the formula is outlined in the diagram below. High F1 scores are desirable.

F1 Scores
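
A minimal sketch of the F1 computation, assuming scikit-learn is available and using hypothetical labels:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels and predictions (1 = heart failure, 0 = no heart failure)
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / prediction outcome positive
recall = recall_score(y_true, y_pred)        # TP / condition positive

# F1 is the harmonic mean of precision and recall
f1_manual = 2 * precision * recall / (precision + recall)
print(f1_manual, f1_score(y_true, y_pred))   # both 0.75
```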

Receiver Operator Characteristics

In general, predictive models output continuous prediction scores. To determine which range of continuous scores should be classified as 1 or 0, we need to define a threshold as the prediction boundary. This threshold has a significant impact on all the performance metrics. So how do we define this threshold value?

The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) provide a way to compare different classifiers and candidate prediction boundaries.

This curve is created by plotting the true positive rate against the false positive rate. We can achieve this by ordering the prediction scores in descending order, as seen below, and then using each prediction score as a potential threshold value.

In the below example we have 20 patients: 10 of them have a positive outcome and 10 a negative outcome, indicated by p and n respectively.

Graph with each point representing a potential threshold value.

We sort the prediction scores in descending order and consider each one as a threshold value. Each p moves the curve up along the y axis (the true positive rate), and each n moves it to the right along the x axis (the false positive rate).

Area Under Curve graph

When we plot these values we form a line graph, and each point on it can be considered as a threshold value. How do we select the best value? Well, it depends on what you are trying to achieve.

  • A: If you want false positive rate to be low (0.8)
  • B: If you care for both rates equally (0.54)
  • C: For high True positive rate (0.38)
  • D: For high True positive rate (0.30)

The optimal threshold value may vary depending on what you are trying to achieve.
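
As a rough sketch of how the curve is traced (assuming scikit-learn is available, with hypothetical labels and scores), each distinct prediction score is tried as a threshold and yields one (FPR, TPR) point:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical ground truth labels and continuous prediction scores
y_true  = np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.65, 0.6, 0.55, 0.4, 0.35, 0.2, 0.1])

# Each distinct score is tried as a threshold; the (FPR, TPR) pairs trace the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

for f, t, thr in zip(fpr, tpr, thresholds):
    print(f"threshold={thr:.2f}  FPR={f:.2f}  TPR={t:.2f}")
print("AUC =", round(auc, 3))
```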

Regression Metrics

The two popular regression performance metrics are:

  • MAE (Mean Absolute Error): Measures the average of the absolute errors, i.e. the absolute difference between the prediction and the ground truth value. MAE is more robust against outliers, however it is more difficult to work with because the absolute value is not differentiable at zero.
  • MSE (Mean Squared Error): Measures the average of the squared errors. This is easier to work with, as the derivative of the square term is linear. MSE is greatly affected by outliers because of the square term.

Neither of these metrics is bounded to a fixed range, which makes it difficult to compare values across datasets.

  • R squared: This is another regression metric. It has a fixed maximum score of 1. R squared is also referred to as the coefficient of determination, and is computed as 1 minus the ratio between the MSE and the variance of the ground truth.

R squared, MSE and variance plotted

In the above diagram we can see an example of a linear regression model with MSE = 0.86 and variance = 4.907. In this example R squared = 1 - 0.86 / 4.907 ≈ 0.82.

This is considered to be a good value. An R squared of 1 indicates that the regression perfectly fits the data, while 0 indicates that the line fits the data no better than predicting the mean.

It is possible to have negative values of R squared. This means the predictive model performed worse than a simple average over the data.
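
A minimal sketch of all three regression metrics (scikit-learn assumed, hypothetical values), also confirming that R squared equals 1 minus MSE over the variance of the ground truth:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical ground truth values and regression predictions
y_true = np.array([3.1, 4.5, 5.0, 6.2, 7.8, 9.1])
y_pred = np.array([2.8, 4.9, 5.4, 6.0, 7.1, 9.5])

mae = mean_absolute_error(y_true, y_pred)  # average absolute error, robust to outliers
mse = mean_squared_error(y_true, y_pred)   # average squared error, punishes outliers

# R squared = 1 - MSE / variance of the ground truth
r2_manual = 1 - mse / np.var(y_true)
print(mae, mse, r2_manual, r2_score(y_true, y_pred))  # the last two values match
```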

Quick Reminder: full course summary can be found on Big Data for Health Informatics Course

Hope you learned something.

-R
