Data and Visual Analytics — Visualisation 101
This article serves as my personal notes for the course CSE 6242 Data and Visual Analytics, taken at the Georgia Institute of Technology (GaTech) during Spring 2023.
This course will introduce you to broad classes of techniques and tools for analysing and visualising data at scale.
Its emphasis is on how to complement computation and visualisation to perform effective analysis. We will cover methods from each side, and hybrid ones that combine the best of both worlds.
This lesson is about visualisation. You can find all lessons here.
Introduction
This topic covers information visualisation (infovis) and how it supports data analytics.
This article will cover:
- What information visualisation is
- Why it is important
- How it relates to human visual perception and psychology
- The fundamentals of designing effective charts
- How to use colour effectively in visualisation
Information Visualisation
Definition:
Information visualisation supports both communication and Exploratory Data Analysis (EDA). Visualisation helps people make sense of data, especially those who do not work closely with it.
Visualisation is important for revealing details about the data that summary statistics can hide. An example of how statistics alone can mislead is outlined next.
Consider four small datasets that share almost identical summary statistics but look very different when plotted.
These four datasets come from Anscombe’s Quartet. They are a good illustration that looking only at the numbers does not convey the true shape of the data.
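To see the effect numerically, the sketch below computes the shared summary statistics for the four datasets of Anscombe’s Quartet, using Anscombe’s published values. All four give a mean x of 9, a sample variance of x of 11, a mean y of about 7.50, a correlation of about 0.816 and a regression line of roughly y = 3 + 0.5x. Plotted, however, they show a noisy linear trend, a smooth curve, a tight line with one outlier, and a vertical cluster with a single extreme point. The function name `summarise` is only an illustrative choice.

```python
# Anscombe's Quartet: four datasets with near-identical summary statistics.
from statistics import mean, variance

X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by datasets I-III
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return the summary statistics that the four datasets (almost) share."""
    mx, my = mean(xs), mean(ys)
    # Sample covariance, with the same n - 1 denominator as statistics.variance.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    vx, vy = variance(xs), variance(ys)
    slope = cov / vx  # least-squares regression slope
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": vx,
        "var_y": vy,
        "corr": cov / (vx * vy) ** 0.5,  # Pearson correlation coefficient
        "slope": slope,
        "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summarise(xs, ys)
        print(name, {key: round(val, 2) for key, val in stats.items()})
```

Only a scatter plot of each dataset reveals how different they are, which is the point of the quartet.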
Human Perception
Data visualisation leverages human perception, so to design effective visualisations it is important to understand the basics of how we perceive.
Visual perception is the most sensitive of the five human senses: it can process more information per second than any of our other senses.
In a highly simplified model of how the eyes interpret data, there are two stages.
The first stage is parallel detection: the eyes scan the scene in parallel and pick out basic features. This stage is called Pre-Attentive Processing. It happens automatically and very quickly, in approximately 200 ms, roughly the time needed to initiate an eye movement.
The second stage, called Serial Processing, is when we look at objects in detail and process them one at a time. This stage is much slower and draws on memory.
Colour (hue) and shape are each processed pre-attentively on their own. However, a combination of colour and shape is not: finding a red circle among red squares and blue circles requires a serial search.
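The difference between a single-feature search and a colour-and-shape (conjunction) search can be sketched with a toy display, modelled here, as an assumption for illustration, as a list of (colour, shape) pairs. In the feature display the red target is the only red item, so hue alone singles it out. In the conjunction display the target’s colour and its shape each also appear among the distractors, so neither feature alone identifies it. The helper names are hypothetical, and the code describes the stimuli rather than modelling the visual system.

```python
from collections import Counter

TARGET = ("red", "circle")


def feature_display(n):
    """Target is the only red item; every distractor is a blue circle."""
    return [TARGET] + [("blue", "circle")] * (n - 1)


def conjunction_display(n):
    """Target is a red circle among alternating red squares and blue circles."""
    distractors = [("red", "square") if i % 2 == 0 else ("blue", "circle")
                   for i in range(n - 1)]
    return [TARGET] + distractors


def has_unique_feature(display, target):
    """True if no other item shares the target's colour, or none shares its shape.

    A target like this can be singled out by one feature; otherwise the items
    have to be checked one at a time.
    """
    colours = Counter(colour for colour, _ in display)
    shapes = Counter(shape for _, shape in display)
    colour, shape = target
    return colours[colour] == 1 or shapes[shape] == 1


print(has_unique_feature(feature_display(12), TARGET))      # True
print(has_unique_feature(conjunction_display(12), TARGET))  # False
```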
Gestalt Psychology
Gestalt psychology started in Berlin in the early 1900s. Its goal is to understand how we make sense of a seemingly chaotic world, that is, how we see the whole picture at once instead of a collection of parts.
Gestalt psychologists identified laws of grouping, such as proximity and similarity, which describe how we classify objects into groups.
This is a good way to understand how we perceive and make sense of the world.
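The law of proximity shows up directly in a grouped bar chart: bars with small gaps between them are read as one group. A minimal sketch, assuming 1-D bar positions and a hypothetical gap threshold `max_gap`, splits the positions wherever the gap exceeds that threshold. It is only a rough stand-in for what a viewer does, not a model taken from Gestalt research.

```python
def group_by_proximity(positions, max_gap):
    """Split 1-D positions into groups wherever consecutive items are > max_gap apart."""
    groups = []
    for p in sorted(positions):
        if groups and p - groups[-1][-1] <= max_gap:
            groups[-1].append(p)  # close to the previous item: same group
        else:
            groups.append([p])    # large gap: start a new group
    return groups


# Bar x-positions in a grouped bar chart: a gap of 1 inside each group,
# a gap of 2 between the groups.
bars = [0.0, 1.0, 2.0, 4.0, 5.0, 6.0]
print(group_by_proximity(bars, max_gap=1.5))  # [[0.0, 1.0, 2.0], [4.0, 5.0, 6.0]]
```

In chart design the same idea runs the other way: the spacing you choose decides which bars readers will group together.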
Designing effective charts
How can we use human perception and Gestalt psychology to design effective charts?
Detecting something quickly does not mean detecting it accurately. Ideally, a data representation supports interpretation that is both quick and accurate. Research has examined which styles of data representation let people read values accurately in a short time.
The above image shows that bar charts, which encode data by length, are very effective at communicating information in a short amount of time.
Charts that encode data by area, by contrast, are not as effective as charts that encode it by position.
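Part of the reason area is hard to read is that it grows with the square of a length. A common pitfall, shown here as an illustration rather than something from the lecture, is to scale a circle’s radius with the data: when the value doubles, the drawn area quadruples. The sketch below, with hypothetical helper names, compares that naive scaling with choosing the radius as the square root of value / pi, so that area stays proportional to the data.

```python
import math


def circle_area(radius):
    return math.pi * radius ** 2


def radius_proportional_to_value(value):
    """Naive encoding: the radius grows linearly with the value."""
    return value


def radius_for_proportional_area(value):
    """Radius chosen so that the circle's area equals the value: r = sqrt(v / pi)."""
    return math.sqrt(value / math.pi)


values = [10, 20]  # the second value is twice the first
naive = [circle_area(radius_proportional_to_value(v)) for v in values]
scaled = [circle_area(radius_for_proportional_area(v)) for v in values]

print("naive area ratio:", naive[1] / naive[0])     # 4x for a 2x change in the data
print("scaled area ratio:", scaled[1] / scaled[0])  # approximately 2x
```

Even with correct scaling, readers still judge areas less accurately than lengths, which is why position and length encodings are preferred.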
The above image summarises which visual encodings are better to use. Here "better" means that readers of the visualisation can judge the data accurately in a short amount of time.
What does this tell us? It tells us that bar charts, line charts and scatter plots are good visual representations of data: they make use of length, position, similarity, proximity and similar cues better than other charts.
A good resource for chart design is Tufte’s Principles. Below are some principles to follow to maintain graphical integrity.
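One of Tufte’s measures of graphical integrity is the lie factor: the size of an effect as shown in the graphic divided by the size of the same effect in the data, where the size of an effect is the relative change |second - first| / first. A value near 1 means the graphic is faithful to the data. In this minimal sketch, with illustrative function names, a bar chart whose bar lengths follow the data scores 1. A chart that scales the side of a square icon with the data, so that the drawn area grows from 100 to 400 when the data go from 10 to 20, scores 3.

```python
def size_of_effect(first, second):
    """Relative change between two quantities."""
    return abs(second - first) / first


def lie_factor(data_first, data_second, shown_first, shown_second):
    """Effect size shown in the graphic divided by effect size in the data."""
    return size_of_effect(shown_first, shown_second) / size_of_effect(data_first, data_second)


# Bar chart: bar length tracks the data exactly (10 -> 20).
print(lie_factor(10, 20, 10, 20))    # 1.0
# Square icons whose side tracks the data: drawn area goes 100 -> 400.
print(lie_factor(10, 20, 100, 400))  # 3.0
```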
Colors
This section covers how to use colour effectively to communicate data. Note that how we perceive a colour is affected by its context.
Colour in visualisations can be used to:
- Call attention to information
- Increase appeal
- Increase memorability
- Encode an additional data dimension
Color Models
Two common ways to represent colour on devices are RGB, which mixes red, green and blue light, and HSV, which describes a colour by its hue, saturation and value (brightness).
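Python’s standard library includes `colorsys`, which converts between RGB and HSV with every channel expressed as a float from 0 to 1. The sketch below adds a small, hypothetical `hex_to_rgb` helper for the familiar `#rrggbb` notation and reports hue in degrees. Pure red maps to hue 0 with full saturation and value, while a grey has zero saturation because it carries no hue at all.

```python
import colorsys


def hex_to_rgb(hex_colour):
    """Convert '#rrggbb' to an (r, g, b) tuple of floats in the range 0..1."""
    digits = hex_colour.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))


def hex_to_hsv(hex_colour):
    """Return (hue in degrees, saturation 0..1, value 0..1) for a hex colour."""
    h, s, v = colorsys.rgb_to_hsv(*hex_to_rgb(hex_colour))
    return h * 360, s, v


print(hex_to_hsv("#ff0000"))  # (0.0, 1.0, 1.0): pure red
print(hex_to_hsv("#808080"))  # saturation 0.0: a grey has no hue
```

Working in HSV makes it easy to keep one hue and vary only saturation or value, which is how shades of the same colour are produced.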
Selecting Colours
Colour can also work against you, for example when it is used to represent a continuous range of numbers.
It is important to be intentional with your choice of colour. The image below is a guide to using colour based on the type of data.
For example, binary data can be shown with two shades. Categorical data can use distinct hues. If you have both binary and categorical data, you can use two shades (a lighter and a darker one) of the same hue for each category, which combines the binary and categorical schemes.
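The categorical and paired cases above can be sketched with `colorsys`: categorical data get evenly spaced hues, and the binary-plus-categorical case gets a light and a dark shade of each hue. The function names and the saturation and value numbers are arbitrary choices for illustration; they are not ColorBrewer’s curated palettes.

```python
import colorsys


def hsv_to_hex(h, s, v):
    """Convert HSV floats in 0..1 to a '#rrggbb' string."""
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def categorical_palette(n, saturation=0.65, value=0.85):
    """n distinct hues spaced evenly around the colour wheel."""
    return [hsv_to_hex(i / n, saturation, value) for i in range(n)]


def paired_palette(n, saturation=0.65):
    """A (light, dark) pair of shades of one hue per category.

    The hue encodes the category; light versus dark encodes the binary value.
    """
    pairs = []
    for i in range(n):
        hue = i / n
        light = hsv_to_hex(hue, saturation / 2, 0.95)  # paler and brighter
        dark = hsv_to_hex(hue, saturation, 0.60)       # more saturated and darker
        pairs.append((light, dark))
    return pairs


print(categorical_palette(4))
print(paired_palette(3))
```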
A good website for choosing colours for your charts is ColorBrewer.
Hope you learned something.
-R