Data and Visual Analytics — Visualisation 101

Rezana Dowra
Feb 12, 2023

This article serves as my personal notes for the course CSE 6242 Data and Visual Analytics, taken at Georgia Tech (GaTech) during Spring 2023.

This course will introduce you to broad classes of techniques and tools for analysing and visualising data at scale.

Its emphasis is on how to complement computation and visualisation to perform effective analysis. We will cover methods from each side, and hybrid ones that combine the best of both worlds.

This lesson is about visualisation. You can find all lessons here.

Introduction

This topic is about information visualisation (infovis) and how it helps data analytics.
This article will cover:

  1. What information visualisation is
  2. Why it is important
  3. How it is related to human visual perception and psychology
  4. The fundamentals of designing effective charts
  5. How to use colours effectively in visualisation

Information Visualisation

Definition:

Definition of information visualisation

Information visualisation can help with communication and Exploratory Data Analysis (EDA). Visualisation helps make sense of data, especially for people who do not work closely with it.

Visualisation is important for revealing more in-depth details about the data. An example of miscommunication using statistics alone is outlined next.
We have a summary of some data and several very different visualisations that all match that summary.

The same statistics can represent various forms of information.

These four data sets come from Anscombe’s Quartet. It is a good illustration that looking at the numbers alone does not communicate the true shape of the data.
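As a rough sketch of the point above, the summary statistics of the four data sets can be computed with Python's standard library. The values are the published Anscombe (1973) data; the helper names `pearson_r` and `summarise` are my own, made up for illustration.

```python
from statistics import mean, variance

# Anscombe's quartet: data sets I, II and III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written out by hand (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarise(xs, ys):
    """Collect the usual summary statistics for one data set (hypothetical helper)."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),  # sample variance
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


summaries = {name: summarise(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows are nearly the same, even though the scatter plots of the four data sets look completely different. That gap is what the visualisation exposes.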

Human Perception

Data visualisation leverages human perception. To design effective visualisations, it is important to understand the basics of human perception.

The speed of processing information per human sense.

Human visual perception is the most sensitive of the five human senses. It can process the most information per second compared with our other senses.

In an extremely simplified summary of how the eyes interpret data, there are two stages involved.

The first stage is parallel detection: the eyes scan the scene in parallel, identifying basic features. This very rapid stage is called pre-attentive processing. It occurs naturally and lasts only a short time.
It happens very fast, in roughly 200 ms, which is about the time the eye takes to move.
The second stage is when we look at something in detail and process the objects. This stage is rather slow and is called the serial processing stage. It incorporates memory.

Stages of vision

Colour (hue) and shape are each pre-attentively processed on their own. However, a combination of colour and shape is not pre-attentively processed.

Types of concepts

Gestalt Psychology

This started in the early 1900s in Berlin. The goal is to understand how we make sense of a seemingly chaotic world, that is, how we see the whole picture all at once instead of a collection of parts.

This school of psychology identified laws of grouping. These laws allow us to classify objects into the following groups.

Gestalt groupings

This is a good way to understand how we perceive and make sense of the world.

Designing effective charts

How can we design effective charts using human perception and Gestalt psychology?

Detecting things quickly is not necessarily detecting them accurately. Ideally, for data representation we want interpretation that is both quick and accurate. There has been research into which styles of data representation convey accurate information in a short time.

Types of charts and how quickly they provide accurate information

The above image shows that bar charts are very effective at communicating information in a short amount of time. A bar chart represents data encoded by length.
Charts that encode data by area are not as effective as charts that encode it by position.
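To make the difference between length and area encoding concrete, here is a minimal sketch of the arithmetic behind area encoding with circles. The function name `radius_for_value` and its parameters are my own, for illustration only. If the circle's area is to stay proportional to the value, its radius can only grow with the square root of the value.

```python
import math


def radius_for_value(value, reference_value=1.0, reference_radius=1.0):
    """Radius that keeps circle AREA proportional to value (hypothetical helper).

    Area grows with radius squared, so the radius must grow with the
    square root of the value.
    """
    return reference_radius * math.sqrt(value / reference_value)


small = radius_for_value(1)  # reference circle
large = radius_for_value(4)  # circle encoding a value four times larger

print("radius ratio:", large / small)
print("area ratio:", (math.pi * large ** 2) / (math.pi * small ** 2))
```

A value four times larger gives a circle that is only twice as wide. A bar encoding the same value would be four times as long, so the reader can compare lengths directly.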

Most accurate features of a good chart.

The above image summarises which visual encodings are better to use. "Better" here means that consumers of the visualisation can accurately understand the data in a short amount of time.

What does this tell us? It tells us that bar charts, line charts and scatter plots are good visual representations of data. They convey length, position, similarity, proximity and so on better than other charts.

A good resource for chart design is Tufte’s principles. Below are some principles to follow to maintain graphical integrity.

Laws of graphical integrity

Colours

This section covers how to use colour effectively to communicate data. How we perceive a colour is affected by its context.
Colour in visualisations can be used to:

  • Call attention to information
  • Increase appeal
  • Increase memorability
  • Encode another dimension of the data

Colour Models

One way to represent colour on devices is RGB, which mixes red, green and blue light. Another is HSV, which stands for hue, saturation and value. The closely related HSL model uses lightness instead of value.
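As a small sketch of how these models relate, Python's standard `colorsys` module converts RGB values in the range 0 to 1 into HSV (hue, saturation, value) and HLS (hue, lightness, saturation). The `describe` helper is my own name, used only for illustration.

```python
import colorsys


def describe(r, g, b):
    """Describe an RGB colour (components in 0..1) in HSV/HLS terms (hypothetical helper)."""
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    _, lightness, _ = colorsys.rgb_to_hls(r, g, b)
    return {
        "hue_deg": round(hue * 360),  # hue as an angle on the colour wheel
        "saturation": saturation,     # HSV saturation
        "value": value,               # HSV value (brightness of the strongest channel)
        "lightness": lightness,       # HLS lightness (midpoint of strongest and weakest channel)
    }


pure_red = describe(1.0, 0.0, 0.0)
dark_red = describe(0.5, 0.0, 0.0)

print(pure_red)
print(dark_red)
```

Both colours share the same hue. Only the value and lightness change, which is why hue-based models are convenient when you want lighter or darker versions of one colour.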

Selecting Colours

Colour can also be used in a negative way, such as using colour to represent a range of numbers.

It is important to be intentional with your choice of colour. The image below is a guide to using colour based on the type of data.
For example, if you have binary information you can use shade. If you have categorical data you could use random, distinct colours. If you have both binary and categorical data you could use two depths of the same colour for each category (that is, the mix between binary and categorical).

Guide to using colour to represent data.
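The "two depths of the same colour per category" idea can be sketched as follows. This is only one possible approach, with assumed defaults: the function name `paired_palette`, the evenly spaced hues, and the lightness and saturation values are my own choices, not a recommended palette.

```python
import colorsys


def paired_palette(n_categories, light=0.75, dark=0.40, saturation=0.65):
    """Return one (light, dark) pair of hex colours per category (hypothetical helper).

    Each category gets its own hue, and the binary attribute is shown by
    two lightness levels of that hue.
    """
    palette = []
    for index in range(n_categories):
        hue = index / n_categories  # spread hues evenly around the colour wheel
        pair = []
        for lightness in (light, dark):
            r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
            pair.append("#{:02x}{:02x}{:02x}".format(
                round(r * 255), round(g * 255), round(b * 255)))
        palette.append(tuple(pair))
    return palette


for light_hex, dark_hex in paired_palette(3):
    print(light_hex, dark_hex)
```

In practice, a hand-tuned palette is usually a safer choice than evenly spaced hues, since equal steps in hue do not look equally different to the eye.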

There is a good website to use when choosing colours for your graphs: ColorBrewer.

Hope you learned something.

-R
