Data and Visual Analytics — Course Introduction
This article serves as my personal notes for the course CSE 6242 Data and Visual Analytics, taken at the Georgia Institute of Technology (GaTech) during Spring 2023.
This course will introduce you to broad classes of techniques and tools for analysing and visualising data at scale.
Its emphasis is on how to complement computation and visualisation to perform effective analysis. We will cover methods from each side, and hybrid ones that combine the best of both worlds.
Course Introduction
This course will give you the tools to analyse and represent data.
Today we have access to large sets of data; however, human beings can only hold approximately 7 ± 2 items in their working memory. Our goal is to condense these large data sets into the valuable, relevant and important things that people can hold in memory.
We achieve this by transforming data into insights, drawing on techniques from two fields: data mining and human-computer interaction (HCI).
Data mining focuses on automatic techniques, such as clustering and classification.
Because they are automatic, they can easily scale to millions of items. Human-computer interaction helps us understand data in an intuitive way; it focuses on interaction and visualisation techniques.
This course combines these two areas of focus: computation and human intuition.
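To make the idea of condensing a large data set concrete, here is a minimal sketch of my own (not from the course material). It assumes the data is a list of numbers: the computation step reduces it to seven bucket counts, and the visualisation step draws them as text bars. The function names `summarise` and `render` and the choice of seven equal-width buckets are my own assumptions for illustration.

```python
import random
from collections import Counter


def summarise(values, n_bins=7):
    """Condense many numeric values into n_bins equal-width bucket counts."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / n_bins or 1.0
    # The largest value would land in bucket n_bins, so clamp it to the last one.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    return [
        (lo + i * width, lo + (i + 1) * width, counts.get(i, 0))
        for i in range(n_bins)
    ]


def render(bins, max_bar=40):
    """Draw one text bar per bucket, scaled to the largest count."""
    peak = max(count for _, _, count in bins) or 1
    lines = []
    for start, end, count in bins:
        bar = "#" * round(max_bar * count / peak)
        lines.append(f"{start:8.2f} to {end:8.2f} | {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical data: 100,000 samples from a normal distribution.
    data = [random.gauss(50, 10) for _ in range(100_000)]
    print(render(summarise(data)))
```

The summary step is the automatic part that scales with the data, while the seven bars are the part a person can actually keep in their head.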
Why data and visual analytics?
1. The best way to start answering this question is to understand “What is data and visual analytics?” It is an interdisciplinary science that combines computational techniques and interactive visualisation to transform data into something that supports an important decision or a discovery. The motivation, then, is the ability to make informed decisions or to discover information from the data.
There are a few things worth considering when attempting data and visual analytics. Challenges include how to store and retrieve data efficiently, how to scale algorithms, and, when working with distributed systems, how to perform testing, visualisation and so on.
2. More data is being created every day, and it needs to be processed, especially in fields such as medicine, sports, finance and marketing. There is also a need for people in these careers.
Course goals and expectations
- Learn visual and computational techniques and use them in a complementary way
- Gain a breadth of knowledge
- Learn practical know-how by working on real data and problems
The course schedule is made up of multiple parts. The parts in green cover data collection, cleaning and integration. A blue section then covers data analytics and visualisation, followed finally by presentation and dissemination.
These are building blocks as opposed to rigid steps. They can be revisited, and some can be skipped, depending on the data and your goals.
The course topics
- Course Introduction
- Analytics Building Blocks
- Data Science Buzzwords
- Data Collection
- SQLite
- Data Cleaning
- Code Back-up & Version Control
- Data Integration
- Data Analytics, Concepts and Tasks
- Visualisation 101
- Fixing Common Visualisation Issues
- Data Visualisation for Web (D3)
- Scalable Computing: Hadoop
- Scalable Computing: Pig
- Scalable Computing: Hive
- Scalable Computing: Spark
- Scalable Computing: HBase
- Classification
- Visualisation for Classification
- Introduction to Clustering
- Graph Analytics
- Ensemble Method
- Scaling up Algorithms with Virtual Memory
- Text Analytics
I will post notes on the other topics as I go through the course, so the above list should become a list of links.
Hope you learned something.
-R