Data and Visual Analytics — Analytics Building Blocks
This article serves as my personal notes for the course CSE 6242 Data and Visual Analytics, taken at the Georgia Institute of Technology (Georgia Tech) during Spring 2023.
This course will introduce you to broad classes of techniques and tools for analysing and visualising data at scale.
Its emphasis is on how to complement computation and visualisation to perform effective analysis. We will cover methods from each side, and hybrid ones that combine the best of both worlds.
This lesson is about the analytics building blocks. You can find all lessons here.
Introduction
This course uses a set of building blocks to frame the analysis and visualisation of big data. The analytics building blocks consist of:
- Collection
- Cleaning
- Integration
- Analysis
- Visualisation
- Presentation
- Dissemination
These are not sequential steps but building blocks: you can perform them in any order, skip blocks, go back to previous ones, and so on, as the sketch below illustrates.
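As a purely illustrative sketch (every function name and data value here is hypothetical, not from the course), the blocks can be thought of as composable functions rather than a fixed pipeline:

```python
# Hypothetical sketch: the building blocks as composable steps.

def collect():
    return ["  raw record 1 ", " raw record 2  "]   # Collection

def clean(records):
    return [r.strip() for r in records]             # Cleaning

def analyse(records):
    return {"record_count": len(records)}           # Analysis

def visualise(result):
    print("chart of", result)                       # Visualisation

# The blocks compose in any order: here we skip Integration entirely,
# and revisit earlier blocks after inspecting a first result.
data = clean(collect())
first = analyse(data)
visualise(first)
second = analyse(clean(data))   # going back to Cleaning is fine
visualise(second)
```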
Projects
To see how the blocks are used in practice, we will walk through two projects in detail and show how the building blocks were applied in each.
Apolo Graph Exploration
The first example is Apolo, a tool for exploring large graphs that combines machine learning and visualisation.
The problem: given a large, complex graph of papers and their citations, how can we find the nodes most relevant to the user?
For this project, the data was collected from Google Scholar and cleaned. The analysis involved designing an inference algorithm to score relevance, and the results were explored through an interactive GUI.
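To make the inference step concrete, here is a minimal sketch of the guilt-by-association idea behind Apolo: relevance spreads from papers the user marks as exemplars to nearby papers in the citation graph. The toy graph, the `restart` parameter, and the simple score-propagation loop below are illustrative assumptions; the published system is based on Belief Propagation rather than this random-walk-style iteration.

```python
# Guilt-by-association sketch on a toy citation graph (hypothetical data).
# Relevance flows from user-marked exemplar papers to their neighbours.

# Toy citation graph: paper -> connected papers (treated as undirected).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

exemplars = {"A"}       # papers the user marked as relevant
restart = 0.15          # probability of jumping back to an exemplar
scores = {n: (1.0 if n in exemplars else 0.0) for n in graph}

for _ in range(50):     # iterate until scores roughly converge
    new = {}
    for node, neighbours in graph.items():
        spread = sum(scores[m] / len(graph[m]) for m in neighbours)
        base = 1.0 if node in exemplars else 0.0
        new[node] = restart * base + (1 - restart) * spread
    scores = new

# The highest-scoring non-exemplar papers are suggested to the user.
for node, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(s, 3))
```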
The presentation and dissemination of this body of work included a published paper.
NetProbe
The second example is NetProbe, a system that detects fraud in online auctions on eBay.
The project aims to identify non-delivery fraud: sellers who take payment but never actually ship anything.
The system works by connecting buyers to sellers, building a graph/network that captures who has transacted with whom, as in the sketch below.
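As a concrete illustration, here is a minimal sketch of building that buyer-seller graph from transaction records, assuming a simple (buyer, seller) pair format; the accounts and schema are hypothetical, not eBay's actual data.

```python
# Build a buyer-seller graph from toy transaction records.
import networkx as nx

transactions = [
    ("alice", "bob"),      # (buyer, seller)
    ("alice", "carol"),
    ("dave", "bob"),
    ("erin", "carol"),
]

G = nx.Graph()
for buyer, seller in transactions:
    # An edge records that the two users traded with each other;
    # a weight counts repeated trades between the same pair.
    if G.has_edge(buyer, seller):
        G[buyer][seller]["weight"] += 1
    else:
        G.add_edge(buyer, seller, weight=1)

print(G.number_of_nodes(), "users,", G.number_of_edges(), "trading pairs")
```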
On eBay, each user's profile shows a feedback score: the total number of positive ratings minus the number of negative ratings (for example, 98 positives and 2 negatives give a score of 96).
Fraudsters therefore create one account to carry out the fraud, plus additional accomplice accounts to offset the fraudster's score. The accomplices trade with the fraudster accounts as well as with honest people, which makes the accomplices themselves look legitimate.
The purpose of an accomplice is to generate positive trades and a good feedback score for the fraudster.
Now we want to detect the fraudsters, and we do this by finding the accomplices: fraudsters link heavily to accomplices but rarely to each other, forming what is called a near-bipartite core.
We can detect these cores by scoring how likely each type of account (fraudster, accomplice, honest) is to interact with each other type. In the matrix of these scores, the darker squares (the strong fraudster-accomplice and accomplice-honest interactions) are what help us flag the impostor accounts.
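Below is a simplified sketch of this kind of scoring. NetProbe itself runs loopy Belief Propagation on a Markov Random Field; this sketch keeps the core idea (beliefs about each account's type spreading through a compatibility matrix) but omits BP's message-exclusion step, and all numeric values and account names are illustrative assumptions, not the paper's numbers.

```python
# Simplified NetProbe-style belief spreading over the buyer-seller graph.
import numpy as np

STATES = ["fraudster", "accomplice", "honest"]

# Symmetric compatibility ("propagation") matrix: entry [i, j] is the
# affinity for a state-i user to trade with a state-j user. The large
# entries correspond to the darker squares in the lecture figure.
PROP = np.array([
    [0.05, 0.70, 0.25],   # fraudsters trade mostly with accomplices
    [0.70, 0.10, 0.50],   # accomplices trade with fraudsters and honest users
    [0.25, 0.50, 0.60],   # honest users trade mostly with honest/accomplices
])

# Toy buyer-seller graph as adjacency lists (hypothetical accounts).
graph = {
    "f1": ["a1", "a2"],
    "a1": ["f1", "h1", "h2"],
    "a2": ["f1", "h2"],
    "h1": ["a1", "h2"],
    "h2": ["a1", "a2", "h1"],
}

# Uniform priors everywhere, except one account already suspected of fraud.
priors = {n: np.ones(3) / 3 for n in graph}
priors["f1"] = np.array([0.8, 0.1, 0.1])
beliefs = dict(priors)

for _ in range(20):  # iterate until beliefs roughly stabilise
    new = {}
    for node, neighbours in graph.items():
        b = priors[node].copy()
        for m in neighbours:
            b *= PROP @ beliefs[m]   # neighbour's belief pushed through the matrix
        new[node] = b / b.sum()      # renormalise to a probability distribution
    beliefs = new

for node in graph:
    print(node, "->", STATES[int(np.argmax(beliefs[node]))], np.round(beliefs[node], 2))
```

In this toy run, accounts sitting between the suspected fraudster and honest users get pushed toward the accomplice state, which is exactly how finding accomplices exposes the near-bipartite core.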
The building blocks used in this project include collection and cleaning: data was scraped from eBay and then cleaned.
Since all the data came from eBay, there was no need for data integration.
Analysis was done to build the detection algorithm, along with a visualisation of the data. Finally, the project was presented in a paper and disseminated through talks and lectures.
Hope you learned something.
-R