Data and Visual Analytics: Data Collection

Rezana Dowra
Jan 11, 2023


This article serves as my personal notes for the course CSE 6242 Data and Visual Analytics, taken at the Georgia Institute of Technology (Georgia Tech) during Spring 2023.

This course will introduce you to broad classes of techniques and tools for analysing and visualising data at scale.

Its emphasis is on how to complement computation and visualisation to perform effective analysis. We will cover methods from each side, and hybrid ones that combine the best of both worlds.

This lesson is about data collection. You can find all lessons here.

How can data be collected?

There are many ways to collect data. The three primary ones are:

  1. Download: low effort, since we can start working with the data immediately.
  2. API: medium effort, since you need to write some code.
  3. Scrape/Crawl: high effort, since this can involve crawling web pages and then extracting the data from them.

There are many data sources available online that allow you to freely download the data and use it.

There are also many open APIs that we can use to obtain data. However, when data is public but not easy to obtain through a download or an API, we can rely on scraping as a method of data collection.
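To make the difference in effort between an API and scraping concrete, here is a minimal sketch using only the Python standard library. The JSON layout, the HTML markup and the names `titles_from_api`, `titles_from_html` and `TitleScraper` are assumptions made up for illustration; they do not come from the course or from any real service.

```python
import json
from html.parser import HTMLParser


def titles_from_api(json_text):
    """API route: the response is already structured, so parsing is one line."""
    return [item["title"] for item in json.loads(json_text)["apps"]]


class TitleScraper(HTMLParser):
    """Scraping route: collect the text of every <h2 class="title"> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.titles.append(data.strip())


def titles_from_html(html_text):
    parser = TitleScraper()
    parser.feed(html_text)
    return parser.titles


# Hypothetical sample payloads standing in for real responses.
SAMPLE_JSON = '{"apps": [{"title": "Maps"}, {"title": "Weather"}]}'
SAMPLE_HTML = (
    '<div><h2 class="title">Maps</h2><p>ad</p>'
    '<h2 class="title">Weather</h2></div>'
)

print(titles_from_api(SAMPLE_JSON))
print(titles_from_html(SAMPLE_HTML))
```

With an API the structure is handed to you, while with scraping you have to know the page markup and write the extraction logic yourself, and that logic can break whenever the page layout changes.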

Scraping/Crawling for Data collection

How do we scrape data? Suppose we want to collect some data from Google Play. You are interested in understanding which applications are related to each other, so you want to create a network of apps.

Example of why you may need to scrape the web for data.

To achieve this you could write some code/script to search for an application and read the list of similar applications — these will be related to the first application you searched for.

This pseudo-algorithm, repeated for each newly discovered application, will allow you to build a graph representing the relationships between applications in Google Play.
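The idea above can be sketched as a breadth-first crawl. This is only an illustration under assumptions: `build_app_graph`, `get_similar_apps` and `FAKE_STORE` are hypothetical names, and the lookup function is a stand-in for whatever scraping code would actually read the list of similar apps from a page.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def build_app_graph(
    seed_app: str,
    get_similar_apps: Callable[[str], Iterable[str]],
    max_apps: int = 100,
) -> Dict[str, List[str]]:
    """Breadth-first crawl: map each visited app to its list of similar apps."""
    graph: Dict[str, List[str]] = {}
    queue = deque([seed_app])
    seen = {seed_app}
    # Stop when there is nothing left to visit or the size limit is reached.
    while queue and len(graph) < max_apps:
        app = queue.popleft()
        similar = list(get_similar_apps(app))
        graph[app] = similar
        for other in similar:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return graph


# Stand-in for a real scraper: made-up data, not actual Google Play results.
FAKE_STORE = {
    "maps": ["transit", "weather"],
    "transit": ["maps"],
    "weather": ["maps", "news"],
    "news": [],
}

graph = build_app_graph("maps", lambda app: FAKE_STORE.get(app, []))
for app, similar in graph.items():
    print(app, "->", similar)
```

The `seen` set keeps the crawl from revisiting an app, and `max_apps` bounds how many pages would be fetched, which matters once the lookup is a real network request.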

Tools for scraping a web page

The image below shows some popular tools, such as Selenium, a powerful tool that can be used to automate a web browser.

Many of the examples shown here are Python libraries.

Examples of scraping libraries

There are some considerations to keep in mind when deciding to scrape. Some web pages have hidden components that only appear after some interaction, such as a click. Also keep in mind that not all web browsers render content in exactly the same way.
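As a rough sketch of how a browser automation tool can deal with content that only appears after an interaction, the function below uses Selenium's Python bindings. It assumes Selenium 4 and a local Chrome installation are available; the function name `fetch_rendered_html` and its parameters are hypothetical, and the CSS selectors would have to be found by inspecting the target page.

```python
def fetch_rendered_html(url, wait_selector, click_selector=None, timeout=10):
    """Load a page in a real browser and return its HTML after rendering."""
    # Imported inside the function so this file still loads where
    # Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome is installed locally
    try:
        driver.get(url)
        if click_selector is not None:
            # Trigger the interaction that reveals the hidden component.
            driver.find_element(By.CSS_SELECTOR, click_selector).click()
        # Wait until the element we care about exists in the page.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()
```

The returned HTML can then be passed to whatever parsing code extracts the data. Because browsers can render the same page differently, results obtained with one browser may not match another exactly.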

Hope you learned something.

-R
