Data and Visual Analytics — SQLite
This article serves as my personal notes for the course CSE 6242 Data and Visual Analytics taken at Georgia Tech University (GaTech) during Spring 2023.
This course will introduce you to broad classes of techniques and tools for analysing and visualising data at scale.
Its emphasis is on how to complement computation and visualisation to perform effective analysis. We will cover methods from each side, and hybrid ones that combine the best of both worlds.
The lesson is about SQLite. You can find all lessons here.
How to effectively store data?
The easiest way to store data is using CSV (Comma separated file). This does not mean this is the easiest to work with. This is because there is too much room for errors. This is not good for including data that contains commas, quotes, double quotes etc.
There exist technologies that are self-contained, server-less and do not require configuration to set up. One good example is SQLite. This is a very popular embedded database.
SQL Refresher Tutorial
This tutorial will allow you to perform basic tasks using SQLite.
To create a table follow the below steps. Open a terminal and begin to enter in the commands.
The output of running .schema will print the table you created.
Next we will add some data to the table using the insert commands. The insert command will take in values for each row to match the type and order of the schema.
To view the newly added data we can read from the table using select. The output should look like a | separated table on your terminal.
Next we want to create course table to record which student has taken which course. Create the new table and then view the schemas with the below commands.
You can go ahead and add some data as shown in the below output using the previously learned commands.
Next we want to perform a join. This will allow us to create the relationship between two tables.
We want to know which students are in the course_id 6242. Using the above command we are able to extract this data. The join statement is the where clause that contains the conditions to do so.
Next we will look at how can we summarise data in the table. We can use some built in keywords such as avg to get the average.
We can also perform further filtering using the having command.
This tutorial summarises some key concepts that allow you to perform important tasks when working with a database.
Beware of missing index's when working with SQLite
SQLite uses B-Trees data structures to represent indices. This is due to the performance on queries. This is O(log n) for searching .
Since data is cannot always be clean or reliable, it is important to check the index of data before you use it.
Hope you learned something.
-R