SQL

SQL stands for “structured query language”. It is a programming language for storing and processing information in a relational database. A relational database stores information in tabular form, with rows and columns representing different data attributes and the various relationships between the data values.

SQL focus on database management. It is very big right now in the private sector, lots and lots of companies (especially the big companies) need to handle very large and numerous and complicated database, and hence a lot of job openings are specifically for people with experience using SQL. In python, there is a module that allows us to make use of c-basic SQL commands and features called “sqlite3”.

SQL files are generally stored with the “.db” extension (db = database). The SQL files are binary files, if we open these files in textbook, we will see a bunch of gibberish codes instead of the actual data. In SQL, a “database” may have multiple “Tables”, and a “Table” may have multiple columns, and columns can have multiple data entries.

Create a database

In order to create a database, we will use the clorox2020.csv and turn it into a database so that we can run SQL commands on it.

def data_to_database():
    data = pandas.read_csv("clorox2020.csv")

    # here we establish a connection to the database
    conn = sqlite3.connect("clorox2020")
    # once we have established a connection, we need to create a cursor object
    # the cursor object is the object that we use to execute SQL commands
    cur = conn.cursor()

    # here we create a table called stock
    sql_command = "CREATE TABLE clorox (Date text, Open float, High float, Low float, Close float, AdjClose float, Volume int)"
    cur.execute(sql_command)

    for i in range(len(data)):
        # insert one row of data into the table per loop
        Date = data["Date"][i]
        Date = "'" + Date + "'"
        Open = data["Open"][i]
        High = data["High"][i]
        Low = data["Low"][i]
        Close = data["Close"][i]
        AdjClose = data["Adj Close"][i]
        Volume = data["Volume"][i]

        sql_command = f"INSERT INTO clorox VALUES ({Date}, {Open}, {High}, {Low}, {Close}, {AdjClose}, {Volume})"
        cur.execute(sql_command)

    # save all the changes and close the database
    conn.commit()
    conn.close()


def display_data():
    conn = sqlite3.connect("clorox2020")
    cur = conn.cursor()
    sql_command = "SELECT * FROM clorox"
    db_columns = cur.execute(sql_command)
    db_entries = db_columns.fetchall()
    for entry in db_entries:
        print(entry)

    conn.close()


data_to_database()
display_data()

Output:

('2020-01-02', 153.380005, 153.410004, 151.550003, 152.610001, 149.262665, 1292500)
('2020-01-03', 152.570007, 153.880005, 151.240005, 152.910004, 149.556076, 809800)
('2020-01-06', 153.070007, 153.940002, 152.289993, 153.369995, 150.005981, 869300)
('2020-01-07', 153.190002, 153.289993, 151.509995, 151.520004, 148.196564, 1198200)
('2020-01-08', 151.779999, 153.339996, 150.949997, 151.970001, 148.636688, 901200)
('2020-01-09', 152.229996, 153.009995, 151.720001, 152.800003, 149.448486, 504900)
......

Notice that in order to interact with a database, we need to make a connection to that database using conncetion() method. And we need to use cursor() method to create a cursor object to execute some SQL commands.

Common SQL commands

Most of the actions we need to perform on a database are done with SQL statements. SQL statements consists of keywords that are easy to understand. SQL commands are all capitalized (although not necessary) to distinguish them from the Python commands or variable names. The most common operations on a database are CRUD: create, read, update, and delete.

1. Create

Syntax: CREATE TABLE <table name> (<cloumn1> <datatype>, <cloumn2> <datatype>, …)

In this command, neither the table name nor column name needs to covered by quotation marks. For the datatype, in SQLite text is for string, float is for floating number, int is for integer number, and etc.

When creating a table, if the table we are creating exists, the creation will raise an error. In order to avoid this error we can add [IF NOT EXISTS] before the table name.

Special note: we can actually create multiple tables in a single database, but the table names should be different.

# suppose we would like to create another table under the database clorox2020

def another_table():
    conn = sqlite3.connect("clorox2020")
    cur = conn.cursor()
    sql_comm = "CREATE TABLE IF NOT EXISTS buyers(buyer_id text, amount int)"
    cur.execute(sql_comm)
    cur.execute("INSERT INTO buyers VALUES ('75c', 13)")
    cur.execute("INSERT INTO buyers VALUES ('16d', -4)")
    cur.execute("INSERT INTO buyers VALUES ('38a', 10)")
    cur.execute("INSERT INTO buyers VALUES ('59c', -9)")
    cur.execute("INSERT INTO buyers VALUES ('66b', 37)")
    cur.execute("INSERT INTO buyers VALUES ('31a', -43)")
    conn.commit()

    cur.execute("SELECT * FROM buyers")
    results = cur.fetchall()
    for result in results:
        print(result)

    conn.close()

Output:

('75c', 13)
('16d', -4)
('38a', 10)
('59c', -9)
('66b', 37)
('31a', -43)

2. Read

Syntax: SELECT <*/column> FROM <table> [WHERE <condition>]

Using this command, we can “read” some data from a table. In column, we can use “*” to select ALL cloumns or type the specific column name to query the data from that columns. We can also select some statistics data like SUM(), AVG(), and etc. In addition, we can use some conditions to somehow filter the data, such as WHERE [column >/< some value], or WHERE [column = some value].

# get the data form the database
def get_some_data():
    conn = sqlite3.connect("clorox2020")
    cur = conn.cursor()
    volume_max = cur.execute("SELECT MAX(Volume) FROM clorox").fetchone()[0]
    volume_min = cur.execute("SELECT MIN(Volume) FROM clorox").fetchone()[0]
    volume_avg = cur.execute("SELECT AVG(Volume) FROM clorox").fetchone()[0]

    print(f"The maximum volume is:{volume_max}")
    print(f"The minimum volume is:{volume_min}")
    print(f"The average volume is:{volume_avg}")

    conn.close()

Output:

The maximum volume is:7478700
The minimum volume is:489500
The average volume is:1776609.865470852

3. Update

Syntax: UPDATE <table> SET <cloumn> = <some value> [WHERE <condition>]

Using this command, we can modify the value of some columns in our table to the value that we want. It’s important to notice that if we don’t use “WHERE” to specify the rows, the whole column will be changed to that value. On the other hand, we can use the “WHERE” command to modify more than one line.

# update the clorox table and double the colume
def update_volume():
    conn = sqlite3.connect("clorox2020")
    cur = conn.cursor()
    
    cur.execute("UPDATE clorox SET Volume = Volume * 2")
    conn.commit()
    conn.close()

4. Delete

Syntax: DELETE FROM <table> [WHERE <condition>]

Using this command, we can “delete” some data from our table. Similar to “UPDATE” command, when no condition is assigned, the whole table will be deleted. Therefore, it’s always a good idea to specify the row that we want to delete using “WHERE” command. And the we can also delete more than one line using the “WHERE” command.

# delete the rows which contains open price >= 200
def delete_open():
    conn = sqlite3.connect("clorox2020")
    cur = conn.cursor()
    cur.execute("DELETE FROM clorox WHERE Open > 200")
    conn.commit()
    conn.close()

Execute the commands

After typing the commands that we would like to use, the next step is executing them. To execute the commands, we need to use cur.execute(sql_command). For the “SELETE” command, we need to use fetchall() to get all the row or fetchone() to get one row. The fetchall() function will return a list object, we can use a for loop to iterate it. The fetchone() function will return a tuple object of the data. For the “CREATE”, “UPDATE”, and “DELETE” commands, we need to use conn.commit() to commit the changes that we make. Otherwise, the changes WILL NOT be saved to the table. In the end of all the command, if we no longer want to interact with the database, we should use conn.close() to close the connection to the database to avoid memory leak.

Leave a Reply

Your email address will not be published. Required fields are marked *