falconcodingclub

View project on GitHub

Intro to Data Analytis

Comma Separated Values

Middle school students have all used spreadsheets in some form, but usually we are dealing with other formats when processing data. We use comma separated value files so it is important to give an introduction.

CSV files may or may not have a header, then each line in the text file is a row of data - with each column separated by a comma.

Bob,22
Suzy,18
John,32

Parsing CSV Files

Python has a few different ways of parsing comma separated files. We use the ‘csv’ modeul which has a reader that returns each row as an array.

import csv
with open('input.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print('{name} is {age} years old.'.format(name=row[0], age=row[1]))

Public Data

Have the group find some publicly available data. A good place to start is with social security data containing the count of babies names by birth year (only names with over 5 babies per year are in the data set to protect privacy for those with very unusual names). Using concepts we have already tought (loops, conditionals, variables, etc) we were able to read the data. We also learned how to use Python’s csv module to make the code even easier to read. In a few lines of code the students should be able to do some basic things like print out the number of babies named ‘Eleanor’ in each year since 1880!

Advanced Challenge

Next we all started exploring independently – some students worked on figuring out the most and least popular names of all time by summing the total number of babies by name. This will require some thinking, but should be doable with the tools they know.