More Samples with Dictionaries and Sets

In my previous post, I covered some introductory information on dictionaries and sets. Here I’ll work through more examples with them.

Let’s imagine we have a table of information with states and counties, like we had before. This time, though, the table holds one more variable: the AQI (air quality index) readings recorded in each county. A sample of this table looks like the one below.

state_name   county_name        aqi
Arizona      Maricopa           9
California   Alameda            11
California   Sacramento         35
Kentucky     Jefferson          6
Louisiana    East Baton Rouge   5

We’ll follow four main steps:

  • First, we’ll create a dictionary to store the information
  • Then we’ll use that dictionary to retrieve information
  • Next, we’ll write a function to quickly look up how many times a county is represented in a state’s readings
  • Finally, we’ll use sets to check how many counties share names

Create a dictionary to store information 

Dictionaries are useful when we need a data structure to store information that can be referenced or looked up.

Create a list of tuples

We’ll fetch the state, county, and AQI columns as three separate lists, zip them together into a list of (state, county, aqi) tuples, and assign the result to a variable called epa_tuples.

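# This cell imports our data
import ada_c2_labs as lab

# Fetch each column of the EPA data as a separate list
state_list = lab.fetch_epa('state')
county_list = lab.fetch_epa('county')
aqi_list = lab.fetch_epa('aqi')

# Combine the three lists row by row into (state, county, aqi) tuples
epa_tuples = list(zip(state_list, county_list, aqi_list))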
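
Since ada_c2_labs is specific to the lab environment, here’s a minimal standalone sketch of the same zip step. It assumes the three lists hold only the five sample rows from the table above, and it shows the shape of each tuple that ends up in epa_tuples.

```python
# Stand-in lists built from the sample table; the real lists come from
# lab.fetch_epa() in the lab environment.
state_list = ['Arizona', 'California', 'California', 'Kentucky', 'Louisiana']
county_list = ['Maricopa', 'Alameda', 'Sacramento', 'Jefferson', 'East Baton Rouge']
aqi_list = [9, 11, 35, 6, 5]

# zip() pairs up the i-th element of each list; list() turns the
# resulting iterator into a list we can loop over more than once.
epa_tuples = list(zip(state_list, county_list, aqi_list))

print(epa_tuples[0])    # ('Arizona', 'Maricopa', 9)
print(len(epa_tuples))  # 5
```

One thing to keep in mind: zip() stops at the shortest of its inputs, so if the lists ever had different lengths, the extra values would be dropped silently. In Python 3.10 and later, zip(..., strict=True) raises a ValueError instead.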

Create a dictionary

Now let’s create a dictionary that allows us to look up a state and get all the county-AQI pairs associated with that state.

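# Map each state to a list of its (county, aqi) readings
aqi_dict = {}
for state, county, aqi in epa_tuples:
    if state in aqi_dict:
        # State already seen: add this reading to its list
        aqi_dict[state].append((county, aqi))
    else:
        # First reading for this state: start a new list
        aqi_dict[state] = [(county, aqi)]

# Look up every reading recorded in Vermont
aqi_dict['Vermont']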
[('Chittenden', 18.0),
('Chittenden', 20.0),
('Chittenden', 3.0),
('Chittenden', 49.0),
('Rutland', 15.0),
('Chittenden', 3.0),
('Chittenden', 6.0),
('Rutland', 3.0),
('Rutland', 6.0),
('Chittenden', 5.0),
('Chittenden', 2.0)]
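
The if/else check in the loop above is a common pattern, and the standard library’s collections.defaultdict can handle it for us. Here’s a sketch of the same grouping with defaultdict, run on the five sample rows from the table rather than on the full dataset.

```python
from collections import defaultdict

# Sample rows from the table above, as (state, county, aqi) tuples.
epa_tuples = [
    ('Arizona', 'Maricopa', 9),
    ('California', 'Alameda', 11),
    ('California', 'Sacramento', 35),
    ('Kentucky', 'Jefferson', 6),
    ('Louisiana', 'East Baton Rouge', 5),
]

# defaultdict(list) creates an empty list the first time a state is
# seen, so no explicit "is this state already a key?" check is needed.
aqi_dict = defaultdict(list)
for state, county, aqi in epa_tuples:
    aqi_dict[state].append((county, aqi))

# Convert to a plain dict so that looking up a missing state raises
# KeyError instead of silently adding an empty list.
aqi_dict = dict(aqi_dict)

print(aqi_dict['California'])  # [('Alameda', 11), ('Sacramento', 35)]
```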

Use the dictionary to retrieve information 

Now that we have a dictionary of county-AQI readings by state, we can use it to retrieve information and draw further insight from our data.

Calculate how many readings were recorded in the state of Arizona

len(aqi_dict['Arizona'])
72
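
The same len() lookup can be done for every state at once with a dictionary comprehension. A small sketch, using a stand-in aqi_dict built from the sample table (so these counts are not the real ones):

```python
# Stand-in for aqi_dict, built from the five sample rows.
aqi_dict = {
    'Arizona': [('Maricopa', 9)],
    'California': [('Alameda', 11), ('Sacramento', 35)],
    'Kentucky': [('Jefferson', 6)],
    'Louisiana': [('East Baton Rouge', 5)],
}

# Map every state to its number of readings.
readings_per_state = {state: len(readings) for state, readings in aqi_dict.items()}

print(readings_per_state)
# {'Arizona': 1, 'California': 2, 'Kentucky': 1, 'Louisiana': 1}
```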

Calculate the mean AQI for the state of California

ca_aqi_list = [aqi for county, aqi in aqi_dict['California']]
ca_aqi_mean = sum(ca_aqi_list) / len(ca_aqi_list)
ca_aqi_mean
9.412280701754385
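
The mean calculation can also be wrapped in a reusable helper so it works for any state. In this sketch, state_mean_aqi is a hypothetical name, aqi_dict is a small stand-in built from the sample table, and statistics.mean from the standard library replaces the manual sum/len division.

```python
from statistics import mean

# Stand-in for aqi_dict, built from the five sample rows.
aqi_dict = {
    'Arizona': [('Maricopa', 9.0)],
    'California': [('Alameda', 11.0), ('Sacramento', 35.0)],
    'Kentucky': [('Jefferson', 6.0)],
    'Louisiana': [('East Baton Rouge', 5.0)],
}

def state_mean_aqi(state):
    """Return the mean AQI of all readings recorded for `state` (hypothetical helper)."""
    # `_` stands for the county name, which this calculation doesn't need.
    return mean(aqi for _, aqi in aqi_dict[state])

print(state_mean_aqi('California'))  # 23.0

# Mean AQI for every state in one pass.
mean_by_state = {state: state_mean_aqi(state) for state in aqi_dict}
```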

Define a county_counter() function 

Imagine that we want to quickly look up how many times a county is represented in a given state’s readings. We already have a list containing just county names, but we can’t rely on counts from that list alone, because counties in different states can share the same name (there is a Washington County in many states, for example). Therefore, we’ll use the state-specific information in aqi_dict to calculate these counts.

Write the function

We’d like the function to produce output like this:

[IN]  county_counter('Florida')
[OUT] {'Duval': 13,
'Hillsborough': 9,
'Broward': 18,
'Miami-Dade': 15,
'Orange': 6,
'Palm Beach': 5,
'Pinellas': 6,
'Sarasota': 9}
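
Here is a function that does this:

def county_counter(state):
    """Return a dict mapping each county in `state` to its number of readings."""
    county_dict = {}
    for county, aqi in aqi_dict[state]:
        if county in county_dict:
            # County seen before: increment its count
            county_dict[county] += 1
        else:
            # First reading for this county
            county_dict[county] = 1
    return county_dict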
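
Counting occurrences is common enough that the standard library provides collections.Counter for it. Here’s an alternative sketch of county_counter built on Counter, tested on the Vermont readings shown earlier; it returns the same kind of dictionary as the version above.

```python
from collections import Counter

# The Vermont readings printed earlier, used as a small stand-in for aqi_dict.
aqi_dict = {
    'Vermont': [
        ('Chittenden', 18.0), ('Chittenden', 20.0), ('Chittenden', 3.0),
        ('Chittenden', 49.0), ('Rutland', 15.0), ('Chittenden', 3.0),
        ('Chittenden', 6.0), ('Rutland', 3.0), ('Rutland', 6.0),
        ('Chittenden', 5.0), ('Chittenden', 2.0),
    ],
}

def county_counter(state):
    """Count how many readings each county has within one state."""
    # Counter tallies each county name; dict() turns it into a plain
    # dictionary, matching what the hand-written version returns.
    return dict(Counter(county for county, _ in aqi_dict[state]))

print(county_counter('Vermont'))  # {'Chittenden': 8, 'Rutland': 3}
```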

Use the function to check Washington County, PA

pa_dict = county_counter('Pennsylvania')
pa_dict['Washington']
7
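
Square-bracket lookup raises a KeyError if a county isn’t in the dictionary. A short sketch of the alternatives, using a stand-in pa_dict that holds only the Washington count above, plus a made-up key, 'Not A County', that isn’t present:

```python
# Stand-in for pa_dict holding only the count shown above.
pa_dict = {'Washington': 7}

# Square brackets raise KeyError for a key that isn't present ...
try:
    pa_dict['Not A County']
except KeyError:
    print('KeyError raised')

# ... while .get() returns a default value instead.
print(pa_dict.get('Washington', 0))    # 7
print(pa_dict.get('Not A County', 0))  # 0

# The `in` operator checks for a key without retrieving its value.
print('Washington' in pa_dict)  # True
```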

Use the function to check the different counties in Indiana

county_counter('Indiana').keys()
dict_keys(['Marion', 'St. Joseph', 'Vanderburgh', 'Allen', 'Vigo', 'Hendricks', 'Lake'])

Use sets to determine how many counties share names

Let’s create a list of every county from every state, then use it to determine how many counties share a name with a county in another state.

Construct a list of every county from every state

# Collect each state's distinct county names into one list
all_counties = []
for state in aqi_dict:
    counties = list(county_counter(state).keys())
    all_counties += counties

len(all_counties)
277
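
Because all_counties gets one entry per county per state, its length should match the number of distinct (state, county) pairs in the readings. A sketch of that check with a set of tuples, assuming a small input made of the sample table rows plus the Vermont readings shown earlier:

```python
# Readings in (state, county, aqi) form: the sample table rows ...
epa_tuples = [
    ('Arizona', 'Maricopa', 9.0),
    ('California', 'Alameda', 11.0),
    ('California', 'Sacramento', 35.0),
    ('Kentucky', 'Jefferson', 6.0),
    ('Louisiana', 'East Baton Rouge', 5.0),
]
# ... plus the Vermont readings printed earlier.
vermont_readings = [
    ('Chittenden', 18.0), ('Chittenden', 20.0), ('Chittenden', 3.0),
    ('Chittenden', 49.0), ('Rutland', 15.0), ('Chittenden', 3.0),
    ('Chittenden', 6.0), ('Rutland', 3.0), ('Rutland', 6.0),
    ('Chittenden', 5.0), ('Chittenden', 2.0),
]
epa_tuples += [('Vermont', county, aqi) for county, aqi in vermont_readings]

# Each (state, county) pair appears once in the set, however many
# readings it has. Tuples can be set elements because they are hashable.
state_county_pairs = {(state, county) for state, county, _ in epa_tuples}

# Dropping the state from each pair gives the same kind of list as
# all_counties: one entry per county per state.
all_counties = [county for _, county in state_county_pairs]

print(len(epa_tuples), len(all_counties))  # 16 7
```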

Calculate how many counties share names

# Add up the counties whose name appears in more than one state
shared_count = 0
for county in set(all_counties):  # visit each distinct name once
    count = all_counties.count(county)
    if count > 1:
        shared_count += count

shared_count
41

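Note that this number counts counties, not names: a name shared by three states adds 3 to the total. The code above doesn’t tell us how many distinct county names are duplicated.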
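
To get the number of distinct duplicated names as well, a collections.Counter can tally every name once, and both numbers can then be read off it. The all_counties list below is made up for illustration; its names and counts are not taken from the dataset.

```python
from collections import Counter

# Hypothetical all_counties list (one entry per county per state).
all_counties = ['Washington', 'Orange', 'Jefferson', 'Maricopa',
                'Washington', 'Orange', 'Washington']

# How many states each county name appears in.
name_counts = Counter(all_counties)

# Counties whose name also appears in another state (what the loop above computes).
shared_count = sum(n for n in name_counts.values() if n > 1)

# Distinct names that appear in more than one state.
duplicated_names = {name for name, n in name_counts.items() if n > 1}

print(shared_count)              # 5
print(len(duplicated_names))     # 2
print(sorted(duplicated_names))  # ['Orange', 'Washington']
```

Counter also avoids calling all_counties.count() once per distinct name, which rescans the whole list every time.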


In summary:

  • Python has many built-in functions that are useful for building dictionaries and sets.
  • Dictionaries in Python are useful for representing data in terms of keys mapped to values.
  • A set will not allow duplicate values.
    • A set is unordered, and its elements must be immutable (hashable), although the set itself can be changed by adding or removing elements.
  • Functions and loop iteration can be used to perform calculations on dictionary values.
    • Once the values have been calculated, they can be saved to other data types, such as tuples, lists, and sets.
  • There are many ways to access data stored inside a dictionary.
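
The set properties in the summary can be checked directly. A short sketch: the Vermont county names come from the readings shown earlier, while the two state county lists are hypothetical.

```python
# Duplicates collapse: the Vermont readings name only two counties.
vermont_counties = set(['Chittenden', 'Chittenden', 'Rutland', 'Chittenden'])
print(len(vermont_counties))  # 2

# Hypothetical county names for two states, for illustration only.
state_a = {'Washington', 'Orange', 'Jefferson'}
state_b = {'Washington', 'Orange', 'Sarasota'}

# Set algebra: names in both states, and names only in state_a.
print(sorted(state_a & state_b))  # ['Orange', 'Washington']
print(sorted(state_a - state_b))  # ['Jefferson']

# The set itself can change ...
state_a.add('Lake')

# ... but its elements must be hashable, so a list can't be added.
try:
    state_a.add(['not', 'hashable'])
except TypeError:
    print('lists cannot be set elements')
```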