Python Data Structures: Dictionaries and Sets

In my previous post, I collected my notes on lists and tuples. Here I’ll continue with two more Python data structures: dictionaries and sets.

Dictionary

A data structure that consists of a collection of key-value pairs.

The first way to create a dictionary is by using braces.

zoo = {
    'pen_1': 'penguins',
    'pen_2': 'zebras',
    'pen_3': 'lions',
    }
zoo['pen_2']
'zebras'
# We cannot look up an entry by its value using bracket indexing,
# because whatever goes inside the brackets is treated as a key.

zoo['zebras']
KeyError: 'zebras'

The second way to create a dictionary is by using the dict() function.

zoo = dict(
    pen_1='monkeys',
    pen_2='zebras',
    pen_3='lions',
    )
zoo['pen_2']
'zebras'

When the keys are strings that are also valid Python identifiers (no spaces, not starting with a digit), we can pass them as keyword arguments. That’s why we didn’t use quotation marks here. Also, instead of using a colon between the keys and their values, we use an equal sign.

The dict() function is also a little more flexible in how it can be used. For example, we can create this same dictionary once again by passing a list of lists as an argument, or even a list of tuples, a tuple of tuples, or a tuple of lists, as long as each inner item is a key-value pair.

# Another way to create a dictionary using the dict() function.
zoo = dict(
    [
     ['pen_1', 'monkeys'],
     ['pen_2', 'zebras'],
     ['pen_3', 'lions'],
    ]
)

# Assign a new key: value pair to an existing dictionary.
zoo['pen_4'] = 'crocodiles'
zoo
{'pen_1': 'monkeys',
 'pen_2': 'zebras',
 'pen_3': 'lions',
 'pen_4': 'crocodiles'}
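
The other input shapes that dict() accepts, mentioned above, can be sketched as follows. The variable names are only illustrative; each call should build the same dictionary.

```python
# Each of these passes dict() an iterable of key-value pairs.
from_list_of_lists = dict([['pen_1', 'monkeys'], ['pen_2', 'zebras'], ['pen_3', 'lions']])
from_list_of_tuples = dict([('pen_1', 'monkeys'), ('pen_2', 'zebras'), ('pen_3', 'lions')])
from_tuple_of_tuples = dict((('pen_1', 'monkeys'), ('pen_2', 'zebras'), ('pen_3', 'lions')))
from_tuple_of_lists = dict((['pen_1', 'monkeys'], ['pen_2', 'zebras'], ['pen_3', 'lions']))

# All four results compare equal.
print(from_list_of_lists == from_list_of_tuples == from_tuple_of_tuples == from_tuple_of_lists)  # True
```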

Two important notes about dictionaries:

1- A dictionary’s keys must be hashable, which in practice usually means immutable. Valid keys include, but are not limited to, integers, floats, strings, and tuples (as long as every element of the tuple is itself hashable). Lists, sets, and other dictionaries cannot be keys, since they are mutable.

2- Dictionaries cannot be indexed by position. Since Python 3.7 a dictionary preserves the order in which its keys were inserted, but entries are still looked up by key, not by a positional index.

# Dictionaries do not support positional indexing; 2 is treated as a key.
zoo[2]
KeyError: 2
# Use the 'in' keyword to get a Boolean that tells us
# whether a given key exists in a dictionary.
print('pen_1' in zoo)
print('pen_7' in zoo)
True
False
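
As a small sketch of the first note, the snippet below tries a few kinds of keys. The dictionary and variable names are made up for illustration.

```python
# Integers, floats, strings, and tuples of hashable items all work as keys.
valid = {
    1: 'int key',
    2.5: 'float key',
    'pen_1': 'string key',
    (3, 4): 'tuple key',
}
print(valid[(3, 4)])  # tuple key

# A list is mutable, so using it as a key raises a TypeError.
try:
    invalid = {['pen_1', 'pen_2']: 'monkeys'}
except TypeError as err:
    list_key_error = str(err)
    print('TypeError:', list_key_error)

# A tuple that contains a list cannot be hashed either.
try:
    hash((1, [2, 3]))
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False
print(tuple_with_list_hashable)  # False
```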

Dictionary Methods

Let’s create a list of tuples, representing a basketball team.

team = [
    ('Marta', 20, 'center'),
    ('Ana', 22, 'point guard'),
    ('Gabi', 22, 'shooting guard'),
    ('Luz', 21, 'power forward'),
    ('Lorena', 19, 'small forward'),
    ('Sandra', 19, 'center'),
    ('Mari', 18, 'point guard'),
    ('Esme', 18, 'shooting guard'),
    ('Lin', 18, 'power forward'),
    ('Sol', 19, 'small forward'),
    ]

Suppose we need to find out which players play each position. A dictionary is a good fit for this.

# Instantiate an empty dictionary.
new_team = {}

# Loop over the tuples in the list of players and unpack their values.
for name, age, position in team:
    if position in new_team:                    # If position already a key in new_team,
        new_team[position].append((name, age))  # append (name, age) tup to list at that value.
    else:
        new_team[position] = [(name, age)]      # If position not a key in new_team,
                                                # create a new key whose value is a list
                                                # containing (name, age) tup.
new_team
{'center': [('Marta', 20), ('Sandra', 19)],
 'point guard': [('Ana', 22), ('Mari', 18)],
 'shooting guard': [('Gabi', 22), ('Esme', 18)],
 'power forward': [('Luz', 21), ('Lin', 18)],
 'small forward': [('Lorena', 19), ('Sol', 19)]}
# Examine the value at the 'point guard' key.
new_team['point guard']
[('Ana', 22), ('Mari', 18)]
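
The if/else grouping above is perfectly fine. As an alternative sketch (not the approach used in this post), the same grouping can be written with the setdefault() method or with collections.defaultdict. The shortened team_sample list is only there so the snippet runs on its own.

```python
from collections import defaultdict

# A shortened copy of the team list, so this snippet is self-contained.
team_sample = [
    ('Marta', 20, 'center'),
    ('Ana', 22, 'point guard'),
    ('Sandra', 19, 'center'),
]

# Option 1: setdefault() returns the list stored at the key,
# inserting an empty list first if the key is missing.
by_position = {}
for name, age, position in team_sample:
    by_position.setdefault(position, []).append((name, age))

# Option 2: defaultdict(list) creates the empty list automatically
# the first time a missing key is accessed.
by_position_dd = defaultdict(list)
for name, age, position in team_sample:
    by_position_dd[position].append((name, age))

print(by_position)
print(dict(by_position_dd) == by_position)  # True
```

Both options avoid the explicit membership check; which one reads better is mostly a matter of taste.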

Methods to retrieve keys and values

# We can access a dictionary's keys by looping over them.
for x in new_team:
    print(x)
center
point guard
shooting guard
power forward
small forward

Or we can use methods to retrieve keys and/or values.

# The keys() method returns the keys of a dictionary.
new_team.keys()
dict_keys(['center', 'point guard', 'shooting guard', 'power forward', 'small forward'])
# The values() method returns all the values in a dictionary.
new_team.values()
dict_values([[('Marta', 20), ('Sandra', 19)], [('Ana', 22), ('Mari', 18)], [('Gabi', 22), ('Esme', 18)], [('Luz', 21), ('Lin', 18)], [('Lorena', 19), ('Sol', 19)]])
# The items() method returns both the keys and the values.
new_team.items()
dict_items([('center', [('Marta', 20), ('Sandra', 19)]), ('point guard', [('Ana', 22), ('Mari', 18)]), ('shooting guard', [('Gabi', 22), ('Esme', 18)]), ('power forward', [('Luz', 21), ('Lin', 18)]), ('small forward', [('Lorena', 19), ('Sol', 19)])])

That last output is hard to read, though. We can use a for loop to tidy it up.

for a, b in new_team.items():
    print(a, b)
center [('Marta', 20), ('Sandra', 19)]
point guard [('Ana', 22), ('Mari', 18)]
shooting guard [('Gabi', 22), ('Esme', 18)]
power forward [('Luz', 21), ('Lin', 18)]
small forward [('Lorena', 19), ('Sol', 19)]

The output is prettier now.


Set

A data structure in Python that contains only unordered, unique elements.

Each set element is unique and must be hashable, which generally means immutable. However, the set itself is mutable. Sets are frequently used when we store a lot of elements and want to be certain that each one is present only once. Because sets are mutable, they cannot be used as keys in a dictionary.

The first way to create a set is by using the set() function.

# The set() function converts a list to a set.
x = set(['foo', 'bar', 'baz', 'foo'])
print(x)
{'baz', 'bar', 'foo'}
# The set() function converts a tuple to a set.
x = set(('foo','bar','baz', 'foo'))
print(x)
{'baz', 'bar', 'foo'}
# The set() function converts a string to a set.
x = set('foo')
print(x)
{'f', 'o'}

Notice that it doesn’t return the whole string, just a single occurrence of each letter in it, 'f' and 'o', in no particular order. This is because the set() function accepts a single argument, that argument must be iterable, and iterating over a string yields its individual characters.
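
A short sketch of what "must be iterable" means in practice. The variable names are illustrative only.

```python
# A string is iterated character by character.
letters = set('foo')
print(letters == {'f', 'o'})  # True

# To keep the string whole, wrap it in a list (or another iterable) first.
whole = set(['foo'])
print(whole)  # {'foo'}

# An integer is not iterable, so set() raises a TypeError.
try:
    set(5)
    int_accepted = True
except TypeError as err:
    int_accepted = False
    print('TypeError:', err)
```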

The second way to create a set is by using braces. However, we have to put something inside the braces. Otherwise, Python will interpret our empty braces as a dictionary.

# We can use braces to instantiate a set
x = {'foo'}
print(type(x))

# But empty braces are reserved for dictionaries.
y = {}
print(type(y))
<class 'set'>
<class 'dict'>
# Instantiating a set with braces treats the contents as literals.
x = {'foo'}
print(x)
{'foo'}

To define an empty or a new set, it is best to use the set() function.

Because a set is unordered, it cannot be indexed or sliced.
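
The two points above can be sketched like this: set() gives an empty set, and indexing a set fails. Converting to a sorted list is one possible workaround when a position is really needed; it is an illustration, not the only option.

```python
# set() with no arguments creates an empty set.
empty = set()
print(type(empty), len(empty))  # <class 'set'> 0

x = {'foo', 'bar', 'baz'}

# Indexing a set raises a TypeError.
try:
    x[0]
    indexable = True
except TypeError as err:
    indexable = False
    print('TypeError:', err)

# Membership tests still work.
print('foo' in x)  # True

# If a position is needed, convert to a sorted list first.
ordered = sorted(x)
print(ordered[0])  # bar
```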

Set Methods

Python provides built-in methods and operators for each of the following set operations.

# The intersection() method or the ampersand (&) operator 
# returns common elements between two sets.
set1 = {1, 2, 3, 4, 5, 6}
set2 = {4, 5, 6, 7, 8, 9}
print(set1.intersection(set2))
print(set1 & set2)
{4, 5, 6}
{4, 5, 6}
# The union() method or the pipe (|) operator
# returns all the elements from two sets,
# each represented once.
x1 = {'foo', 'bar', 'baz'}
x2 = {'baz', 'qux', 'quux'}
print(x1.union(x2))
print(x1 | x2)
{'baz', 'bar', 'foo', 'quux', 'qux'}
{'baz', 'bar', 'foo', 'quux', 'qux'}

Union is a commutative operation in mathematics, so the result will be the same no matter what order you put your variables in. The difference operation (below) on sets, however, is not commutative. Just like in math, if we subtract four from seven, we get a different result than if we subtract seven from four.

# The difference() method or the minus (-) operator 
# returns the elements in set1 that aren't in set2
set1 = {1, 2, 3, 4, 5, 6}
set2 = {4, 5, 6, 7, 8, 9}
print(set1.difference(set2))
print(set1 - set2)
{1, 2, 3}
{1, 2, 3}
# ... and the elements in set2 that aren't in set1.
print(set2.difference(set1))
print(set2 - set1)
{8, 9, 7}
{8, 9, 7}

To get around this and observe the difference between two sets in both directions at once, we can use the symmetric_difference() method.

# The symmetric_difference() method or the symmetric difference operator, 
# expressed by a caret (^) returns all the values from each set 
# that are not in both sets.
set1 = {1, 2, 3, 4, 5, 6}
set2 = {4, 5, 6, 7, 8, 9}
set2.symmetric_difference(set1)
set1 ^ set2
{1, 2, 3, 7, 8, 9}
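
As a quick check of which operations depend on the order of the operands, here is a small sketch that reuses the same two sets.

```python
set1 = {1, 2, 3, 4, 5, 6}
set2 = {4, 5, 6, 7, 8, 9}

# Union, intersection, and symmetric difference give the same
# result in either order.
print(set1 | set2 == set2 | set1)  # True
print(set1 & set2 == set2 & set1)  # True
print(set1 ^ set2 == set2 ^ set1)  # True

# Difference does not.
print(set1 - set2 == set2 - set1)  # False

# The symmetric difference is the union of the two one-way differences.
print(set1 ^ set2 == (set1 - set2) | (set2 - set1))  # True
```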

Notes on set:

  • Set is a mutable data type.
  • Because it’s mutable, the set class comes with additional methods to add and remove elements.
  • The set() function can be applied to any iterable object and will remove duplicate elements from it.
  • It is unordered and non-indexable. 
  • Elements in a set must be hashable; generally, this means they must be immutable. 
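
A minimal sketch of the methods that add and remove elements, using a made-up set of animals.

```python
animals = {'zebras', 'lions'}

# add() inserts one element; adding an existing element changes nothing.
animals.add('monkeys')
animals.add('lions')
print(len(animals))  # 3

# remove() deletes an element and raises KeyError if it is missing.
animals.remove('zebras')
try:
    animals.remove('penguins')
    remove_failed = False
except KeyError:
    remove_failed = True
print(remove_failed)  # True

# discard() deletes an element but stays silent if it is missing.
animals.discard('penguins')

print(sorted(animals))  # ['lions', 'monkeys']
```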

Notes on frozenset:

  • Frozenset is an immutable data type.
  • The frozenset() function can be applied to any iterable object and will remove duplicate elements from it.
  • Because they’re immutable, frozensets can be used as dictionary keys and as elements in other sets.
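
The frozenset notes can be sketched as follows. The names pair, games_played, and nested are hypothetical and exist only for this example.

```python
# frozenset() removes duplicates, just like set().
pair = frozenset(['Ana', 'Mari', 'Ana'])
print(len(pair))  # 2

# A frozenset can be a dictionary key. Element order does not matter
# when looking it up.
games_played = {frozenset(['Ana', 'Mari']): 3}
print(games_played[frozenset(['Mari', 'Ana'])])  # 3

# A frozenset can also be an element of another set.
nested = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(nested))  # 2

# There are no methods that modify a frozenset in place.
try:
    pair.add('Sol')
    frozen_modified = True
except AttributeError:
    frozen_modified = False
print(frozen_modified)  # False
```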