Functions and Methods with Arrays

In my previous posts I introduced some of the Python libraries and then entered the world of NumPy. Here I’ll take notes on some NumPy functions and methods and provide more code samples for a few data-related tasks.

Array functions

Here are some functions that can be used with arrays.

np.zeros()

This creates an array of a designated shape that is pre-filled with zeros.

import numpy as np

print(np.zeros((3, 2)))
[[0. 0.]
 [0. 0.]
 [0. 0.]]

np.ones()

This creates an array of a designated shape that is pre-filled with ones.

print(np.ones((2, 2)))
[[1. 1.]
 [1. 1.]]

np.full()

This creates an array of a designated shape that is pre-filled with a specified value.

print(np.full((5, 3), 8))
[[8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]]

These functions are useful for various situations:

  • To initialize an array of a specific size and shape, then fill it with values derived from a calculation.
  • To allocate memory for later use. (I’ll give a bit more information at the end)
  • To perform matrix operations.
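
As a minimal sketch of the first bullet, the snippet below pre-allocates an array with np.zeros() and then fills it with calculated values. The name squares and the calculation itself are made up for illustration.

```python
import numpy as np

# Hypothetical example: reserve five slots up front, all set to 0.0.
squares = np.zeros(5)

# Replace each placeholder zero with a value derived from a calculation.
for i in range(5):
    squares[i] = i ** 2

print(squares)
```

Note that np.zeros() creates floats by default, so the integers assigned here are stored as floats.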

Array methods

Here are some of the most commonly used methods with NumPy.

ndarray.flatten() 

This returns a copy of the array collapsed into one dimension.

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
print(array_2d.flatten())
[[1 2 3]
 [4 5 6]]

[1 2 3 4 5 6]
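
Because flatten() returns a copy, changing the flattened array should leave the original untouched. A small sketch to check this (the name flat is only for illustration):

```python
import numpy as np

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
flat = array_2d.flatten()

# Change the first element of the flattened copy only.
flat[0] = 99

print(flat)      # the copy holds the new value
print(array_2d)  # the original still starts with 1
```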

ndarray.reshape() 

As we did before, this gives a new shape to an array without changing its data.

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
print(array_2d.reshape(3, 2))
[[1 2 3]
 [4 5 6]]

[[1 2]
 [3 4]
 [5 6]]

Passing -1 for one dimension of the new shape is a convenience: it tells NumPy to infer that dimension automatically from the size of the array and the other dimensions given.

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
print(array_2d.reshape(3, -1))
[[1 2 3]
 [4 5 6]]

[[1 2]
 [3 4]
 [5 6]]
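
To see how the -1 placeholder is resolved, here is a short sketch that prints the inferred shapes and shows what happens when the element count cannot be divided evenly. The shapes tried are arbitrary choices for illustration.

```python
import numpy as np

array_2d = np.array([(1, 2, 3), (4, 5, 6)])  # 6 elements in total

# 6 elements / 3 rows -> NumPy infers 2 columns.
three_rows = array_2d.reshape(3, -1)
print(three_rows.shape)

# 6 elements / 1 row -> NumPy infers 6 columns.
one_row = array_2d.reshape(1, -1)
print(one_row.shape)

# 6 elements cannot be split into 4 equal rows, so NumPy raises ValueError.
try:
    array_2d.reshape(4, -1)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('ValueError:', err)
```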

ndarray.tolist() 

This converts an array to a list object. Multidimensional arrays are converted to nested lists.

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
array_2d.tolist()
[[1 2 3]
 [4 5 6]]

[[1, 2, 3], [4, 5, 6]]
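
tolist() converts the elements as well as the container, so the values come back as built-in Python types. A quick check, with as_list as an illustrative name:

```python
import numpy as np

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
as_list = array_2d.tolist()

print(type(as_list))         # a plain Python list
print(type(as_list[0][0]))   # elements become plain Python ints
print(type(array_2d[0, 0]))  # the array element is a NumPy integer type
```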

Mathematical functions 

NumPy arrays also have many methods that are mathematical functions.

a = np.array([(1, 2, 3), (4, 5, 6)])
print(a)
print()
print(a.max())
print(a.mean())
print(a.min())
print(a.std())
[[1 2 3]
 [4 5 6]]

6
3.5
1
1.70782512766
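
The calls above summarize the whole array. These methods also accept an axis argument to summarize along rows or columns instead. This goes slightly beyond the example above, so treat it as an optional sketch.

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])

# axis=0 collapses the rows, giving one result per column.
col_max = a.max(axis=0)
print(col_max)   # [4 5 6]

# axis=1 collapses the columns, giving one result per row.
row_mean = a.mean(axis=1)
print(row_mean)  # [2. 5.]
```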

Of course, there are many more mathematical functions.

# Create a new array
arr = np.array([1, 2, 3, 4, 11])

# np.log() returns the natural logarithm of each element in an array.
print(np.log(arr))

# np.floor() rounds a number down to the nearest integer (returned as a float).
print(np.floor(5.7))

# np.ceil() rounds a number up to the nearest integer (returned as a float).
print(np.ceil(5.3))
[0.         0.69314718 1.09861229 1.38629436 2.39789527]
5.0
6.0
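
np.floor() and np.ceil() are shown on single numbers above, but like np.log() they also work element-wise on a whole array. A small sketch with made-up values:

```python
import numpy as np

# Illustrative values, including a negative number.
values = np.array([5.7, 5.3, -1.5])

floored = np.floor(values)  # each element rounded down
ceiled = np.ceil(values)    # each element rounded up

print(floored)
print(ceiled)
```

Rounding down moves -1.5 to -2.0, while rounding up moves it to -1.0.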

Indexing and slicing 

We can access individual elements of a NumPy array using indexing and slicing. Indexing in NumPy is similar to indexing in Python lists, except multiple indices can be used to access elements in multidimensional arrays.

a = np.array([(1, 2, 3), (4, 5, 6)])
print(a)
print()

print(a[1])
print(a[0, 1])
print(a[1, 2])
[[1 2 3]
 [4 5 6]]

[4 5 6]
2
6

Slicing may also be used to access subarrays of a NumPy array.

a = np.array([(1, 2, 3), (4, 5, 6)])
print(a)
print()

print(a[:, 1:])
[[1 2 3]
 [4 5 6]]

[[2 3]
 [5 6]]
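
A few more slicing patterns on the same array may help. The sketch below combines an index with a slice, selects one column, and uses a step; the variable names are only for illustration.

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])

# Row 0, columns 0 up to (but not including) 2.
first_two = a[0, :2]
print(first_two)      # [1 2]

# Every row, column 0 only.
first_col = a[:, 0]
print(first_col)      # [1 4]

# Every row, every second column.
every_other = a[:, ::2]
print(every_other)
```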

Array operations

NumPy arrays support many operations, including mathematical functions and arithmetic. These include array addition and multiplication, which perform element-wise arithmetic on arrays.

a = np.array([(1, 2, 3), (4, 5, 6)])
b = np.array([[1, 2, 3], [1, 2, 3]])
print('a:')
print(a)
print()
print('b:')
print(b)
print()
print('a + b:')
print(a + b)
print()
print('a * b:')
print(a * b)
a:
[[1 2 3]
 [4 5 6]]

b:
[[1 2 3]
 [1 2 3]]

a + b:
[[2 4 6]
 [5 7 9]]

a * b:
[[ 1  4  9]
 [ 4 10 18]]
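
To make the element-wise behavior concrete, here is a short comparison with plain Python lists, where + joins the lists instead of adding their elements. The names and values are made up for the comparison.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [4, 5, 6]
arr_a = np.array(list_a)
arr_b = np.array(list_b)

# With lists, + concatenates.
joined = list_a + list_b
print(joined)    # [1, 2, 3, 4, 5, 6]

# With arrays, + adds matching elements.
summed = arr_a + arr_b
print(summed)    # [5 7 9]

# A single number is applied to every element.
doubled = arr_a * 2
print(doubled)   # [2 4 6]
```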

How NumPy arrays store data in memory

Earlier I mentioned allocating memory with arrays. Here is some more information about that.

NumPy arrays work by allocating a contiguous block of memory at the time of instantiation. Most other structures in Python don’t do this: a list, for example, holds references to objects that can be scattered across the system’s memory. This is a large part of what makes NumPy arrays so fast: all the data is stored together in one block at a particular address in the system’s memory.

Interestingly, this is also what prevents an array from being lengthened or shortened: The abutting memory is occupied by other information. There’s no room for more data at that memory address. However, existing elements of the array can be replaced with new elements. 

The only way to lengthen an array is to copy the existing array to a new memory address along with the new data. 
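
A rough sketch of this behavior, using np.append() as one way to "lengthen" an array. It returns a new array rather than growing the original; the names arr and longer are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append() copies the existing data plus the new values into a new array.
longer = np.append(arr, [4, 5])

print(arr)     # the original still has three elements
print(longer)  # the new array has five
print(np.shares_memory(arr, longer))  # False: the two arrays use separate memory

# Replacing an existing element in place is allowed.
arr[0] = 10
print(arr)
```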


More Samples with NumPy

Let’s bring back our aqi table from the previous post. As a reminder, an example is below.

state_name    county_name         aqi
Arizona       Maricopa            9
California    Alameda             11
California    Sacramento          35
Kentucky      Jefferson           6
Louisiana     East Baton Rouge    5

  1. Create an Array

First we’ll convert the list to an ndarray, then print its length and its first five elements.

import numpy as np
import ada_c2_labs as lab
aqi_list = lab.fetch_epa('aqi')

aqi_array = np.array(aqi_list)
print(len(aqi_array))
print(aqi_array[:5])
1725
[18.  9. 20. 11.  6.]

  2. Calculate Summary Statistics

print('Max =', np.max(aqi_array))
print('Min =', np.min(aqi_array))
print('Median =', np.median(aqi_array))
print('Std =', np.std(aqi_array))
Max = 93.0
Min = 0.0
Median = 8.0
Std = 10.382982538847708

  3. Calculate percentage of readings with cleanest AQI

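We’ll check what share of the air quality readings in the data represent the cleanest air, which here means readings of 5 or less. Here we can use one of the properties of arrays that make them so powerful: their element-wise operability. Dividing the count of such readings by the total gives a proportion between 0 and 1 (multiply by 100 for a percentage).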

boolean_aqi = (aqi_array <= 5)

percent_under_6 = boolean_aqi.sum() / len(boolean_aqi)
percent_under_6
0.3194202898550725
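
To see what the comparison does step by step, here is a small sketch that uses only the five AQI values from the sample table above, not the full dataset. The names sample_aqi and is_clean are made up for illustration.

```python
import numpy as np

# The five AQI values from the sample table (not the full 1725 readings).
sample_aqi = np.array([9, 11, 35, 6, 5])

# The comparison is applied to every element and returns a boolean array.
is_clean = sample_aqi <= 5
print(is_clean)

# True counts as 1 and False as 0, so sum() counts the matching readings.
clean_count = is_clean.sum()
print(clean_count)

# Dividing by the number of readings gives the proportion.
proportion = clean_count / len(is_clean)
print(proportion)
```

Only one of the five sample readings is 5 or less, so the proportion here is 0.2. Taking is_clean.mean() gives the same number in one step.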

In summary:

  • Python packages contain functions to perform specific tasks.
    • The NumPy package has functions used for working with arrays and performing mathematical operations.
  • Arrays are similar to lists, but only store one type of data per array.
    • Processing data stored in an array is much quicker than processing data stored in traditional lists.
  • Arrays are useful for performing element-wise operations, including arithmetic and comparisons.

In