Python makes it possible to extend, enhance, and reuse code. To access this reusable code, we import it from libraries, packages, and modules.
Library
A library, or package, broadly refers to a reusable collection of code. It also contains related modules and documentation.
Here are some commonly used libraries for data work:
- Matplotlib: a library for creating static, animated, and interactive visualizations.
- Seaborn: a data visualization library that’s based on matplotlib. It provides a simpler interface for working with common plots and graphs.
- NumPy (Numerical Python): an essential library that contains multidimensional array and matrix data structures and functions to manipulate them. This library is used for scientific computation.
- Pandas (Python Data Analysis): a powerful library built on top of NumPy that’s used to manipulate and analyze tabular data.
Module
Libraries and packages provide sets of modules that are essential for data professionals. Modules are accessed from within a package or a library. They are Python files that contain collections of functions and global variables.
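To make the idea of a module as a Python file more concrete, here is a minimal sketch that inspects the standard-library random module (picked here only for illustration). It assumes a standard CPython installation, where random is implemented as a plain .py file.

```python
import random

# A module is an object whose attributes are the functions and
# variables defined in its source file.
module_file = random.__file__  # path to the file that defines the module
print(module_file)

# Functions defined in that file are reached with dot notation.
print(callable(random.randint))

# Collect the public names the module provides and show a few of them.
public_names = [name for name in dir(random) if not name.startswith("_")]
print(public_names[:5])
```

The exact path and the order of names depend on the Python installation, so no output is shown here.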
When we import a module, we are using pre-written code components. Each module is a Python file whose contents can be imported into our scripts. Commonly used modules from Python's standard library for data professional work are:
- Math provides access to mathematical functions.
import math
print(math.exp(0)) # e**0
print(math.log(1)) # ln(1)
print(math.factorial(4)) # 4!
print(math.sqrt(100)) # square root of 100
1.0
0.0
24
10.0
- Random is used to generate random numbers.
import random
print(random.random()) # 0.0 <= X < 1.0
print(random.choice([1, 2, 3])) # choose a random element from a sequence
print(random.randint(1, 10)) # a <= X <= b
0.2680927283841934
2
8
- Datetime provides helpful date and time conversions and calculations.
import datetime
date = datetime.date(1977, 5, 8) # assign a date to a variable
print(date) # print date
print(date.year) # print the year that the date is in
delta = datetime.timedelta(days=30) # assign a timedelta of 30 days to a variable
print(date - delta) # print date of 30 days prior
1977-05-08
1977
1977-04-08
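The values printed by the random example will differ on every run. When repeatable results are needed, the generator can be seeded first. A minimal sketch, assuming an arbitrary seed of 42 (any integer works the same way):

```python
import random

random.seed(42)  # 42 is an arbitrary seed chosen for this example
first_run = [random.randint(1, 10) for _ in range(3)]

random.seed(42)  # reset the generator to the same starting state
second_run = [random.randint(1, 10) for _ in range(3)]

# The same seed produces the same sequence of values.
print(first_run == second_run)  # True
```

Seeding is mostly useful for tests and for sharing reproducible analyses. Leaving the seed out gives a different sequence on each run.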
To summarize:
A library is a corpus of reusable code modules and their accompanying documentation. Libraries are bundled into packages that we install, which can then be imported into our coding environment as needed.
Modules are similar to libraries, in that they are groups of related classes and functions, but they are generally subcomponents of libraries. In other words, a library can have many different modules, and we can choose to import the entire library or just the module we need.
Since we have mentioned importing, let’s talk a little about it too.
Aliasing and Importing
Aliasing lets us assign an alternate name, or alias, by which we can refer to something. Some common examples are below.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
As I mentioned earlier, NumPy is used for high-performance vector and matrix computations. Pandas is a library for manipulating and analyzing tabular data. Seaborn and matplotlib are both libraries used to create graphs, charts, and other data visualizations.
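The aliasing syntax works the same way for any module. Here is a minimal sketch that uses only standard-library modules, so it runs without installing anything. The alias dt is a common choice for datetime, while stats is just a name picked for this example.

```python
import datetime as dt       # dt is a widely used alias for datetime
import statistics as stats  # stats is an alias chosen only for this example

# After aliasing, the short name replaces the full module name.
release = dt.date(1977, 5, 8)
print(release.year)  # 1977

values = [2, 4, 6]
average = stats.mean(values)
print(average)
```

Once a module is imported under an alias, only the alias is bound in the script: writing datetime.date(...) here would raise a NameError.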
Importing modules
Notice that matplotlib is the library and pyplot is a module inside it. The pyplot module is aliased as plt, and it’s accessed from the matplotlib library using dot notation.
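The same dotted pattern appears in Python's standard library. As a rough sketch, xml is a package, etree is a subpackage inside it, and ElementTree is the module being imported. ET is the alias conventionally used for it, and the XML string is made up for this example.

```python
import xml.etree.ElementTree as ET  # package.subpackage.module, aliased as ET

# Parse a tiny XML snippet (invented for this example).
root = ET.fromstring("<plot><title>Sales</title></plot>")

print(root.tag)                 # plot
print(root.find("title").text)  # Sales
```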
Importing functions
Just as we can import libraries and modules, we can also import individual functions from libraries or from modules within libraries using a specific syntax. Here’s an example depicting a common import when using the scikit-learn library to build machine learning models:
from sklearn.metrics import precision_score, recall_score
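The from ... import ... syntax is not specific to scikit-learn. A minimal sketch with the built-in math module, which runs without any extra installs:

```python
from math import sqrt, factorial

# Imported functions are called directly, without the "math." prefix.
root_value = sqrt(100)
fact_value = factorial(4)

print(root_value)  # 10.0
print(fact_value)  # 24
```

With this form, only sqrt and factorial are available in the script. The name math itself is not defined unless it is imported separately.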