Introduction to Naive Bayes
In the construct stage we bring the model to life, in this case by building a Naive Bayes model. The theoretical foundations of the model date back nearly 300 years, and though the field of data science and machine learning has grown immensely in recent years, Naive Bayes models remain relevant because they are simple, fast, and good predictors. In certain situations, Naive Bayes is also known to outperform much more advanced classification methods. Even if a more advanced model is ultimately required, a Naive Bayes model can be a great starting point.
Naive Bayes is a supervised classification technique based on Bayes’ theorem with an assumption of independence among predictors: the effect of a predictor’s value on a given class is assumed to be unaffected by the values of the other predictors.
Bayes’ theorem gives us a method for calculating the posterior probability, which is the probability of an event occurring after taking new information into account. In other words, when we calculate the probability of something happening, we take the relevant observations into account. It can be represented with the following equation, which calculates the posterior probability of c given x.
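For a class c and observed predictor data x, the theorem states:

$$
P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}
$$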
The posterior probability equation can be rewritten to reveal what’s going on behind the variables.
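Each term has a conventional name: P(c) is the prior (the probability of the class before seeing the data), P(x | c) is the likelihood (the probability of the data given the class), and P(x) is the evidence (the overall probability of the data):

$$
\underbrace{P(c \mid x)}_{\text{posterior}} \;=\; \frac{\overbrace{P(x \mid c)}^{\text{likelihood}} \;\cdot\; \overbrace{P(c)}^{\text{prior}}}{\underbrace{P(x)}_{\text{evidence}}}
$$

Under the naive independence assumption, the likelihood factors into one term per predictor, so for predictors $x_1, \ldots, x_n$ the posterior is proportional to:

$$
P(c \mid x_1, \ldots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
$$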
Sample: Weather dataset
This is abstract on its own, so let’s apply Naive Bayes to a dataset to gain a better understanding. The weather dataset below will help us build a model that decides whether to go outside and play soccer.
We can start with the Outlook variable and calculate the posterior probability for one of the features in the dataset. To do this, we construct a frequency table for each attribute against the target by tallying the number of times soccer is and isn’t played for each attribute value. Then we transform the frequency tables into likelihood tables by dividing each count by its class total, which gives the probability of each attribute value given that soccer is or isn’t played.
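As a minimal sketch of this step (using a small made-up Outlook/Play sample in place of the full weather dataset), the frequency and likelihood tables can be built with pandas:

```python
import pandas as pd

# Made-up stand-in for the weather dataset; values are illustrative only
df = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy",
                "Overcast", "Sunny", "Rainy", "Overcast", "Sunny"],
    "Play":    ["Yes", "Yes", "Yes", "No", "Yes",
                "Yes", "No", "No", "Yes", "Yes"],
})

# Frequency table: counts of play / don't play for each Outlook value
freq = pd.crosstab(df["Outlook"], df["Play"])
print(freq)

# Likelihood table: P(Outlook = value | Play = class),
# obtained by dividing each count by its class column total
likelihood = freq / freq.sum(axis=0)
print(likelihood)
```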
The process of finding the posterior probability needs to be repeated for every class that could be predicted. In this case, there are only two outcomes: play or don’t play. Once these values are found, the prediction is made based on the class with the highest posterior probability.
Observe that the posterior probability of playing while it is sunny is higher than the posterior probability of not playing. If it’s sunny outside, a Naive Bayes model would predict that the conditions are right to play soccer.
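Continuing the same sketch, the comparison for a sunny day boils down to multiplying each class’s likelihood by its prior and picking the larger value (the evidence P(Sunny) is identical for both classes, so it can be dropped):

```python
# Prior probability of each class, P(Play)
priors = df["Play"].value_counts(normalize=True)

# Unnormalized posteriors: P(Play | Sunny) is proportional to P(Sunny | Play) * P(Play)
posteriors = likelihood.loc["Sunny"] * priors

print(posteriors)           # one value per class
print(posteriors.idxmax())  # the class with the highest posterior probability
```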
Pros and Cons of Naive Bayes
- Naive Bayes is one of the simplest classification algorithms. In spite of its strong independence assumption, Naive Bayes classifiers work quite well on many industry problems, most famously document analysis/classification and spam filtering.
- Additionally, the training time for a Naive Bayes model can be drastically lower than for other models because the calculations it requires are cheap in terms of computing resources. This also makes it highly scalable and able to cope with large increases in the amount of data it must handle.
- One of the biggest problems with Naive Bayes is the data assumption mentioned earlier: few real-world datasets have truly conditionally independent features. However, Naive Bayes models can still perform well even when the assumption of conditional independence is violated.
- Another issue that can arise is the “zero frequency” problem. This occurs when the dataset has no occurrences of a given class label together with some value of a predictor variable, which yields a probability of zero. Since the final posterior probability is found by multiplying all of the individual probabilities together, a single zero would automatically make the result zero. Library implementations of the algorithm account for this by adding a small value (usually 1) to every count so that no probability is exactly zero, as sketched after this list.
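scikit-learn’s Naive Bayes classes expose this smoothing through the alpha parameter (alpha=1.0 corresponds to the add-one adjustment described above). A minimal sketch with a made-up word-count matrix:

```python
from sklearn.naive_bayes import MultinomialNB

# Made-up word-count matrix: rows are documents, columns are words
X = [[2, 0, 1],
     [0, 3, 0],
     [1, 1, 0]]
y = ["spam", "ham", "spam"]

# alpha=1.0 adds 1 to every feature count, so a word never seen with a class
# still receives a small non-zero probability instead of zeroing the product
model = MultinomialNB(alpha=1.0)
model.fit(X, y)

print(model.predict([[0, 2, 1]]))
```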
Implementations in scikit-learn
- BernoulliNB: used for binary/Boolean features
- CategoricalNB: used for categorical features
- ComplementNB: used for imbalanced datasets, often for text classification tasks
- GaussianNB: used for continuous, normally distributed features
- MultinomialNB: used for multinomial (discrete) features
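As a minimal usage sketch (with made-up continuous features, which is why GaussianNB is the variant chosen here):

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Made-up continuous features and binary labels, purely for illustration
X = [[1.2, 3.4], [2.1, 1.0], [0.5, 2.2], [3.3, 0.7], [1.8, 2.9], [2.5, 1.5]]
y = [1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# GaussianNB models each feature as normally distributed within each class
model = GaussianNB()
model.fit(X_train, y_train)

print(model.predict(X_test))
print(model.score(X_test, y_test))
```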
Disclaimer: Like most of my posts, this content is intended solely for educational purposes and was created primarily for my personal reference. At times, I may rephrase original texts, and in some cases, I include materials such as graphs, equations, and datasets directly from their original sources.
I typically reference a variety of sources and update my posts whenever new or related information becomes available. For this particular post, the primary source was the Google Advanced Data Analytics Professional Certificate program.