Intro
Naive Bayes is a classification algorithm based on Bayes' theorem. It is very efficient, especially when dealing with textual data, in tasks such as sentiment analysis, spam detection, and text classification.
This algorithm is called “Naive” because it assumes that all the dataset variables are independent, which is not always the case.
Before going any further into explaining how Naive Bayes works, let’s make sure that we understand the following:
Conditional probability
The Naive Bayes algorithm is based on Bayes' theorem, which is founded upon conditional probability: the probability that an event A occurs given that an event B has already occurred.
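In symbols, the conditional probability of A given B is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$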
Example:
Let’s say we have two jars that contain colored balls:
- Jar 1 has 3 blue balls, 2 red balls, and 4 green balls.
- Jar 2 has 1 blue ball, 4 red balls, and 3 green balls.
First, we want to calculate the probability of randomly selecting a blue ball from one of the jars. By the law of total probability, it is the sum, over the two jars, of the probability of picking that jar and then drawing a blue ball from it.
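Assuming each jar is equally likely to be chosen (probability 1/2 each, an assumption the example leaves implicit), and noting that Jar 1 holds 9 balls while Jar 2 holds 8:

$$P(\text{Blue}) = P(\text{Jar1}) \cdot P(\text{Blue} \mid \text{Jar1}) + P(\text{Jar2}) \cdot P(\text{Blue} \mid \text{Jar2}) = \frac{1}{2} \cdot \frac{3}{9} + \frac{1}{2} \cdot \frac{1}{8} = \frac{11}{48} \approx 0.23$$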
Now, we want to calculate the probability of selecting a blue ball given that we selected Jar 1:
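$$P(\text{Blue} \mid \text{Jar1}) = \frac{3}{9} = \frac{1}{3}$$

since Jar 1 contains 9 balls in total, 3 of which are blue.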
Finally, we want to calculate the probability of having selected Jar 1 given that we drew a blue ball. Here we use Bayes' theorem, which is stated as follows:
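$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

Applying it to our example (with the equal-choice assumption above):

$$P(\text{Jar1} \mid \text{Blue}) = \frac{P(\text{Blue} \mid \text{Jar1}) \cdot P(\text{Jar1})}{P(\text{Blue})} = \frac{\frac{1}{3} \cdot \frac{1}{2}}{\frac{11}{48}} = \frac{8}{11} \approx 0.73$$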
Naive Bayes classification
In the Naive Bayes classifier, we want to find the class that maximizes the conditional probability given the input vector x; thus, Naive Bayes can be formulated as follows:
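$$\hat{C} = \underset{C_i}{\arg\max} \; P(C_i \mid x)$$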
Using Bayes' theorem, the formulation becomes:
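$$\hat{C} = \underset{C_i}{\arg\max} \; \frac{P(x \mid C_i) \cdot P(C_i)}{P(x)}$$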
In this formulation, P(Ci) is easy to calculate: it is simply the prior probability of the class Ci. Likewise, P(x) is easy to calculate: it is the probability of the input x occurring.
What’s hard to calculate is P(x|Ci), the probability of the input x given the class Ci. To simplify it, we assume that all the input variables are independent; thus, we can write:
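$$P(x \mid C_i) = P(x_1 \mid C_i) \cdot P(x_2 \mid C_i) \cdots P(x_n \mid C_i) = \prod_{j=1}^{n} P(x_j \mid C_i)$$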
It is precisely this assumption that makes the classifier “Naive”: we can’t always guarantee the independence of the input variables. With it, the Naive Bayes classifier becomes:
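$$\hat{C} = \underset{C_i}{\arg\max} \; \frac{P(C_i) \cdot \prod_{j=1}^{n} P(x_j \mid C_i)}{P(x)}$$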
In fact, we can simplify this formulation further by eliminating P(x), since it is the same for all classes:
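$$\hat{C} = \underset{C_i}{\arg\max} \; P(C_i) \cdot \prod_{j=1}^{n} P(x_j \mid C_i)$$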
Now, let’s take a look at an example:
| Weather | Time | Day of the week | Dinner |
| ------- | -------- | --------------- | ------ |
| Clear | Evening | Weekend | Cooks |
| Cloudy | Night | Weekday | Orders |
| Rainy | Night | Weekday | Orders |
| Rainy | Midday | Weekday | Orders |
| Cloudy | Midday | Weekend | Cooks |
| Clear | Night | Weekend | Cooks |
| Snowy | Evening | Weekend | Orders |
| Clear | Night | Weekday | Cooks |
| Clear | Midnight | Weekend | Orders |
Here, we have a small dataset with 3 input variables: Weather, Time, and Day of the week, and one target variable, “Dinner”, which indicates whether a person cooks or orders their dinner. We would like to find the class of the input x = {Clear, Evening, Weekend}:
We need to calculate the conditional probability of the class “Cooks” and the class “Orders” given the input x = {Clear, Evening, Weekend}. The predicted class is the one with the highest conditional probability.
We start by calculating the conditional probability of the class “Cooks”:
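$$P(\text{Cooks} \mid x) \propto P(\text{Clear} \mid \text{Cooks}) \cdot P(\text{Evening} \mid \text{Cooks}) \cdot P(\text{Weekend} \mid \text{Cooks}) \cdot P(\text{Cooks})$$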
Now we calculate each conditional probability on its own:
The probability of Weather = “Clear” given that the class is “Cooks” is the number of rows with Weather “Clear” and class “Cooks” divided by the total number of rows with class “Cooks”:
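$$P(\text{Clear} \mid \text{Cooks}) = \frac{3}{4}$$

(3 of the 4 rows with class “Cooks” have Weather “Clear”.)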
The same goes for the other conditional probabilities:
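$$P(\text{Evening} \mid \text{Cooks}) = \frac{1}{4}, \qquad P(\text{Weekend} \mid \text{Cooks}) = \frac{3}{4}$$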
As for the prior probability P(Cooks), it’s the number of rows with class “Cooks” over the total number of rows:
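$$P(\text{Cooks}) = \frac{4}{9}$$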
Now we calculate the product of these probabilities:
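$$P(\text{Cooks} \mid x) \propto \frac{3}{4} \cdot \frac{1}{4} \cdot \frac{3}{4} \cdot \frac{4}{9} = \frac{1}{16} = 0.0625$$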
That was for the class “Cooks”; now we need to do the same for the class “Orders”:
We calculate the individual probabilities:
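$$P(\text{Clear} \mid \text{Orders}) = \frac{1}{5}, \quad P(\text{Evening} \mid \text{Orders}) = \frac{1}{5}, \quad P(\text{Weekend} \mid \text{Orders}) = \frac{2}{5}, \quad P(\text{Orders}) = \frac{5}{9}$$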
And we finally calculate the product of the probabilities:
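$$P(\text{Orders} \mid x) \propto \frac{1}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{9} = \frac{2}{225} \approx 0.0089$$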
Finally, we take the class with the highest probability, which is the class “Cooks”:
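$$0.0625 > 0.0089 \quad \Rightarrow \quad \hat{C} = \text{Cooks}$$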
Advantages and limitations of this algorithm
Advantages:
- It is a very fast classifier.
- It is easy to implement.
- There is no training phase; the model only performs inference, computing probabilities directly from the data.
- It doesn’t require a lot of data in order to make inferences.
Limitations:
- Naive Bayes assumes that the input variables are independent, which is not always true.
- Naive Bayes suffers from the zero-frequency problem: when a variable value never occurs with a given class in the training data, it gets a conditional probability of zero, which zeroes out the entire product P(C|x). A common trick to avoid this is to use a minimal count of 1 (instead of 0) for every variable value, as sketched below.
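As a sketch of this fix (commonly known as Laplace, or add-one, smoothing), each conditional probability estimate becomes:

$$P(x_j \mid C_i) = \frac{\text{count}(x_j, C_i) + 1}{\text{count}(C_i) + k}$$

where k is the number of distinct values the variable x_j can take, so no estimate is ever exactly zero.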
Exercise
Here is a DataFrame containing the same dataset that we saw in the example.
Your task is to implement Naive Bayes yourself using Python:
```python
import pandas as pd

dataset = pd.DataFrame()
dataset['Weather'] = ['Clear', 'Cloudy', 'Rainy', 'Rainy', 'Cloudy', 'Clear', 'Snowy', 'Clear', 'Clear']
dataset['Time'] = ['Evening', 'Night', 'Night', 'Midday', 'Midday', 'Night', 'Evening', 'Night', 'Midnight']
dataset['Day'] = ['Weekend', 'Weekday', 'Weekday', 'Weekday', 'Weekend', 'Weekend', 'Weekend', 'Weekday', 'Weekend']
dataset['Dinner'] = ['Cooks', 'Orders', 'Orders', 'Orders', 'Cooks', 'Cooks', 'Orders', 'Cooks', 'Orders']

def naive_bayes(weather, time, day):
    # res_dict = {class_1: probability of class_1, ..., class_n: probability of class_n}
    return res_dict
```
Solution
```python
def naive_bayes(x_weather, x_time, x_day):
    TARGET = 'Dinner'  # The name of the target variable
    CLASSES = list(dataset[TARGET].unique())  # The classes of the target variable
    len_dataset = len(dataset)  # The length of the dataset
    res_dict = {}  # res_dict = {class_1: probability_1, ..., class_n: probability_n}
    # For each class of the target variable, we calculate its conditional probability
    for class_name in CLASSES:
        # The number of rows that belong to the class "class_name"
        len_c = len(dataset[dataset[TARGET] == class_name])
        # The number of rows with class "class_name" and Weather == x_weather
        n_weather = len(dataset[(dataset[TARGET] == class_name) & (dataset['Weather'] == x_weather)])
        # The number of rows with class "class_name" and Time == x_time
        n_time = len(dataset[(dataset[TARGET] == class_name) & (dataset['Time'] == x_time)])
        # The number of rows with class "class_name" and Day == x_day
        n_day = len(dataset[(dataset[TARGET] == class_name) & (dataset['Day'] == x_day)])
        # We calculate the conditional probability (up to the constant factor P(x)):
        # P(class|x) ∝ P(weather|class) * P(time|class) * P(day|class) * P(class)
        p = (n_weather / len_c) * (n_time / len_c) * (n_day / len_c) * (len_c / len_dataset)
        res_dict[class_name] = p
    return res_dict
```
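As a quick sanity check, calling the function with the input from the worked example reproduces the probabilities we computed by hand:

```python
print(naive_bayes('Clear', 'Evening', 'Weekend'))
# ≈ {'Cooks': 0.0625, 'Orders': 0.0089} -> the predicted class is "Cooks"
```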