Table of contents:

- Intro
- Conditional probability
- Naive Bayes classification
- Advantages and limitations of this algorithm
- Exercise
- Solution

## Intro:

Naive Bayes is a machine learning classification algorithm based on Bayes' theorem. It is very efficient, especially on textual data, for tasks such as sentiment analysis, spam detection, and text classification.

This algorithm is called "Naive" because it assumes that all the dataset variables are independent, which is not always the case.

## Conditional probability:

The Naive Bayes algorithm is based on Bayes' theorem, which is founded upon conditional probability: the probability that an event A occurs given that an event B has already occurred.
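Formally, for two events A and B with $P(B) > 0$:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$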

**Example:**

Let's take two jars that contain colored balls:

- Jar 1 has 3 blue balls, 2 red balls and 4 green balls.
- Jar 2 has 1 blue ball, 4 red balls and 3 green balls.

We want to calculate the probability of randomly selecting a blue ball from one of the jars.

This is simply the sum, over the two jars, of the probability of picking that jar and then drawing a blue ball from it. We also want the probability of selecting a blue ball given that we selected Jar 1, which is just the proportion of blue balls in Jar 1:
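Assuming each jar is equally likely to be chosen (the example leaves this implicit):

$$P(\text{blue}) = P(\text{blue} \mid \text{Jar1})\,P(\text{Jar1}) + P(\text{blue} \mid \text{Jar2})\,P(\text{Jar2}) = \frac{3}{9} \cdot \frac{1}{2} + \frac{1}{8} \cdot \frac{1}{2} = \frac{11}{48}$$

$$P(\text{blue} \mid \text{Jar1}) = \frac{3}{9} = \frac{1}{3}$$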

Finally, we want to calculate the probability of selecting Jar 1 given that we drew a blue ball. Here we use Bayes' theorem, which is stated as follows:
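$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Applied to our example:

$$P(\text{Jar1} \mid \text{blue}) = \frac{P(\text{blue} \mid \text{Jar1})\,P(\text{Jar1})}{P(\text{blue})} = \frac{\frac{1}{3} \cdot \frac{1}{2}}{\frac{11}{48}} = \frac{8}{11}$$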

## Naive Bayes classification:

In a Naive Bayes classifier, we want to find the class that maximizes the conditional probability given the input vector x; thus, Naive Bayes can be formulated as follows:
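$$\hat{C} = \underset{C_i}{\arg\max}\; P(C_i \mid x)$$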

With the use of Bayes' theorem, the function becomes:
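$$\hat{C} = \underset{C_i}{\arg\max}\; \frac{P(x \mid C_i)\,P(C_i)}{P(x)}$$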

In this formulation, it's easy to calculate P(Ci), which is simply the prior probability of the class Ci, and P(x), the probability of the input x occurring. What's hard to calculate is P(x|Ci), the probability of the input x given the class Ci. To simplify this, we assume that all the input variables are independent; thus, we can write:
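$$P(x \mid C_i) = \prod_{j=1}^{n} P(x_j \mid C_i)$$

where $x = (x_1, \dots, x_n)$ are the input variables.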

It's actually because of this assumption that we call this classifier "Naive": we can't always guarantee the independence of the input variables. The Naive Bayes classifier becomes:
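$$\hat{C} = \underset{C_i}{\arg\max}\; \frac{P(C_i)\,\prod_{j=1}^{n} P(x_j \mid C_i)}{P(x)}$$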

In fact, we can further simplify this formulation by eliminating P(x), since it is the same for all classes:
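$$\hat{C} = \underset{C_i}{\arg\max}\; P(C_i)\,\prod_{j=1}^{n} P(x_j \mid C_i)$$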

Now, let's take a look at an example:

| Weather | Time     | Day of the week | Dinner |
| ------- | -------- | --------------- | ------ |
| Clear   | Evening  | Weekend         | Cooks  |
| Cloudy  | Night    | Weekday         | Orders |
| Rainy   | Night    | Weekday         | Orders |
| Rainy   | Midday   | Weekday         | Orders |
| Cloudy  | Midday   | Weekend         | Cooks  |
| Clear   | Night    | Weekend         | Cooks  |
| Snowy   | Evening  | Weekend         | Orders |
| Clear   | Night    | Weekday         | Cooks  |
| Clear   | Midnight | Weekend         | Orders |

Here, we have a small dataset with 3 input variables: Weather, Time, and Day of the week, and one target variable, "Dinner", that indicates whether a person cooks or orders their dinner. We would like to find the class of the input x = {Clear, Evening, Weekend}.

We need to calculate the conditional probability of the class "Cooks" and the class "Orders" given the input x = {Clear, Evening, Weekend}. The predicted class is the one with the highest conditional probability. We start by calculating the conditional probability of the class "Cooks":
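$$P(\text{Cooks} \mid x) \propto P(\text{Clear} \mid \text{Cooks})\,P(\text{Evening} \mid \text{Cooks})\,P(\text{Weekend} \mid \text{Cooks})\,P(\text{Cooks})$$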

Now we calculate each conditional probability on its own. The probability of Weather = "Clear" given the class "Cooks" is the number of rows with weather "Clear" and class "Cooks" over the total number of rows with class "Cooks":
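$$P(\text{Clear} \mid \text{Cooks}) = \frac{3}{4}$$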

The same goes for the other conditional probabilities:
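$$P(\text{Evening} \mid \text{Cooks}) = \frac{1}{4}, \qquad P(\text{Weekend} \mid \text{Cooks}) = \frac{3}{4}$$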

As for the probability P(Cooks), it is the number of rows with class "Cooks" over the total number of rows:
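$$P(\text{Cooks}) = \frac{4}{9}$$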

Now we calculate the product of these probabilities:
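$$P(\text{Cooks} \mid x) \propto \frac{3}{4} \cdot \frac{1}{4} \cdot \frac{3}{4} \cdot \frac{4}{9} = \frac{1}{16} = 0.0625$$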

That was for the class "Cooks"; now we need to do the same for the class "Orders":
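$$P(\text{Orders} \mid x) \propto P(\text{Clear} \mid \text{Orders})\,P(\text{Evening} \mid \text{Orders})\,P(\text{Weekend} \mid \text{Orders})\,P(\text{Orders})$$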

We calculate the individual probabilities:
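$$P(\text{Clear} \mid \text{Orders}) = \frac{1}{5}, \qquad P(\text{Evening} \mid \text{Orders}) = \frac{1}{5}, \qquad P(\text{Weekend} \mid \text{Orders}) = \frac{2}{5}, \qquad P(\text{Orders}) = \frac{5}{9}$$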

And we finally calculate the product of the probabilities:
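$$P(\text{Orders} \mid x) \propto \frac{1}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{9} = \frac{2}{225} \approx 0.0089$$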

Finally, we take the class with the highest probability, which is the class "Cooks":
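$$0.0625 > 0.0089 \implies \hat{C} = \text{Cooks}$$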

## Advantages and limitations of this algorithm:

**Advantages:**

- It is a very fast classifier.
- It is easy to implement.
- There is no real training phase: predictions are computed directly from frequency counts in the data.
- It doesn't require a lot of data in order to make inferences.

**Limitations:**

- Naive Bayes assumes that the input variables are independent, which is not always true.
- Naive Bayes suffers from the zero-frequency problem: when a value of an input variable never occurs with a given class in the training data, its conditional probability is zero, which zeroes out the whole product P(C|x). One trick to avoid this is to add 1 to every frequency count so that none is zero, known as Laplace smoothing, as shown below.
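With Laplace (add-one) smoothing, where $n_{x_j,\,C_i}$ is the number of rows with value $x_j$ and class $C_i$, $n_{C_i}$ the number of rows with class $C_i$, and $k$ the number of distinct values of the variable, each conditional probability becomes:

$$P(x_j \mid C_i) = \frac{n_{x_j,\,C_i} + 1}{n_{C_i} + k}$$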

## Exercise:

Here is the dataframe of the same dataset that we've seen in the example. Your task is to implement Naive Bayes yourself using Python:

```
import pandas as pd

dataset = pd.DataFrame()
dataset['Weather'] = ['Clear', 'Cloudy', 'Rainy', 'Rainy', 'Cloudy', 'Clear', 'Snowy', 'Clear', 'Clear']
dataset['Time'] = ['Evening', 'Night', 'Night', 'Midday', 'Midday', 'Night', 'Evening', 'Night', 'Midnight']
dataset['Day'] = ['Weekend', 'Weekday', 'Weekday', 'Weekday', 'Weekend', 'Weekend', 'Weekend', 'Weekday', 'Weekend']
dataset['Dinner'] = ['Cooks', 'Orders', 'Orders', 'Orders', 'Cooks', 'Cooks', 'Orders', 'Cooks', 'Orders']

def naive_bayes(x_weather, x_time, x_day):
    # res_dict = {class_1: probability_1, ..., class_n: probability_n}
    res_dict = {}
    return res_dict
```

## Solution:

```
def naive_bayes(x_weather, x_time, x_day):
    TARGET = 'Dinner'  # the name of the target variable
    CLASSES = list(dataset[TARGET].unique())  # the classes of the target variable
    len_dataset = len(dataset)  # the number of rows in the dataset
    res_dict = {}  # res_dict = {class_1: probability_1, ..., class_n: probability_n}
    # for each target class, we calculate its conditional probability
    for class_name in CLASSES:
        # the number of rows that belong to the class "class_name"
        len_c = len(dataset[dataset[TARGET] == class_name])
        # the number of rows with class "class_name" and weather == x_weather
        n_weather = len(dataset[(dataset[TARGET] == class_name) & (dataset['Weather'] == x_weather)])
        # the number of rows with class "class_name" and time == x_time
        n_time = len(dataset[(dataset[TARGET] == class_name) & (dataset['Time'] == x_time)])
        # the number of rows with class "class_name" and day == x_day
        n_day = len(dataset[(dataset[TARGET] == class_name) & (dataset['Day'] == x_day)])
        # the conditional probability, up to the constant factor P(x):
        # P(class|x) is proportional to P(weather|class) * P(time|class) * P(day|class) * P(class)
        p = (n_weather / len_c) * (n_time / len_c) * (n_day / len_c) * (len_c / len_dataset)
        res_dict[class_name] = p
    return res_dict
```
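A quick sanity check, which should reproduce the values from the worked example above:

```
print(naive_bayes('Clear', 'Evening', 'Weekend'))
# {'Cooks': 0.0625, 'Orders': 0.008888888888888889}
```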