Naive Bayes is a machine learning classification algorithm based on Bayes’ theorem. It is fast and works particularly well on text tasks such as sentiment analysis, spam detection and other kinds of text classification.

This algorithm is called “Naive” because it assumes that all the dataset variables are independent, which is not always the case. Before explaining how Naive Bayes works, let’s make sure that we understand the following:

Conditional probability:

The Naive Bayes algorithm is based on Bayes’ theorem, which is founded on conditional probability: the probability that an event A occurs given that an event B has already occurred, written P(A|B).


Suppose we have two jars that contain colored balls:

  • Jar 1 has 3 blue balls, 2 red balls and 4 green balls.
  • Jar 2 has 1 blue ball, 4 red balls and 3 green balls.

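We want to calculate the probability of randomly selecting a blue ball from one of the jars:

$$P(\text{Blue}) = P(\text{Blue} \mid \text{Jar1})\,P(\text{Jar1}) + P(\text{Blue} \mid \text{Jar2})\,P(\text{Jar2})$$

It’s simply the sum of the probabilities of selecting a blue ball from Jar1 and from Jar2, each weighted by the probability of picking that jar. Now, we want to calculate the probability of selecting a blue ball given that we selected Jar1. Jar1 holds 9 balls, 3 of which are blue:

$$P(\text{Blue} \mid \text{Jar1}) = \frac{3}{9} = \frac{1}{3}$$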

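Finally, we want to calculate the probability of selecting Jar1 given that we drew a blue ball. Here we use Bayes’ theorem, which states:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Applied to the jars, this gives:

$$P(\text{Jar1} \mid \text{Blue}) = \frac{P(\text{Blue} \mid \text{Jar1})\,P(\text{Jar1})}{P(\text{Blue})}$$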
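To make the jar example concrete, here is a minimal Python sketch of the three probabilities. The example does not say how a jar is chosen, so the sketch assumes each jar is picked with probability 1/2; the helper names (`p_color_given_jar`, `p_color`, `p_jar_given_color`) are made up for illustration.

```python
from fractions import Fraction

# Ball counts taken from the two jars described above.
jars = {
    "Jar1": {"blue": 3, "red": 2, "green": 4},
    "Jar2": {"blue": 1, "red": 4, "green": 3},
}

# Assumption (not stated in the example): each jar is picked with probability 1/2.
p_jar = {name: Fraction(1, len(jars)) for name in jars}


def p_color_given_jar(color, jar):
    """P(color | jar): share of balls of that color inside the jar."""
    counts = jars[jar]
    return Fraction(counts[color], sum(counts.values()))


def p_color(color):
    """P(color): sum over the jars of P(color | jar) * P(jar)."""
    return sum(p_color_given_jar(color, jar) * p_jar[jar] for jar in jars)


def p_jar_given_color(jar, color):
    """P(jar | color), obtained with Bayes' theorem."""
    return p_color_given_jar(color, jar) * p_jar[jar] / p_color(color)


print(p_color_given_jar("blue", "Jar1"))  # 1/3
print(p_color("blue"))                    # 11/48
print(p_jar_given_color("Jar1", "blue"))  # 8/11
```

Under the equal-jar assumption, drawing a blue ball makes Jar1 the more likely source (8/11), because Jar1 has a larger share of blue balls than Jar2.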


Naive Bayes classification

In the Naive Bayes classifier, we want to find the class that maximizes the conditional probability given the input vector x = (x_1, ..., x_n); thus, Naive Bayes can be formulated as follows:

$$\hat{C} = \arg\max_{C_i} P(C_i \mid x)$$

With the use of Bayes’ theorem, the function becomes:

$$\hat{C} = \arg\max_{C_i} \frac{P(x \mid C_i)\,P(C_i)}{P(x)}$$

In this formulation, it’s easy to calculate P(Ci), which is simply the probability of the class Ci, and P(x) is the probability of the event x occurring. What’s hard to calculate is P(x|Ci), the probability of the event x given the class Ci. To simplify this, we assume that all the input variables are independent given the class; thus, we can write:

$$P(x \mid C_i) = \prod_{j=1}^{n} P(x_j \mid C_i)$$

It’s because of this assumption that we call this classifier “Naive”: we can’t always guarantee the independence of the input variables. The Naive Bayes classifier becomes:

$$\hat{C} = \arg\max_{C_i} \frac{P(C_i)\,\prod_{j=1}^{n} P(x_j \mid C_i)}{P(x)}$$

In fact, we can further simplify this formulation by eliminating P(x), because it’s the same for all classes:

$$\hat{C} = \arg\max_{C_i} P(C_i)\,\prod_{j=1}^{n} P(x_j \mid C_i)$$

Now, let’s take a look at an example:

| Weather | Time     | Day of the week | Dinner |
| ------- | -------- | --------------- | ------ |
| Clear   | Evening  | Weekend         | Cooks  |
| Cloudy  | Night    | Weekday         | Orders |
| Rainy   | Night    | Weekday         | Orders |
| Rainy   | Midday   | Weekday         | Orders |
| Cloudy  | Midday   | Weekend         | Cooks  |
| Clear   | Night    | Weekend         | Cooks  |
| Snowy   | Evening  | Weekend         | Orders |
| Clear   | Night    | Weekday         | Cooks  |
| Clear   | Midnight | Weekend         | Orders |

Here, we have a small dataset that contains 3 input variables (Weather, Time and Day of the week) and one target variable, “Dinner”, that indicates whether a person cooks or orders their dinner. We would like to find the class of the input x={Clear, Evening, Weekend}:

$$\hat{C} = \arg\max_{C \in \{\text{Cooks},\, \text{Orders}\}} P(C \mid x)$$

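We need to calculate the conditional probability for the class “Cooks” and the class “Orders” given the input x={Clear, Evening, Weekend}. The predicted class is the one with the highest conditional probability. We start with the class “Cooks”. Since P(x) is the same for both classes, we only need the numerator:

$$P(\text{Cooks} \mid x) \propto P(\text{Clear} \mid \text{Cooks})\,P(\text{Evening} \mid \text{Cooks})\,P(\text{Weekend} \mid \text{Cooks})\,P(\text{Cooks})$$

Now we calculate each conditional probability on its own. The probability of weather=“Clear” given that the class is “Cooks” is the number of rows with weather “Clear” and class “Cooks” over the total number of rows with class “Cooks”:

$$P(\text{Clear} \mid \text{Cooks}) = \frac{3}{4}$$

The same goes for the other conditional probabilities:

$$P(\text{Evening} \mid \text{Cooks}) = \frac{1}{4}, \qquad P(\text{Weekend} \mid \text{Cooks}) = \frac{3}{4}$$

The probability P(Cooks) is the number of rows with class “Cooks” over the total number of rows:

$$P(\text{Cooks}) = \frac{4}{9}$$

Now we calculate the product of these probabilities:

$$\frac{3}{4} \times \frac{1}{4} \times \frac{3}{4} \times \frac{4}{9} = \frac{1}{16} = 0.0625$$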

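That was for the class “Cooks”; now we need to do the same for the class “Orders”:

$$P(\text{Orders} \mid x) \propto P(\text{Clear} \mid \text{Orders})\,P(\text{Evening} \mid \text{Orders})\,P(\text{Weekend} \mid \text{Orders})\,P(\text{Orders})$$

We calculate the individual probabilities:

$$P(\text{Clear} \mid \text{Orders}) = \frac{1}{5}, \qquad P(\text{Evening} \mid \text{Orders}) = \frac{1}{5}, \qquad P(\text{Weekend} \mid \text{Orders}) = \frac{2}{5}, \qquad P(\text{Orders}) = \frac{5}{9}$$

And we finally calculate the product of the probabilities:

$$\frac{1}{5} \times \frac{1}{5} \times \frac{2}{5} \times \frac{5}{9} = \frac{2}{225} \approx 0.0089$$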

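Finally, we take the class with the highest score, which is the class “Cooks”. Its score is 1/16, against 2/225 for “Orders”:

$$\frac{1}{16} = 0.0625 \;>\; \frac{2}{225} \approx 0.0089 \quad\Rightarrow\quad \hat{C} = \text{Cooks}$$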
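The two scores compared above are not probabilities yet, because P(x) was dropped. As a hedged sketch, the snippet below recomputes both scores from fractions read off the table and then divides by their sum to recover normalized posteriors. The variable names (`factors`, `scores`, `posteriors`) are illustrative only.

```python
from fractions import Fraction as F

# Probabilities read off the table for x = {Clear, Evening, Weekend}.
# Each list holds [P(Clear|C), P(Evening|C), P(Weekend|C), P(C)].
factors = {
    "Cooks":  [F(3, 4), F(1, 4), F(3, 4), F(4, 9)],
    "Orders": [F(1, 5), F(1, 5), F(2, 5), F(5, 9)],
}

# Unnormalized score of each class: the product of its four factors.
scores = {}
for label, probs in factors.items():
    score = F(1)
    for p in probs:
        score *= p
    scores[label] = score

# The sum of the scores is the model's estimate of P(x) under the naive assumption.
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}
prediction = max(scores, key=scores.get)

print(scores["Cooks"], scores["Orders"])          # 1/16 2/225
print(posteriors["Cooks"], posteriors["Orders"])  # 225/257 32/257
print(prediction)                                 # Cooks
```

Normalizing does not change the predicted class; it only rescales the scores so that they sum to 1 (about 0.875 for “Cooks” and 0.125 for “Orders”).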



Advantages and limitations of this algorithm:

Advantages:

  • It is a very fast classifier.
  • It is easy to implement.
  • Training is cheap: it only requires counting frequencies in the dataset, with no iterative optimization.
  • It doesn’t require a lot of data in order to make inferences.

Limitations:

  • Naive Bayes assumes that the input variables are independent given the class, which is not always true.
  • Naive Bayes suffers from the zero-frequency problem: when a value of an input variable never appears together with a class in the dataset, its estimated conditional probability is zero, and that single zero factor zeroes out the whole product for P(C|x). A common remedy is Laplace (add-one) smoothing: add 1 to every count so that no frequency is ever 0.
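The zero-frequency problem and the add-one fix can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the rows repeat the example table, the function name `naive_bayes_scores` and its `alpha` parameter are hypothetical, and only the conditional probabilities are smoothed (the class priors are left as raw frequencies).

```python
from collections import Counter

# (Weather, Time, Day of the week, Dinner): the rows of the example table.
rows = [
    ("Clear", "Evening", "Weekend", "Cooks"),
    ("Cloudy", "Night", "Weekday", "Orders"),
    ("Rainy", "Night", "Weekday", "Orders"),
    ("Rainy", "Midday", "Weekday", "Orders"),
    ("Cloudy", "Midday", "Weekend", "Cooks"),
    ("Clear", "Night", "Weekend", "Cooks"),
    ("Snowy", "Evening", "Weekend", "Orders"),
    ("Clear", "Night", "Weekday", "Cooks"),
    ("Clear", "Midnight", "Weekend", "Orders"),
]


def naive_bayes_scores(x, rows, alpha=0):
    """Return {class: P(class) * product of P(x_j | class)}.

    alpha=0 uses raw frequencies; alpha=1 is add-one (Laplace) smoothing.
    """
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)  # prior P(class), not smoothed here
        for j, value in enumerate(x):
            n_values = len({row[j] for row in rows})  # distinct values of feature j
            n_match = sum(1 for row in rows if row[-1] == label and row[j] == value)
            score *= (n_match + alpha) / (n_label + alpha * n_values)
        scores[label] = score
    return scores


# "Snowy" never appears together with "Cooks" in the table.
x = ("Snowy", "Evening", "Weekend")
unsmoothed = naive_bayes_scores(x, rows, alpha=0)
smoothed = naive_bayes_scores(x, rows, alpha=1)
print(unsmoothed["Cooks"])    # 0.0
print(smoothed["Cooks"] > 0)  # True
```

Without smoothing, the single zero count wipes out the “Cooks” score no matter what the other variables say. With `alpha=1`, both classes keep a small non-zero score and the remaining variables still influence the decision.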


Here is the DataFrame of the same dataset that we’ve seen in the example (the target column is named 'Class' here rather than “Dinner”). Your task is to implement Naive Bayes yourself using Python:

import pandas as pd
dataset = pd.DataFrame()
dataset['Weather'] = ['Clear', 'Cloudy', 'Rainy', 'Rainy', 'Cloudy', 'Clear', 'Snowy', 'Clear', 'Clear']
dataset['Time'] = ['Evening', 'Night', 'Night', 'Midday', 'Midday', 'Night', 'Evening', 'Night', 'Midnight']
dataset['Day'] = ['Weekend', 'Weekday', 'Weekday', 'Weekday', 'Weekend', 'Weekend', 'Weekend', 'Weekday', 'Weekend']
dataset['Class'] = ['Cooks', 'Orders', 'Orders', 'Orders', 'Cooks', 'Cooks', 'Orders', 'Cooks', 'Orders']

def naive_bayes(weather, time, day):
    # TODO: fill res_dict so that
    # res_dict = {class1: probability of class 1, class2: probability of class 2}
    res_dict = {}
    return res_dict


One possible solution:

def naive_bayes(x_weather, x_time, x_day):
    TARGET = 'Class'  # The name of the target column in the dataset
    CLASSES = list(dataset[TARGET].unique())  # The classes of the target variable
    len_dataset = len(dataset)  # The number of rows in the dataset
    res_dict = {}  # res_dict = {class_1: score_1, ..., class_n: score_n}

    # For each class of the target variable, we calculate its score
    for class_name in CLASSES:
        # The number of rows that belong to the class "class_name"
        len_c = len(dataset[(dataset[TARGET] == class_name)])

        # The number of rows that belong to the class "class_name" and have weather="x_weather"
        n_weather = len(dataset[(dataset[TARGET] == class_name) & (dataset['Weather'] == x_weather)])

        # The number of rows that belong to the class "class_name" and have time="x_time"
        n_time = len(dataset[(dataset[TARGET] == class_name) & (dataset['Time'] == x_time)])

        # The number of rows that belong to the class "class_name" and have day="x_day"
        n_day = len(dataset[(dataset[TARGET] == class_name) & (dataset['Day'] == x_day)])

        # P(class|x) is proportional to:
        # P(weather|class) x P(time|class) x P(day|class) x P(class)
        p = (n_weather / len_c) * (n_time / len_c) * (n_day / len_c) * (len_c / len_dataset)
        res_dict[class_name] = p
    return res_dict
