Naive Bayes Algorithm Explained: Basics, Example & Python
Updated on December 07, 2025 9 minutes read
Naive Bayes is a classic machine learning algorithm used for classification tasks. It is based on Bayes' theorem and models how likely a class is, given some observed features.
In 2026, it is still widely used in text applications such as sentiment analysis, spam detection, and document categorization, because it is fast, simple, and works well with high-dimensional data.
This article walks through the intuition behind Naive Bayes, shows a small worked example, and finishes with a short Python implementation you can adapt to your own projects.
Naive Bayes
Naive Bayes belongs to the family of probabilistic classifiers. Instead of drawing hard geometric boundaries between classes, it uses probabilities derived from the training data.
The algorithm computes how likely each class is for a given example and then chooses the class with the highest probability. This makes the model very interpretable and easier to debug for beginners.
The method is called naive because it assumes that all input features are conditionally independent once you know the class. In real data, that is rarely strictly true, but the assumption often works well enough in practice.
Conditional probability
Before we can understand Naive Bayes, we need to be comfortable with conditional probability and Bayes' theorem.
The conditional probability of an event A given that event B happened is written as P(A | B). It tells us how likely A is, under the condition that we already know B occurred.
Formally, conditional probability is defined as:
P(A | B) = P(A ∩ B) / P(B)
where P(A ∩ B) is the probability that A and B happen together, and P(B) is the probability of B.
Jar example
Imagine two jars filled with colored balls. We first choose a jar at random, then pick a ball from that jar.
- Jar 1 has 3 blue balls, 2 red balls, and 4 green balls.
- Jar 2 has 1 blue ball, 4 red balls, and 3 green balls.
We can now ask several probability questions.
- What is the probability of picking a blue ball?
- What is the probability of a blue ball given that we picked Jar 1?
- What is the probability that we picked Jar 1 given that the ball is blue?
The first two questions can be answered directly from the jar contents and the rules of conditional probability. For the third question we use Bayes' theorem.
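As a quick check, the first two questions can be worked out with exact fractions. This is a minimal sketch that assumes "at random" means both jars are equally likely to be chosen; the names `jars`, `p_jar`, and `p_color_given_jar` are illustrative only.

```python
from fractions import Fraction

# Ball counts per jar, taken from the example above.
jars = {
    "Jar 1": {"blue": 3, "red": 2, "green": 4},
    "Jar 2": {"blue": 1, "red": 4, "green": 3},
}

# Assumption: each jar is chosen with equal probability.
p_jar = {name: Fraction(1, len(jars)) for name in jars}


def p_color_given_jar(color, jar):
    """P(color | jar): share of balls of that color inside the jar."""
    counts = jars[jar]
    return Fraction(counts[color], sum(counts.values()))


# Second question: P(blue | Jar 1)
p_blue_given_jar1 = p_color_given_jar("blue", "Jar 1")

# First question: P(blue), summing over both jars (law of total probability)
p_blue = sum(p_color_given_jar("blue", jar) * p_jar[jar] for jar in jars)

print(p_blue_given_jar1)  # 1/3
print(p_blue)             # 11/48
```

The overall probability of blue is a weighted average of the per-jar probabilities, with the jar probabilities as weights.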
Bayes' theorem
Bayes' theorem relates P(A | B) and P(B | A):
P(A | B) = P(B | A) * P(A) / P(B)
In the jar example, let A be the event that we chose Jar 1 and B be the event that we picked a blue ball. Then Bayes' theorem becomes:
P(Jar 1 | blue) = P(blue | Jar 1) * P(Jar 1) / P(blue)
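Plugging the jar counts into this formula gives a concrete number. The sketch below again assumes that each jar is picked with probability 1/2, which the example implies but does not state explicitly.

```python
from fractions import Fraction

# Assumption: both jars are equally likely to be chosen.
p_jar1 = Fraction(1, 2)
p_jar2 = Fraction(1, 2)

# Likelihoods read off the jar contents: 3 of 9 balls and 1 of 8 balls are blue.
p_blue_given_jar1 = Fraction(3, 9)
p_blue_given_jar2 = Fraction(1, 8)

# Denominator P(blue), summed over both jars.
p_blue = p_blue_given_jar1 * p_jar1 + p_blue_given_jar2 * p_jar2

# Bayes' theorem: P(Jar 1 | blue) = P(blue | Jar 1) * P(Jar 1) / P(blue)
p_jar1_given_blue = p_blue_given_jar1 * p_jar1 / p_blue

print(p_jar1_given_blue)  # 8/11
```

Seeing a blue ball raises the probability of Jar 1 from 1/2 to 8/11, because blue balls are more common in Jar 1 than in Jar 2.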
This is the key idea behind Naive Bayes. We want P(class | features), but that is hard to compute directly. Bayes' theorem lets us rewrite it using probabilities that are easier to estimate from data.
Naive Bayes classification
In classification we are given an input vector x = (x1, x2, ..., xn) and a set of possible classes C1, C2, ..., Ck.
Naive Bayes predicts the class that maximizes the conditional probability of the class given the input:
ŷ = argmax_i P(Ci | x)
Using Bayes' theorem we can rewrite this as:
P(Ci | x) = P(x | Ci) * P(Ci) / P(x)
Here:
- P(Ci) is the prior probability of class Ci in the training data.
- P(x | Ci) is the likelihood, which measures how likely we are to see example x if the true class is Ci.
- P(x) is the overall probability of seeing x.
The denominator P(x) is the same for all classes, so we can drop it when we compare classes:
ŷ = argmax_i P(x | Ci) * P(Ci)
The naive independence assumption
The hard part is P(x | Ci). For many features it is difficult to estimate this probability directly.
Naive Bayes simplifies the problem by assuming that the features x1, x2, ..., xn are conditionally independent given the class. Mathematically this means:
P(x | Ci) = Π_j P(xj | Ci)
Substituting this into the formula for the prediction gives:
ŷ = argmax_i P(Ci) * Π_j P(xj | Ci)
This assumption is rarely perfectly true, but it makes the model extremely fast and easy to train. That is why it remains popular in many practical workflows.
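One way to see why the assumption helps is to count how many probabilities must be estimated per class. The sketch below uses three hypothetical categorical features with 4, 4, and 2 possible values (the same sizes as the dinner example later in this article); the variable names are illustrative.

```python
from math import prod

# Hypothetical number of distinct values for each categorical feature.
n_values = {"Weather": 4, "Time": 4, "Day": 2}

# Without the independence assumption, P(x | Ci) needs one entry
# for every combination of feature values.
joint_entries = prod(n_values.values())

# With the assumption, each feature gets its own small table P(xj | Ci).
naive_entries = sum(n_values.values())

print(joint_entries)  # 32
print(naive_entries)  # 10
```

The joint table grows multiplicatively with every new feature, while the naive tables grow only additively, so far fewer training examples are needed to fill them.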
In real code, implementations often work in log space to avoid numerical underflow:
log P(Ci | x) ∝ log P(Ci) + Σ_j log P(xj | Ci)
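The underflow problem is easy to reproduce. The sketch below compares two hypothetical classes with 1000 features each; the priors and per-feature likelihoods are made-up values chosen only to trigger the effect.

```python
import math


def product_score(prior, likelihoods):
    """Multiply the prior by every per-feature likelihood."""
    score = prior
    for p in likelihoods:
        score *= p
    return score


def log_score(prior, likelihoods):
    """Same quantity in log space: sum of logs instead of a product."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)


# Hypothetical per-feature likelihoods for two classes.
likelihoods_a = [0.01] * 1000
likelihoods_b = [0.02] * 1000

# Both products underflow to exactly 0.0, so the classes cannot be compared.
print(product_score(0.5, likelihoods_a))  # 0.0
print(product_score(0.5, likelihoods_b))  # 0.0

# The log scores stay finite and still rank class B above class A.
print(log_score(0.5, likelihoods_a))
print(log_score(0.5, likelihoods_b))
```

Because the logarithm is monotonic, taking the argmax over log scores selects the same class as the argmax over the original products. Note that `math.log` fails on a probability of exactly zero, so log-space implementations are normally combined with smoothing.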
Worked example: dinner decisions
Consider the following small dataset. Each row describes the weather, time of day, and whether it is a weekday or weekend, along with the target column Dinner, which records whether a person cooks or orders dinner.
| Weather | Time | Day of the week | Dinner |
|---|---|---|---|
| Clear | Evening | Weekend | Cooks |
| Cloudy | Night | Weekday | Orders |
| Rainy | Night | Weekday | Orders |
| Rainy | Midday | Weekday | Orders |
| Cloudy | Midday | Weekend | Cooks |
| Clear | Night | Weekend | Cooks |
| Snowy | Evening | Weekend | Orders |
| Clear | Night | Weekday | Cooks |
| Clear | Midnight | Weekend | Orders |
We want to predict whether the person will cook or order for the input:
x = {Weather = Clear, Time = Evening, Day = Weekend}
To do this with Naive Bayes we compute:
- P(Dinner = Cooks | x)
- P(Dinner = Orders | x)
and take the class with the higher value.
Step 1: class probabilities
In the dataset above there are 9 rows in total.
- Dinner = Cooks appears 4 times, so P(Cooks) = 4/9.
- Dinner = Orders appears 5 times, so P(Orders) = 5/9.
Step 2: conditional probabilities for class Cooks
Now we compute the conditional probabilities of each feature given that the dinner decision is Cooks.
Among the 4 rows where Dinner = Cooks:
- 3 have Weather = Clear, so P(Weather = Clear | Cooks) = 3/4.
- 1 has Time = Evening, so P(Time = Evening | Cooks) = 1/4.
- 3 have Day = Weekend, so P(Day = Weekend | Cooks) = 3/4.
Multiplying these with the class probability gives an unnormalized score for the class Cooks:
score(Cooks) = (3/4) * (1/4) * (3/4) * (4/9) = 1/16
Step 3: conditional probabilities for class Orders
Among the 5 rows where Dinner = Orders:
- 1 has Weather = Clear, so P(Weather = Clear | Orders) = 1/5.
- 1 has Time = Evening, so P(Time = Evening | Orders) = 1/5.
- 2 have Day = Weekend, so P(Day = Weekend | Orders) = 2/5.
The unnormalized score for Orders is:
score(Orders) = (1/5) * (1/5) * (2/5) * (5/9) = 2/225
Because score(Cooks) > score(Orders), Naive Bayes predicts that the person will cook dinner for this combination of features.
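The two scores are only proportional to the posterior probabilities, because P(x) was dropped. If actual probabilities are wanted, the scores can be divided by their sum. A minimal sketch using the fractions from the steps above:

```python
from fractions import Fraction

# Unnormalized scores from the worked example:
# feature likelihoods multiplied by the class prior.
scores = {
    "Cooks": Fraction(3, 4) * Fraction(1, 4) * Fraction(3, 4) * Fraction(4, 9),
    "Orders": Fraction(1, 5) * Fraction(1, 5) * Fraction(2, 5) * Fraction(5, 9),
}

# Dividing by the total plays the role of the dropped denominator P(x).
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}

print(scores["Cooks"], scores["Orders"])          # 1/16 2/225
print(posteriors["Cooks"], posteriors["Orders"])  # 225/257 32/257
```

Under the naive independence assumption, the model therefore assigns roughly an 88% probability to Cooks for this input. Normalizing changes the scale of the scores but not which class comes out on top.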
Advantages and limitations
Naive Bayes has several practical advantages.
- It is a very fast classifier, both to train and to make predictions with.
- It is simple to implement and interpret, which makes it a good teaching model.
- It works well on high-dimensional data such as text, where each word or token becomes a feature.
- It can achieve solid baseline performance even with relatively small training datasets.
However, there are also important limitations.
- The independence assumption between features is often violated in real data, which can reduce accuracy.
- Naive Bayes is less suitable when strong interactions between features drive the outcome.
- For some tasks more flexible models such as tree-based methods or neural networks can significantly outperform it.
Zero frequency and Laplace smoothing
A common issue is the zero frequency problem. If a feature value never appears in the training data for a given class, the corresponding probability is zero. When we multiply probabilities together this sets the whole product to zero.
To avoid this, Naive Bayes implementations usually add a small constant to all counts, a technique known as Laplace or add-one smoothing. Conceptually, we behave as if every possible feature value has been observed at least once in each class.
Most standard machine learning libraries handle smoothing automatically, but it is useful to understand the idea when implementing Naive Bayes by hand.
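One common formulation of add-one smoothing adds a constant alpha to each count and alpha times the number of possible feature values to the denominator, so that the smoothed probabilities still sum to one. The helper below is a hypothetical sketch of that formula, applied to the Weather feature of the dinner dataset (4 possible values, 4 rows with Dinner = Cooks).

```python
from fractions import Fraction


def smoothed_probability(count, class_total, n_values, alpha=1):
    """Laplace-smoothed estimate: (count + alpha) / (class_total + alpha * n_values)."""
    return Fraction(count + alpha, class_total + alpha * n_values)


# Weather = Rainy never occurs among the 4 Cooks rows.
unsmoothed_rainy = Fraction(0, 4)
smoothed_rainy = smoothed_probability(count=0, class_total=4, n_values=4)

# Weather = Clear occurs in 3 of the 4 Cooks rows.
smoothed_clear = smoothed_probability(count=3, class_total=4, n_values=4)

print(unsmoothed_rainy)  # 0
print(smoothed_rainy)    # 1/8
print(smoothed_clear)    # 1/2
```

Smoothing moves a little probability mass from frequently seen values (Clear drops from 3/4 to 1/2) to unseen ones (Rainy rises from 0 to 1/8). The effect is large here only because the dataset is tiny; with more data the shift becomes small.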
Exercise: implement Naive Bayes in Python
Below is the same dataset in a small pandas DataFrame. Your task is to implement a simple Naive Bayes classifier for the dinner decision problem.
import pandas as pd

dataset = pd.DataFrame()
dataset['Weather'] = ['Clear', 'Cloudy', 'Rainy', 'Rainy', 'Cloudy',
                      'Clear', 'Snowy', 'Clear', 'Clear']
dataset['Time'] = ['Evening', 'Night', 'Night', 'Midday', 'Midday',
                   'Night', 'Evening', 'Night', 'Midnight']
dataset['Day'] = ['Weekend', 'Weekday', 'Weekday', 'Weekday', 'Weekend',
                  'Weekend', 'Weekend', 'Weekday', 'Weekend']
dataset['Dinner'] = ['Cooks', 'Orders', 'Orders', 'Orders', 'Cooks',
                     'Cooks', 'Orders', 'Cooks', 'Orders']

def naive_bayes(weather, time, day):
    """
    Return a dictionary:
    {
        'Cooks': probability_for_Cooks,
        'Orders': probability_for_Orders
    }
    computed using the Naive Bayes formula.
    """
    res_dict = {}
    return res_dict
Hints
- Compute the prior probability of each class by counting how many rows belong to that class.
- For each class, compute the conditional probability of the given weather, time, and day values.
- Multiply these probabilities together as in the worked example, then store the result in res_dict.
- Optionally, add Laplace smoothing so that no probability is exactly zero.
Solution
Here is one straightforward implementation that mirrors the manual calculations above.
def naive_bayes(x_weather, x_time, x_day):
    target_col = 'Dinner'
    classes = list(dataset[target_col].unique())
    n_rows = len(dataset)
    res_dict = {}
    for class_name in classes:
        # subset of rows that belong to this class
        subset = dataset[dataset[target_col] == class_name]
        n_c = len(subset)
        # class prior P(class)
        p_class = n_c / n_rows
        # conditional probabilities for each feature given the class
        p_weather = len(subset[subset['Weather'] == x_weather]) / n_c
        p_time = len(subset[subset['Time'] == x_time]) / n_c
        p_day = len(subset[subset['Day'] == x_day]) / n_c
        # total score for this class (proportional to P(class | x))
        p = p_class * p_weather * p_time * p_day
        res_dict[class_name] = p
    return res_dict
You can test the function with the example from the worked section:
result = naive_bayes('Clear', 'Evening', 'Weekend')
print(result)
predicted_class = max(result, key=result.get)
print(predicted_class)
You should see that predicted_class is Cooks, which matches the manual calculation from the worked example.
In a real project, you would typically rely on a library implementation to focus on feature engineering and evaluation. For example, Python users often turn to the scikit-learn naive_bayes module when building production systems.
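For categorical features like the ones in the dinner dataset, one possible choice from that module is `CategoricalNB`, combined with `OrdinalEncoder` to turn the string values into integer codes. The sketch below is one way to set this up, not the only one. It keeps the default `alpha=1.0`, so Laplace smoothing is applied and the probabilities will differ from the unsmoothed manual calculation.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Same nine rows as the dinner dataset: Weather, Time, Day.
X = [
    ["Clear", "Evening", "Weekend"],
    ["Cloudy", "Night", "Weekday"],
    ["Rainy", "Night", "Weekday"],
    ["Rainy", "Midday", "Weekday"],
    ["Cloudy", "Midday", "Weekend"],
    ["Clear", "Night", "Weekend"],
    ["Snowy", "Evening", "Weekend"],
    ["Clear", "Night", "Weekday"],
    ["Clear", "Midnight", "Weekend"],
]
y = ["Cooks", "Orders", "Orders", "Orders", "Cooks",
     "Cooks", "Orders", "Cooks", "Orders"]

# CategoricalNB expects non-negative integer codes, so encode the strings first.
encoder = OrdinalEncoder()
X_encoded = encoder.fit_transform(X).astype(int)

# alpha=1.0 is add-one (Laplace) smoothing.
model = CategoricalNB(alpha=1.0)
model.fit(X_encoded, y)

# Encode the query with the same encoder before predicting.
x_new = encoder.transform([["Clear", "Evening", "Weekend"]]).astype(int)
predicted = model.predict(x_new)[0]
print(predicted)  # Cooks
```

Calling `model.predict_proba(x_new)` returns the normalized class probabilities, with columns ordered as in `model.classes_`. Note that with its default settings `OrdinalEncoder` raises an error for a category it did not see during fitting, so unseen values need separate handling.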
Next steps with Code Labs Academy
Naive Bayes is often one of the first algorithms taught in an applied data science curriculum because it combines clear mathematics with practical use cases.
If you want to go deeper into machine learning in 2026, from probability foundations to deploying models, consider joining the Code Labs Academy Data Science and AI bootcamp. You will practice algorithms like Naive Bayes in hands-on projects and build a portfolio that prepares you for real roles.