What is the Naive Bayes algorithm used for?

Naive Bayes is mainly used for classification problems. Typical examples include email spam detection, sentiment analysis, document categorisation and simple recommendation systems where inputs can be represented as feature counts or categories.

Why is the algorithm called naive?

It is called naive because it assumes that all features are conditionally independent given the class. In real-world data this is rarely strictly true, but the assumption simplifies the maths and still gives good results in many practical situations.

Is Naive Bayes still relevant in 2026?

Yes. Even with modern deep learning models, Naive Bayes remains a strong baseline in 2026, especially for small datasets and text data. It trains quickly, is easy to interpret, and is often used as a first model when exploring a new classification problem.

Naive Bayes Algorithm Explained: Basics, Example & Python

Updated on December 07, 2025 · 9-minute read

Naive Bayes is a classic machine learning algorithm used for classification tasks. It is based on Bayes' theorem and models how likely a class is, given some observed features.

In 2026, it is still widely used in text applications such as sentiment analysis, spam detection, and document categorization, because it is fast, simple, and works well with high-dimensional data.

This article walks through the intuition behind Naive Bayes, shows a small worked example, and finishes with a short Python implementation you can adapt to your own projects.

Naive Bayes

Naive Bayes belongs to the family of probabilistic classifiers. Instead of drawing hard geometric boundaries between classes, it uses probabilities derived from the training data.

The algorithm computes how likely each class is for a given example and then chooses the class with the highest probability. This makes the model very interpretable and easier to debug for beginners.

The method is called naive because it assumes that all input features are conditionally independent once you know the class. In real data, that is rarely strictly true, but the assumption often works well enough in practice.

Conditional probability

Before we can understand Naive Bayes, we need to be comfortable with conditional probability and Bayes' theorem. The conditional probability of an event A given that event B happened is written as P(A | B). It tells us how likely A is, under the condition that we already know B occurred.

Formally, conditional probability is defined as:

P(A | B) = P(A ∩ B) / P(B)

where P(A ∩ B) is the probability that A and B happen together, and P(B) is the probability of B.
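
As a small illustration of the definition, the sketch below uses a hypothetical example that is not part of the article's data: one roll of a fair six-sided die, with A meaning "the roll is even" and B meaning "the roll is greater than 3". The helper name prob is an assumption made for this example.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
outcomes = range(1, 7)
A = {o for o in outcomes if o % 2 == 0}  # event A: the roll is even
B = {o for o in outcomes if o > 3}       # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all six outcomes are equally likely."""
    return Fraction(len(event), 6)

# P(A | B) = P(A ∩ B) / P(B)
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 2/3
```

Of the three outcomes greater than 3, two are even, which matches the result.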

Jar example

Imagine two jars filled with colored balls. We first choose a jar at random, then pick a ball from that jar.

Jar 1 has 3 blue balls, 2 red balls, and 4 green balls.
Jar 2 has 1 blue ball, 4 red balls, and 3 green balls.

We can now ask several probability questions.

1. What is the probability of picking a blue ball?
2. What is the probability of a blue ball given that we picked Jar 1?
3. What is the probability that we picked Jar 1 given that the ball is blue?

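The first two questions can be answered with basic probability rules: the second follows directly from the contents of Jar 1, and the first combines both jars using the law of total probability. For the third question we use Bayes' theorem.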
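
The first two answers can be checked with exact fractions. This is a minimal sketch that assumes "at random" means each jar is chosen with probability 1/2. The names jars, p_jar and p_color_given_jar are made up for this example.

```python
from fractions import Fraction

# Ball counts per jar, taken from the jar example.
jars = {
    "Jar 1": {"blue": 3, "red": 2, "green": 4},
    "Jar 2": {"blue": 1, "red": 4, "green": 3},
}
p_jar = Fraction(1, 2)  # assumption: each jar is equally likely to be chosen

def p_color_given_jar(color, jar):
    """P(color | jar), computed from the ball counts in that jar."""
    counts = jars[jar]
    return Fraction(counts[color], sum(counts.values()))

# P(blue | Jar 1): read directly from the contents of Jar 1
p_blue_given_jar1 = p_color_given_jar("blue", "Jar 1")

# P(blue): sum over both jars (law of total probability)
p_blue = sum(p_jar * p_color_given_jar("blue", jar) for jar in jars)

print(p_blue_given_jar1)  # 1/3
print(p_blue)             # 11/48
```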

Bayes' theorem

Bayes' theorem relates P(A | B) and P(B | A):

P(A | B) = P(B | A) * P(A) / P(B)

In the jar example, let A be the event that we chose Jar 1 and B be the event that we picked a blue ball. Then Bayes' theorem becomes:

P(Jar 1 | blue) = P(blue | Jar 1) * P(Jar 1) / P(blue)

This is the key idea behind Naive Bayes. We want P(class | features), but that is hard to compute directly. Bayes' theorem lets us rewrite it using probabilities that are easier to estimate from data.
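
Returning to the jar example, the following sketch plugs the numbers into Bayes' theorem. It assumes both jars are equally likely to be chosen. The variable names are illustrative only.

```python
from fractions import Fraction

p_jar1 = Fraction(1, 2)             # assumption: both jars equally likely
p_blue_given_jar1 = Fraction(3, 9)  # 3 blue balls out of 9 in Jar 1
p_blue_given_jar2 = Fraction(1, 8)  # 1 blue ball out of 8 in Jar 2

# P(blue) via the law of total probability
p_blue = p_blue_given_jar1 * p_jar1 + p_blue_given_jar2 * (1 - p_jar1)

# Bayes' theorem: P(Jar 1 | blue) = P(blue | Jar 1) * P(Jar 1) / P(blue)
p_jar1_given_blue = p_blue_given_jar1 * p_jar1 / p_blue
print(p_jar1_given_blue)  # 8/11
```

Seeing a blue ball raises the probability of Jar 1 from 1/2 to 8/11, because blue balls are more common in Jar 1 than in Jar 2.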

Naive Bayes classification

In classification we are given an input vector x = (x1, x2, ..., xn) and a set of possible classes C1, C2, ..., Ck.

Naive Bayes predicts the class that maximizes the conditional probability of the class given the input:

ŷ = argmax_i P(Ci | x)

Using Bayes' theorem we can rewrite this as:

P(Ci | x) = P(x | Ci) * P(Ci) / P(x)

Here:

P(Ci) is the prior probability of class Ci in the training data.
P(x | Ci) is the likelihood, which measures how likely we are to see example x if the true class is Ci.
P(x) is the overall probability of seeing x.

The denominator P(x) is the same for all classes, so we can drop it when we compare classes:

ŷ = argmax_i P(x | Ci) * P(Ci)
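
To see that dropping P(x) does not change the prediction, here is a short sketch with made-up numbers. The values in joint are hypothetical and only stand in for P(x | Ci) * P(Ci).

```python
# Hypothetical values of P(x | Ci) * P(Ci) for three classes.
joint = {"C1": 0.02, "C2": 0.05, "C3": 0.01}

# P(x) is the sum over all classes and is identical for every class.
p_x = sum(joint.values())
posterior = {c: v / p_x for c, v in joint.items()}

best_unnormalized = max(joint, key=joint.get)
best_normalized = max(posterior, key=posterior.get)
print(best_unnormalized, best_normalized)  # C2 C2
```

Dividing every score by the same positive number rescales them but keeps their order, so the winning class stays the same.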

The naive independence assumption

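The hard part is P(x | Ci). With many features, the number of possible feature combinations grows exponentially, so most combinations never appear in the training data and this probability cannot be estimated reliably by direct counting.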

Naive Bayes simplifies the problem by assuming that the features x1, x2, ..., xn are conditionally independent given the class. Mathematically this means:

P(x | Ci) = Π_j P(xj | Ci)

Substituting this into the formula for the prediction gives:

ŷ = argmax_i P(Ci) * Π_j P(xj | Ci)

This assumption is rarely perfectly true, but it makes the model extremely fast and easy to train. That is why it remains popular in many practical workflows.
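
One way to see how much the assumption simplifies the model is to count parameters. The sketch below assumes a simplified setting with binary features only. The function names are hypothetical.

```python
def params_full_joint(n_features):
    """Free parameters per class for an unrestricted joint over binary features."""
    return 2 ** n_features - 1

def params_naive(n_features):
    """Free parameters per class when binary features are treated as independent."""
    return n_features

for n in (3, 10, 20):
    print(n, params_full_joint(n), params_naive(n))
# 3 7 3
# 10 1023 10
# 20 1048575 20
```

With 20 binary features, the unrestricted model needs over a million probabilities per class, while the naive model needs only 20.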

In real code, implementations often work in log space to avoid numerical underflow:

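log P(Ci | x) = log P(Ci) + Σ_j log P(xj | Ci) - log P(x)

The term log P(x) is the same for every class, so it can be dropped when comparing classes.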
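
The underflow problem is easy to reproduce. This sketch uses a hypothetical case of 1000 features that each have likelihood 0.01. The numbers are chosen only to trigger the effect.

```python
import math

# Hypothetical setting: 1000 features, each with P(xj | Ci) = 0.01
probs = [0.01] * 1000

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0, because the true value 1e-2000 is below the float range

# Summing logs keeps the score representable.
log_score = sum(math.log(p) for p in probs)
print(log_score)  # about -4605.17
```

Because the logarithm is monotonic, comparing log scores picks the same class as comparing the original products.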

Worked example: dinner decisions

Consider the following small dataset. Each row describes the weather, time of day, and whether it is a weekday or weekend, along with the target column Dinner, which records whether a person cooks or orders dinner.

Weather	Time	Day of the week	Dinner
Clear	Evening	Weekend	Cooks
Cloudy	Night	Weekday	Orders
Rainy	Night	Weekday	Orders
Rainy	Midday	Weekday	Orders
Cloudy	Midday	Weekend	Cooks
Clear	Night	Weekend	Cooks
Snowy	Evening	Weekend	Orders
Clear	Night	Weekday	Cooks
Clear	Midnight	Weekend	Orders

We want to predict whether the person will cook or order for the input:

x = {Weather = Clear, Time = Evening, Day = Weekend}

To do this with Naive Bayes we compute:

P(Dinner = Cooks | x)
P(Dinner = Orders | x)

and take the class with the higher value.

Step 1: class probabilities

In the dataset above there are 9 rows in total.

Dinner = Cooks appears 4 times, so P(Cooks) = 4/9.
Dinner = Orders appears 5 times, so P(Orders) = 5/9.
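
The same class probabilities can be obtained by counting in code. This is a minimal sketch in plain Python. The list dinner simply copies the Dinner column from the table in row order.

```python
from collections import Counter
from fractions import Fraction

# The Dinner column from the table, in row order.
dinner = ["Cooks", "Orders", "Orders", "Orders", "Cooks",
          "Cooks", "Orders", "Cooks", "Orders"]

counts = Counter(dinner)
priors = {label: Fraction(n, len(dinner)) for label, n in counts.items()}
print(priors["Cooks"], priors["Orders"])  # 4/9 5/9
```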

Step 2: conditional probabilities for class Cooks

Now we compute the conditional probabilities of each feature given that the dinner decision is Cooks.

Among the 4 rows where Dinner = Cooks:

3 have Weather = Clear, so P(Weather = Clear | Cooks) = 3/4.
1 has Time = Evening, so P(Time = Evening | Cooks) = 1/4.
3 have Day = Weekend, so P(Day = Weekend | Cooks) = 3/4.

Multiplying these with the class probability gives an unnormalized score for the class Cooks:

score(Cooks) = (3/4) * (1/4) * (3/4) * (4/9) = 1/16

Step 3: conditional probabilities for class Orders

Among the 5 rows where Dinner = Orders:

1 has Weather = Clear, so P(Weather = Clear | Orders) = 1/5.
1 has Time = Evening, so P(Time = Evening | Orders) = 1/5.
2 have Day = Weekend, so P(Day = Weekend | Orders) = 2/5.

The unnormalized score for Orders is:

score(Orders) = (1/5) * (1/5) * (2/5) * (5/9) = 2/225

Because score(Cooks) > score(Orders), Naive Bayes predicts that the person will cook dinner for this combination of features.
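
The arithmetic can be double-checked with exact fractions. The sketch below also normalizes the two scores so that they sum to 1, giving the posterior probabilities under the naive independence assumption. The variable names are illustrative.

```python
from fractions import Fraction as F

# Scores from Step 2 and Step 3: feature likelihoods times the class prior
score_cooks = F(3, 4) * F(1, 4) * F(3, 4) * F(4, 9)
score_orders = F(1, 5) * F(1, 5) * F(2, 5) * F(5, 9)
print(score_cooks, score_orders)  # 1/16 2/225

# Dividing by the sum of the scores turns them into probabilities.
total = score_cooks + score_orders
p_cooks = score_cooks / total
p_orders = score_orders / total
print(p_cooks, p_orders)  # 225/257 32/257
print(float(p_cooks))     # about 0.875
```

So the model estimates roughly an 88% chance that the person cooks on a clear weekend evening.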

Advantages and limitations

Naive Bayes has several practical advantages.

It is a very fast classifier, both to train and to make predictions with.
It is simple to implement and interpret, which makes it a good teaching model.
It works well on high-dimensional data such as text, where each word or token becomes a feature.
It can achieve solid baseline performance even with relatively small training datasets.

However, there are also important limitations.

The independence assumption between features is often violated in real data, which can reduce accuracy.
Naive Bayes is less suitable when strong interactions between features drive the outcome.
For some tasks more flexible models such as tree-based methods or neural networks can significantly outperform it.

Zero frequency and Laplace smoothing

A common issue is the zero frequency problem. If a feature value never appears in the training data for a given class, the corresponding probability is zero. When we multiply probabilities together this sets the whole product to zero.

To avoid this, Naive Bayes implementations usually add a small constant to every count before computing the probabilities. When the constant is 1, the technique is known as Laplace or add-one smoothing. Conceptually, we behave as if every possible feature value had been observed one extra time in each class.

Most standard machine learning libraries handle smoothing automatically, but it is useful to understand the idea when implementing Naive Bayes by hand.
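
As a rough sketch of the idea, the smoothed estimate adds a constant alpha to the count and alpha times the number of distinct feature values to the denominator. The function name smoothed and its parameters are assumptions made for this example, applied to the Weather feature of the dinner dataset.

```python
from fractions import Fraction

def smoothed(count, n_class, n_values, alpha=1):
    """Add-alpha estimate of P(feature = value | class)."""
    return Fraction(count + alpha, n_class + alpha * n_values)

# Weather = Snowy never occurs among the 4 Cooks rows.
# Weather takes 4 distinct values: Clear, Cloudy, Rainy, Snowy.
print(Fraction(0, 4))     # 0, the unsmoothed estimate
print(smoothed(0, 4, 4))  # 1/8

# Weather = Clear occurs in 3 of the 4 Cooks rows.
print(smoothed(3, 4, 4))  # 1/2, compared with 3/4 unsmoothed
```

Smoothing pulls every estimate slightly toward a uniform distribution, which removes zeros at the cost of a small bias on frequent values.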

Exercise: implement Naive Bayes in Python

Below is the same dataset in a small pandas DataFrame. Your task is to implement a simple Naive Bayes classifier for the dinner decision problem.

import pandas as pd

dataset = pd.DataFrame()
dataset['Weather'] = ['Clear', 'Cloudy', 'Rainy', 'Rainy', 'Cloudy',
                      'Clear', 'Snowy', 'Clear', 'Clear']
dataset['Time'] = ['Evening', 'Night', 'Night', 'Midday', 'Midday',
                   'Night', 'Evening', 'Night', 'Midnight']
dataset['Day'] = ['Weekend', 'Weekday', 'Weekday', 'Weekday', 'Weekend',
                  'Weekend', 'Weekend', 'Weekday', 'Weekend']
dataset['Dinner'] = ['Cooks', 'Orders', 'Orders', 'Orders', 'Cooks',
                     'Cooks', 'Orders', 'Cooks', 'Orders']

def naive_bayes(weather, time, day):
    """
    Return a dictionary:
    {
        'Cooks': probability_for_Cooks,
        'Orders': probability_for_Orders
    }
    computed using the Naive Bayes formula.
    """
    res_dict = {}
    return res_dict

Hints

Compute the prior probability of each class by counting how many rows belong to that class.
For each class, compute the conditional probability of the given weather, time, and day values.
Multiply these probabilities together as in the worked example, then store the result in res_dict.
Optionally, add Laplace smoothing so that no probability is exactly zero.

Solution

Here is one straightforward implementation that mirrors the manual calculations above.

def naive_bayes(x_weather, x_time, x_day):
    target_col = 'Dinner'
    classes = list(dataset[target_col].unique())
    n_rows = len(dataset)
    res_dict = {}

    for class_name in classes:
        # subset of rows that belong to this class
        subset = dataset[dataset[target_col] == class_name]
        n_c = len(subset)

        # class prior P(class)
        p_class = n_c / n_rows

        # conditional probabilities for each feature given the class
        p_weather = len(subset[subset['Weather'] == x_weather]) / n_c
        p_time = len(subset[subset['Time'] == x_time]) / n_c
        p_day = len(subset[subset['Day'] == x_day]) / n_c

        # total score for this class (proportional to P(class | x))
        p = p_class * p_weather * p_time * p_day

        res_dict[class_name] = p

    return res_dict

You can test the function with the example from the worked section:

result = naive_bayes('Clear', 'Evening', 'Weekend')
print(result)
predicted_class = max(result, key=result.get)
print(predicted_class)

You should see that predicted_class is Cooks, which matches the manual calculation from the worked example.
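
For inputs that contain a value never seen with some class, the unsmoothed function returns a score of exactly zero. The following variant is one possible way to add Laplace smoothing. It is a sketch rather than the reference solution: it uses plain Python tuples instead of the pandas DataFrame so that it runs without dependencies, and the names rows, naive_bayes_smoothed and alpha are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Same data as the table: (Weather, Time, Day, Dinner)
rows = [
    ("Clear", "Evening", "Weekend", "Cooks"),
    ("Cloudy", "Night", "Weekday", "Orders"),
    ("Rainy", "Night", "Weekday", "Orders"),
    ("Rainy", "Midday", "Weekday", "Orders"),
    ("Cloudy", "Midday", "Weekend", "Cooks"),
    ("Clear", "Night", "Weekend", "Cooks"),
    ("Snowy", "Evening", "Weekend", "Orders"),
    ("Clear", "Night", "Weekday", "Cooks"),
    ("Clear", "Midnight", "Weekend", "Orders"),
]

def naive_bayes_smoothed(x, alpha=1):
    """Unnormalized class scores with add-alpha smoothing; x is (weather, time, day)."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, n_c in class_counts.items():
        score = Fraction(n_c, len(rows))  # class prior
        for j, value in enumerate(x):
            n_values = len({r[j] for r in rows})  # distinct values of feature j
            count = sum(1 for r in rows if r[-1] == label and r[j] == value)
            score *= Fraction(count + alpha, n_c + alpha * n_values)
        scores[label] = score
    return scores

# Snowy weather never occurs together with Cooks in the data.
scores = naive_bayes_smoothed(("Snowy", "Night", "Weekday"))
print(scores["Cooks"], scores["Orders"])  # 1/144 40/1701
print(max(scores, key=scores.get))        # Orders
```

Without smoothing, the Cooks score for this input would be zero regardless of the other features. With smoothing it stays small but positive, so the remaining features still contribute to the comparison.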

In a real project, you would typically rely on a library implementation so that you can focus on feature engineering and evaluation. For example, Python users often turn to the sklearn.naive_bayes module in scikit-learn when building production systems.

Next steps with Code Labs Academy

Naive Bayes is often one of the first algorithms taught in an applied data science curriculum because it combines clear mathematics with practical use cases.

If you want to go deeper into machine learning in 2026, from probability foundations to deploying models, consider joining the Code Labs Academy Data Science and AI bootcamp. You will practice algorithms like Naive Bayes in hands-on projects and build a portfolio that prepares you for real roles.
