What is linear regression used for?

Linear regression is used to predict a numeric value and to understand how one or more features relate to a target. It’s often a first, interpretable baseline before trying more complex models.

What’s the difference between linear regression and classification?

Linear regression predicts continuous numbers (like price or time). Classification predicts categories (like “spam” vs. “not spam”), even though both can be trained on similar kinds of data.

When should I avoid linear regression?

Be cautious when the relationship is clearly non-linear, when outliers dominate the trend, or when you need reliable predictions far outside the training range. In those cases, feature engineering or different model families may work better.

Linear Regression Explained (2026): Basics, Examples & Metrics

Updated on January 10, 2026. 4-minute read.

Linear regression is a simple way to model the relationship between variables. In 2026, it is still widely used as a baseline because it is quick to train and easy to interpret.

You can use it to predict a numeric outcome (like revenue, time, or price) and to estimate how strongly different inputs relate to that outcome.

What linear regression models

Linear regression connects an input (or several inputs) to an output by fitting a straight line (or, with multiple inputs, a plane or hyperplane). The goal is to capture the trend in your data and make reasonable predictions.

Two terms show up often, and each goes by several interchangeable names across most courses and tools:

Target / dependent variable (y): the value you want to predict.
Feature(s) / independent variable(s) (x): the value(s) used to predict it.

The core equation

For simple linear regression (one feature), the model is often written as:

y = m*x + b

m (slope): how much y changes when x increases by 1.
b (intercept): the predicted value of y when x = 0.

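In real datasets, points do not land perfectly on the line. The difference between the observed value and the model prediction (observed minus predicted) is called a residual. It is often loosely called an error, although strictly speaking "error" refers to the deviation from the true, unknown relationship.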
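To make the equation and the idea of a residual concrete, here is a minimal sketch in Python. The slope, intercept, and observations are made-up values chosen for illustration, and the `predict` helper is a hypothetical name rather than part of any library.

```python
# Hypothetical slope and intercept, chosen only for illustration.
m = 2.0  # y rises by 2 for each 1-unit increase in x
b = 1.0  # predicted y when x = 0

# Made-up observations as (x, y) pairs.
observations = [(0.0, 1.5), (1.0, 2.5), (2.0, 5.5), (3.0, 6.5)]


def predict(x, slope=m, intercept=b):
    """Return the line's prediction for a single x value."""
    return slope * x + intercept


# Residual = observed value minus predicted value.
residuals = [y - predict(x) for x, y in observations]

for (x, y), r in zip(observations, residuals):
    print(f"x={x:.1f}  observed={y:.1f}  predicted={predict(x):.1f}  residual={r:+.1f}")
```

With these numbers the residuals alternate between +0.5 and -0.5: the points sit slightly above or below the line rather than on it.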

Simple vs. multiple linear regression

Simple linear regression uses one feature. It is useful when you expect a single main driver and want a clear baseline.

Multiple linear regression uses several features at once. For example, you might predict weekly sales from marketing spend, seasonality, and product price.
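With several features, the prediction is the intercept plus each coefficient multiplied by its feature value. The sketch below assumes hypothetical coefficients and feature names (seasonality is simplified to a 0/1 holiday flag); none of the numbers come from real data.

```python
# Hypothetical coefficients for a weekly-sales model (illustrative values only).
intercept = 500.0
coefficients = {
    "marketing_spend": 3.0,    # extra units sold per unit of spend
    "is_holiday_week": 120.0,  # seasonality simplified to a 0/1 flag
    "product_price": -25.0,    # units lost per 1-unit price increase
}


def predict_sales(features):
    """Intercept plus the sum of coefficient * feature value."""
    return intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )


# One made-up week of inputs.
week = {"marketing_spend": 100.0, "is_holiday_week": 1.0, "product_price": 10.0}
predicted = predict_sales(week)
print(predicted)  # 500 + 300 + 120 - 250 = 670.0
```

Each coefficient is read while holding the other features fixed, which is what makes the multiple-feature version harder to interpret than the single-feature one.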

How the best-fit line is chosen

The most common approach is ordinary least squares (OLS). It chooses the parameters that minimize the sum of squared residuals across the data.

Squared error is practical: it is easy to compute and penalizes big mistakes more than small ones, which can be helpful in many business settings.
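For a single feature, the least-squares slope and intercept have a well-known closed form: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept makes the line pass through the point of means. The sketch below uses made-up data, and `fit_simple_ols` and `sum_squared_residuals` are hypothetical helper names.

```python
from statistics import fmean


def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit for one feature; returns (slope, intercept)."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity that ordinary least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_simple_ols(xs, ys)
best_ssr = sum_squared_residuals(xs, ys, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, SSR={best_ssr:.3f}")
```

Nudging either parameter away from the fitted values can only increase the sum of squared residuals, which is a quick way to sanity-check the fit.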

A practical workflow in 2026

1) Start with a clear question

Define what you are predicting and why it matters. Also decide what level of error is acceptable for your use case.

2) Explore and prepare your data

Check for missing values, obvious outliers, and inconsistent units. A quick scatter plot of x vs. y can reveal whether a straight-line model is a reasonable starting point.
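Two of these checks can be sketched in a few lines. The example below counts rows with missing values and flags possible outliers with the 1.5 × IQR rule, which is one common heuristic and an assumption here rather than something a particular dataset requires. The rows are made-up, and `iqr_outliers` is a hypothetical helper.

```python
from statistics import quantiles

# Made-up (x, y) rows; None marks a missing value.
rows = [
    (20.0, 110.0),
    (22.0, 125.0),
    (None, 130.0),
    (25.0, 140.0),
    (27.0, 150.0),
    (30.0, 900.0),  # suspiciously large y
    (31.0, 170.0),
]

# Keep only rows where both values are present.
complete = [(x, y) for x, y in rows if x is not None and y is not None]
n_missing = len(rows) - len(complete)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


flagged = iqr_outliers([y for _, y in complete])
print(f"rows with missing values: {n_missing}")
print(f"possible outliers in y: {flagged}")
```

A flagged value is a prompt to investigate, not an instruction to delete: it may be a data-entry mistake, a unit mix-up, or a genuine extreme observation.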

3) Fit the model

Train on historical data and test on data the model has not seen. In practice, compare linear regression to a simple baseline (like predicting the mean) to confirm you are improving.
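A minimal sketch of that comparison follows. It assumes made-up data ordered in time, holds out the last two rows as the test set, and measures both the fitted line and a predict-the-training-mean baseline with mean absolute error. `fit_line` and `mae` are hypothetical helpers; a random split is also common when the rows have no time order.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares slope and intercept for one feature."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def mae(actual, predicted):
    """Mean absolute error."""
    return fmean([abs(a - p) for a, p in zip(actual, predicted)])


# Made-up data: earlier rows are used for training, later rows for testing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8]
split = 6
train_x, test_x = xs[:split], xs[split:]
train_y, test_y = ys[:split], ys[split:]

# Fit on the training rows only.
slope, intercept = fit_line(train_x, train_y)
model_preds = [slope * x + intercept for x in test_x]

# Baseline: always predict the mean of the training targets.
baseline_preds = [fmean(train_y)] * len(test_y)

model_mae = mae(test_y, model_preds)
baseline_mae = mae(test_y, baseline_preds)
print(f"model MAE={model_mae:.2f}, baseline MAE={baseline_mae:.2f}")
```

If the model does not clearly beat the baseline on held-out data, the features may carry little signal, or a straight line may be the wrong shape for the problem.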

4) Evaluate with the right metrics

Common regression metrics include:

MAE (Mean Absolute Error): average absolute difference between prediction and truth.
RMSE (Root Mean Squared Error): the square root of the average squared error. It is in the same units as MAE but penalizes large errors more.
R² (R-squared): the share of variance in the target that the model explains (useful, but not the only score).
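The three metrics can be computed directly from their definitions. The sketch below uses made-up actual and predicted values, and the function names are hypothetical; libraries such as scikit-learn provide equivalent functions.

```python
from math import sqrt
from statistics import fmean


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return fmean([abs(a - p) for a, p in zip(actual, predicted)])


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    return sqrt(fmean([(a - p) ** 2 for a, p in zip(actual, predicted)]))


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up values for illustration.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 35.0]

print(f"MAE  = {mae(actual, predicted):.3f}")
print(f"RMSE = {rmse(actual, predicted):.3f}")
print(f"R^2  = {r_squared(actual, predicted):.3f}")
```

Here RMSE comes out higher than MAE because the single 5-unit miss is weighted more heavily once it is squared.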

5) Interpret and communicate results

Linear regression is popular because the coefficients are interpretable. A coefficient is easiest to explain in real units, such as: "an increase of 1 unit in x is associated with an increase of m units in y."

For decision-making, combine interpretation with error analysis and sanity checks rather than relying on a single score.

Assumptions worth checking

Linear regression can still be useful when assumptions are imperfect, but you will usually get more reliable results when these are approximately true:

The relationship is roughly linear in the range you care about.
Residuals do not show obvious patterns (a sign you are missing structure).
Residual spread is reasonably consistent across the range of predicted values.
Features are not so correlated that coefficients become unstable (multicollinearity).
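The multicollinearity check in particular is easy to sketch: compute the pairwise Pearson correlation between features and flag pairs above a cutoff. The feature values below are made-up, `pearson` is a hypothetical helper, and the 0.9 threshold is an assumption for illustration rather than a universal rule.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    a_mean, b_mean = fmean(a), fmean(b)
    sab = sum((x - a_mean) * (y - b_mean) for x, y in zip(a, b))
    saa = sum((x - a_mean) ** 2 for x in a)
    sbb = sum((y - b_mean) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return sab / sqrt(saa * sbb)


# Made-up feature columns for a housing example.
features = {
    "size_sqm": [50.0, 60.0, 70.0, 80.0, 90.0],
    "rooms": [2.0, 2.0, 3.0, 3.0, 4.0],
    "age_years": [30.0, 5.0, 12.0, 40.0, 8.0],
}

threshold = 0.9  # assumed cutoff; pick one that suits your problem

flagged = []
for name_a, name_b in combinations(features, 2):
    r = pearson(features[name_a], features[name_b])
    print(f"{name_a} vs {name_b}: r = {r:+.2f}")
    if abs(r) > threshold:
        flagged.append((name_a, name_b))

print("highly correlated pairs:", flagged)
```

Pairwise correlation only catches two-feature overlap. A feature can also be nearly a combination of several others, which this check would miss.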

Common pitfalls

Extrapolation: predictions become risky outside your training range.
Correlation is not causation: a strong coefficient does not prove cause and effect.
Outliers: a small number of extreme points can pull the line in the wrong direction.
Non-linear patterns: if the relationship curves, consider transformations or other models.
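The outlier pitfall is easy to demonstrate. In the sketch below, made-up data follows y = 2x exactly until one extreme point is appended, and the fitted slope is compared before and after. `fit_slope` is a hypothetical helper using the closed-form least-squares slope.

```python
from statistics import fmean


def fit_slope(xs, ys):
    """Least-squares slope for one feature."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Made-up data lying exactly on y = 2x.
clean_x = [1.0, 2.0, 3.0, 4.0, 5.0]
clean_y = [2.0, 4.0, 6.0, 8.0, 10.0]

# The same data with one extreme point appended.
outlier_x = clean_x + [6.0]
outlier_y = clean_y + [-20.0]

clean_slope = fit_slope(clean_x, clean_y)
slope_with_outlier = fit_slope(outlier_x, outlier_y)

print(f"slope without outlier: {clean_slope:.2f}")
print(f"slope with outlier:    {slope_with_outlier:.2f}")
```

A single point out of six is enough to flip the sign of the slope here, because squared error gives the largest residual the most weight.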

Real-world examples

Ice cream sales vs. temperature: warmer days often correlate with higher sales.

Study time vs. exam score: more study time can correlate with better scores, with many factors involved.

House size vs. price: size may explain part of price, but location and condition often matter too.

Next steps

Once you are comfortable with linear regression, explore regularized variants (like Ridge and Lasso) and feature engineering (like polynomial features).
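As a first taste of regularization, here is a minimal sketch of Ridge regression for a single feature. It assumes the intercept is left unpenalized, which is a common convention; under that assumption the slope is the cross-deviation sum divided by the squared x-deviation sum plus a penalty strength. The data, the `fit_ridge_1d` name, and the `penalty` parameter are all illustrative.

```python
from statistics import fmean


def fit_ridge_1d(xs, ys, penalty):
    """Single-feature Ridge fit with an unpenalized intercept.

    Minimizes sum((y - (m*x + b))**2) + penalty * m**2.
    With penalty = 0 this reduces to ordinary least squares.
    """
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ols_slope, _ = fit_ridge_1d(xs, ys, penalty=0.0)
ridge_slope, _ = fit_ridge_1d(xs, ys, penalty=10.0)

print(f"OLS slope:   {ols_slope:.3f}")
print(f"Ridge slope: {ridge_slope:.3f}")
```

The penalty shrinks the slope toward zero. With many correlated features, that shrinkage is what helps keep coefficients stable.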

If you want guided practice, explore Code Labs Academy's Data Science & AI Bootcamp or browse All bootcamps.