Overfitting and Underfitting in Machine Learning
Updated on January 30, 2026. 6-minute read.
Machine learning models do not fail only because of “bad algorithms”. Many real-world issues come down to generalization: how well a model performs on new, unseen data compared to the data it was trained on.
Two classic failure modes explain most generalization problems. Overfitting means the model learns the training set too well. Underfitting means it does not learn enough to be useful.
What overfitting and underfitting mean
Overfitting and underfitting describe a mismatch between model complexity, data quality, and the signal you want to learn. Overfitting usually shows up as a gap between training and validation or test results, while underfitting shows up as weak results on both.
Fixing the issue starts with a clear diagnosis. If you change the model before you confirm the failure mode, you can waste time tuning settings that do not address the root cause.
Overfitting
Overfitting happens when a model captures patterns that are specific to the training set, including noise and random fluctuations. It can look excellent during training, then drop sharply on new data.
Overfitting is often a high-variance problem. Small changes in the training split can cause big changes in metrics and even in the model’s predictions.
Common signs of overfitting:
- Very strong performance on training data, but weaker results on validation or test data
- Metrics that vary widely across folds or across different random seeds
- Predictions that are unstable when inputs change slightly
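To make the train versus validation gap concrete, here is a minimal sketch that fits a flexible polynomial to a handful of noisy points. The sine-shaped target, the noise level, the sample sizes, and the polynomial degree are all illustrative assumptions, not values from a real project.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n, noise=0.3):
    """Noisy samples from an assumed ground truth, sin(2*pi*x)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a callable model on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = make_data(12)    # deliberately tiny training set
x_val, y_val = make_data(500)       # larger held-out set

# Degree 9 gives 10 free coefficients for only 12 points,
# so the curve can bend to follow the noise.
flexible = Polynomial.fit(x_train, y_train, deg=9)

train_mse = mse(flexible, x_train, y_train)
val_mse = mse(flexible, x_val, y_val)
print(f"train MSE: {train_mse:.4f}  validation MSE: {val_mse:.4f}")
```

The training error comes out far lower than the validation error, which is the first sign in the list above.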
Underfitting
Underfitting happens when the model is too simple to capture relationships in the data. It performs poorly on both training and unseen data because it misses the signal.
Underfitting is typically a high-bias problem. The model makes overly simple assumptions and cannot represent the true pattern, even if you train longer.
Common signs of underfitting:
- Low performance on both training and validation or test data
- Learning curves that plateau early and do not meaningfully improve
- Residual errors with clear structure (a hint that important patterns were missed)
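The residual check in the last bullet can be sketched in a few lines. This example assumes a quadratic ground truth and fits a straight line to it; the data and the choice of `x ** 2` as the "missed pattern" are hypothetical and only serve to illustrate the idea.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed ground truth: a quadratic relationship plus mild noise.
x = rng.uniform(-3.0, 3.0, size=200)
y = x ** 2 + rng.normal(0.0, 0.5, size=200)

# Fit a straight line, which cannot represent the curvature.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# If the model had captured the signal, residuals would look like noise.
# A strong correlation with x**2 shows leftover structure instead.
structure = float(np.corrcoef(residuals, x ** 2)[0, 1])
train_mse = float(np.mean(residuals ** 2))
print(f"train MSE: {train_mse:.2f}  corr(residuals, x^2): {structure:.2f}")
```

In practice you rarely know the missing pattern in advance, so plotting residuals against each feature is a common first step.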
Why this happens: the bias-variance trade-off
Most model choices sit on a spectrum between bias and variance. If you push complexity too high, you often reduce bias but increase variance, which raises the risk of overfitting.
If you simplify too much, variance drops but bias rises. That increases the risk of underfitting, where the model cannot learn the task.
Your goal is balance. Pick the simplest approach that captures the signal reliably and remains stable across new data.
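One way to see the trade-off is to refit the same model family many times on fresh noisy samples and measure how far the average prediction is from the truth (squared bias) and how much individual fits disagree (variance). The sketch below assumes a known sine target, which you never have in practice, so treat it as a simulation rather than a recipe.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

def true_fn(x):
    return np.sin(2 * np.pi * x)      # assumed ground truth

x_train = np.linspace(0.0, 1.0, 30)   # fixed inputs, fresh noise each trial
x_test = np.linspace(0.0, 1.0, 100)
noise, n_trials = 0.3, 200

def bias_variance(degree):
    """Estimate squared bias and variance of polynomial fits of one degree."""
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        y_train = true_fn(x_train) + rng.normal(0.0, noise, size=x_train.size)
        preds[t] = Polynomial.fit(x_train, y_train, deg=degree)(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b, v) in results.items():
    print(f"degree {d}: bias^2 = {b:.4f}  variance = {v:.4f}")
```

The straight line (degree 1) shows the largest squared bias, while the degree-9 fit shows the largest variance.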
How to diagnose the problem
Before changing algorithms, confirm where the model is failing. A quick diagnosis can save hours of tuning that cannot fix a data or split issue. Start by separating training behavior from generalization behavior.
1) Compare training vs validation performance
If training scores are high but validation or test scores are much lower, overfitting is likely. If both are low, underfitting or dataset quality issues are more likely.
If both are unusually high, treat it as a potential data leakage warning. Leakage can make results look great while hiding real performance issues that appear later in production.
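The three cases above can be written down as a small triage helper. The function name `diagnose` and every threshold in it are hypothetical defaults for scores between 0 and 1 where higher is better; sensible cutoffs depend on your task and metric.

```python
def diagnose(train_score, val_score, gap_tol=0.10, low=0.70, suspicious=0.99):
    """Map a pair of scores (higher is better, range 0 to 1) to a likely failure mode.

    All thresholds are illustrative defaults, not universal rules.
    """
    if train_score >= suspicious and val_score >= suspicious:
        return "check for leakage"
    if train_score - val_score > gap_tol:
        return "likely overfitting"
    if train_score < low and val_score < low:
        return "likely underfitting or data quality issue"
    return "no obvious problem"

print(diagnose(0.98, 0.71))   # likely overfitting
print(diagnose(0.62, 0.60))   # likely underfitting or data quality issue
print(diagnose(1.00, 0.995))  # check for leakage
```

A helper like this is only a first filter. It tells you where to look next, not what the root cause is.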
2) Plot learning curves
Learning curves show how performance changes with more data or more training. They help you decide whether to collect more data, increase model capacity, or add regularization.
Typical patterns:
- Overfitting: training improves, validation stalls or worsens
- Underfitting: both curves are poor and close together
- Data limitation: validation keeps improving as you add data but still lags training
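A learning curve over training-set size can be computed by hand, as in this rough sketch. It assumes synthetic sine data, a degree-5 polynomial model, and mean squared error as the metric, so lower values are better. Libraries such as scikit-learn ship a ready-made helper for this, but the manual loop shows what is being measured.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)

def make_data(n, noise=0.3):
    """Noisy samples from an assumed ground truth, sin(2*pi*x)."""
    x = rng.uniform(0.0, 1.0, size=n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, noise, size=n)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

sizes = [8, 16, 32, 64, 128]
n_repeats = 20                        # average over repeats to smooth the curves
x_val, y_val = make_data(300)         # one fixed validation set

train_curve, val_curve = [], []
for n in sizes:
    tr, va = [], []
    for _ in range(n_repeats):
        x_tr, y_tr = make_data(n)
        model = Polynomial.fit(x_tr, y_tr, deg=5)
        tr.append(mse(model, x_tr, y_tr))
        va.append(mse(model, x_val, y_val))
    train_curve.append(float(np.mean(tr)))
    val_curve.append(float(np.mean(va)))

for n, tr, va in zip(sizes, train_curve, val_curve):
    print(f"n={n:4d}  train MSE={tr:.3f}  validation MSE={va:.3f}")
```

With very few points the training error is low and the validation error is high. As the training set grows, the validation error falls and the two curves move toward each other.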
3) Check for “silent” dataset problems
Some issues look like overfitting or underfitting, but are caused by the dataset. They often come from how data was split, labeled, or processed. Catching them early is one of the highest-impact steps you can take.
Watch for:
- Leakage from the target into features (directly or through proxies)
- Splits that are not representative (especially in time series or grouped data)
- Label noise, inconsistent annotation, or heavy class imbalance
- Different preprocessing between training and inference pipelines
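For grouped data, one common safeguard is to split by group rather than by row, so that related rows never sit on both sides of the split. The following sketch uses only the standard library; `group_split` and the customer IDs are hypothetical names invented for illustration.

```python
import random

def group_split(groups, val_fraction=0.2, seed=0):
    """Split row indices so that every group lands entirely on one side."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_val = max(1, int(round(len(unique) * val_fraction)))
    val_groups = set(unique[:n_val])
    train_idx = [i for i, g in enumerate(groups) if g not in val_groups]
    val_idx = [i for i, g in enumerate(groups) if g in val_groups]
    return train_idx, val_idx

# Hypothetical example: several rows per customer ID.
customer_ids = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"]
train_idx, val_idx = group_split(customer_ids)

train_groups = {customer_ids[i] for i in train_idx}
val_groups = {customer_ids[i] for i in val_idx}
print("shared groups:", train_groups & val_groups)  # shared groups: set()
```

For time series the analogous rule is to split by time, training on earlier rows and validating on later ones.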
How to reduce overfitting
Overfitting rarely improves with a single change. It usually gets better when you combine stronger evaluation, simpler modeling, and appropriate regularization.
The goal is to reduce variance without destroying the useful signal.
Improve evaluation and data hygiene
- Use a proper split and keep a true test set untouched until the end
- Use cross-validation, especially for smaller datasets or high-variance models
- Prevent leakage by auditing features, timestamps, IDs, and preprocessing
- Validate on data that resembles what you will see after launch
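Cross-validation and leakage prevention fit together: any preprocessing statistic should be learned inside each training fold and then reused on that fold's validation rows. Below is a minimal NumPy sketch under assumed synthetic data, with feature scaling as the preprocessing step and ordinary least squares as the model.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical regression data: 3 features on very different scales.
X = rng.normal(size=(120, 3)) * np.array([1.0, 50.0, 0.01])
y = X @ np.array([2.0, 0.1, 300.0]) + rng.normal(0.0, 1.0, size=120)

def kfold_mse(X, y, k=5, seed=0):
    """Return one validation MSE per fold, scaling with training-fold statistics only."""
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit the scaler on the training fold only, then reuse it on validation.
        mean, std = X[train].mean(axis=0), X[train].std(axis=0)
        X_tr = (X[train] - mean) / std
        X_va = (X[val] - mean) / std
        # Ordinary least squares with an intercept column.
        A_tr = np.column_stack([np.ones(len(train)), X_tr])
        A_va = np.column_stack([np.ones(len(val)), X_va])
        w, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        scores.append(float(np.mean((A_va @ w - y[val]) ** 2)))
    return scores

scores = kfold_mse(X, y)
print(f"MSE per fold: {np.round(scores, 2)}")
print(f"mean = {np.mean(scores):.2f}, std = {np.std(scores):.2f}")
```

Reporting the spread across folds alongside the mean gives a rough sense of how stable the estimate is.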
Reduce effective model complexity
- Feature selection: keep only features that add a measurable signal
- Dimensionality reduction: methods like PCA can help with correlated features
- Simplify the model when data is limited
- Constrain tree models (depth, minimum samples per leaf, and related settings)
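As one illustration of the dimensionality reduction bullet, PCA can be sketched with a singular value decomposition. The data here is an assumption: five features that are noisy copies of a single hidden factor. The helper names `pca_fit` and `pca_transform` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data: 5 features that are noisy copies of 1 latent factor.
latent = rng.normal(size=(300, 1))
X = latent @ np.ones((1, 5)) + rng.normal(0.0, 0.1, size=(300, 5))

def pca_fit(X_train, n_components):
    """Learn the centering vector and top principal directions from training data."""
    mean = X_train.mean(axis=0)
    _, s, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, vt[:n_components], explained[:n_components]

def pca_transform(X_new, mean, components):
    """Project new rows with the statistics learned on the training data."""
    return (X_new - mean) @ components.T

mean, components, explained = pca_fit(X[:200], n_components=1)
X_val_reduced = pca_transform(X[200:], mean, components)
print(f"explained variance ratio: {explained[0]:.3f}")
print("reduced validation shape:", X_val_reduced.shape)
```

Because the five columns carry nearly the same information, a single component keeps almost all of the variance while the model sees one input instead of five.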
Add regularization and training controls
- L1 or L2 regularization: penalize overly complex solutions
- Dropout (neural networks): encourages more robust representations
- Early stopping: stop when validation performance stops improving
- Data augmentation: for vision, audio, or text, create input variations that do not change the correct label
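To show what an L2 penalty does, this sketch solves ridge regression in closed form and prints the size of the weight vector for several penalty strengths. The dataset (30 rows, 20 features, 3 of them informative) and the `lam` values are arbitrary assumptions, and the intercept is omitted because the synthetic data is centered at zero.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical setup: 30 rows, 20 features, only the first 3 carry signal.
X = rng.normal(size=(30, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.normal(0.0, 1.0, size=30)

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

for lam in (0.0, 1.0, 10.0, 100.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam:6.1f}  ||w|| = {np.linalg.norm(w):.3f}")

w_unreg = ridge_fit(X, y, 0.0)    # no penalty
w_reg = ridge_fit(X, y, 10.0)     # moderate penalty
```

Larger penalties shrink the weights toward zero. Whether that helps validation performance is an empirical question, so the strength is normally chosen on validation data.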
When “more data” is the best fix
If your dataset is small or narrow, complexity becomes risky. Adding diverse, representative data often improves generalization more than hyperparameter tuning alone.
Even modest increases in coverage can reduce variance. Focus on variety that matches real-world conditions, not just volume.
How to reduce underfitting
Underfitting is a sign that the model cannot express what the task requires. Fixes usually involve increasing capacity, relaxing constraints, or improving features.
Aim to add a usable signal without introducing leakage.
Increase capacity or flexibility
- Use a more expressive model (often non-linear instead of linear)
- Increase model size carefully (more trees, deeper networks, additional interactions)
- Train longer if optimization has not converged
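The effect of adding an interaction term can be sketched as follows. The example assumes a target that depends on the product of two features, which a purely linear model cannot represent. The helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed ground truth: the target depends on the product of two features.
X = rng.normal(size=(400, 2))
y = X[:, 0] * X[:, 1] + rng.normal(0.0, 0.1, size=400)
X_train, X_val, y_train, y_val = X[:300], X[300:], y[:300], y[300:]

def linear_features(X):
    """Intercept plus the raw features."""
    return np.column_stack([np.ones(len(X)), X])

def with_interaction(X):
    """Intercept, raw features, and their product."""
    return np.column_stack([np.ones(len(X)), X, X[:, 0] * X[:, 1]])

def fit_and_score(featurize):
    """Fit least squares on training rows, return (train MSE, validation MSE)."""
    w, *_ = np.linalg.lstsq(featurize(X_train), y_train, rcond=None)
    train_mse = float(np.mean((featurize(X_train) @ w - y_train) ** 2))
    val_mse = float(np.mean((featurize(X_val) @ w - y_val) ** 2))
    return train_mse, val_mse

plain = fit_and_score(linear_features)
richer = fit_and_score(with_interaction)
print(f"linear only:      train={plain[0]:.3f}  val={plain[1]:.3f}")
print(f"with interaction: train={richer[0]:.3f}  val={richer[1]:.3f}")
```

Both the training and the validation error drop once the model can express the interaction, which is the pattern to look for when fixing underfitting.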
Reduce constraints that are too strong
- Lower regularization strength if it is suppressing a useful signal
- Relax restrictive hyperparameters (for example, a max depth set too low)
- Revisit aggressive feature reduction that removed important information
Improve the inputs
- Add better features (domain signals often matter more than algorithms)
- Handle missing values and outliers consistently
- Verify labels match the decision you want the model to learn
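Handling missing values consistently usually means learning the fill values once, on training data, and reusing them everywhere else. Here is a small sketch with a hypothetical `MedianImputer` class and made-up arrays.

```python
import numpy as np

class MedianImputer:
    """Fill missing values (NaN) with per-column medians learned from training data."""

    def fit(self, X):
        self.medians_ = np.nanmedian(X, axis=0)
        return self

    def transform(self, X):
        X = np.array(X, dtype=float)          # copy so the input is not modified
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.medians_[cols]
        return X

# Hypothetical training and inference batches with gaps.
X_train = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [np.nan, 50.0]])
X_new = np.array([[np.nan, 20.0], [4.0, np.nan]])

imputer = MedianImputer().fit(X_train)
X_train_filled = imputer.transform(X_train)
X_new_filled = imputer.transform(X_new)   # same medians reused at inference time
print(X_new_filled)
```

The training medians are 2.0 and 30.0, so the new batch becomes `[[2.0, 20.0], [4.0, 30.0]]`. Recomputing medians on the new batch instead would quietly change the inputs the model sees.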
A quick checklist before you ship
This checklist helps catch common causes of poor generalization. It also makes it easier to communicate model readiness to your team. Run it before you declare a model “done”.
- Metrics are measured on a split that matches production reality
- Strong baselines are included (simple models, majority class, or rule-based checks)
- Variance is measured (multiple folds or multiple random seeds)
- Leakage risks are reviewed (IDs, timestamps, target proxies)
- Final evaluation is done once on a locked test set
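Two of the checklist items, a baseline comparison and a variance estimate, can be combined in one loop. This sketch assumes imbalanced synthetic data and uses a nearest-centroid classifier as a stand-in for your real model. Every name in it is hypothetical.

```python
import numpy as np

def make_blobs(rng, n=400, frac_positive=0.3):
    """Hypothetical imbalanced binary data: two Gaussian blobs in 2D."""
    y = (rng.uniform(size=n) < frac_positive).astype(int)
    X = rng.normal(size=(n, 2)) + 3.0 * y[:, None]
    return X, y

def nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va):
    """Tiny stand-in model: predict the class whose training centroid is closest."""
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X_va[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(dists.argmin(axis=1) == y_va))

def majority_accuracy(y_tr, y_va):
    """Baseline: always predict the most common training class."""
    majority = int(np.bincount(y_tr).argmax())
    return float(np.mean(y_va == majority))

X, y = make_blobs(np.random.default_rng(0))

model_scores, baseline_scores = [], []
for seed in range(5):                 # a different random split per seed
    order = np.random.default_rng(seed).permutation(len(y))
    train, val = order[:300], order[300:]
    model_scores.append(nearest_centroid_accuracy(X[train], y[train], X[val], y[val]))
    baseline_scores.append(majority_accuracy(y[train], y[val]))

print(f"model:    {np.mean(model_scores):.3f} +/- {np.std(model_scores):.3f}")
print(f"baseline: {np.mean(baseline_scores):.3f} +/- {np.std(baseline_scores):.3f}")
```

If the model does not clearly beat the majority-class baseline across seeds, or the spread is wider than the improvement you are claiming, the result is not ready to ship.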