L1 vs L2 Regularization: Prevent Overfitting in ML
Updated on January 30, 2026 · 5-minute read
Regularization is one of the simplest ways to make machine learning models more reliable. When a model is too flexible, it can fit noise in the training data and then underperform on new examples.
In 2026 workflows, where teams often ship models quickly and retrain frequently, regularization is still a core tool. It matters for classic linear models and modern neural networks, and it pairs well with cross-validation and early stopping.
Why Overfitting Happens
Overfitting shows up when a model learns patterns that are specific to your training set. You will often see very strong training performance, while validation or test performance stalls or drops.
This gap typically grows when you have many features, limited data, or a highly expressive model. In those settings, the model can memorize instead of generalizing.
A quick diagnostic checklist
- Training score keeps improving while validation score plateaus
- Coefficients or weights become unusually large
- Small changes in training data produce big changes in the fitted model
None of these signals is conclusive on its own, but together they point to high variance.
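As a rough illustration of the first signal, the sketch below fits an unregularized linear model to synthetic data that has more features than training rows, then compares training and validation R². The data, sizes, and variable names are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical data: 60 samples, 40 features, only the first 3 carry signal.
X = rng.normal(size=(60, 40))
y = X[:, 0] + 0.5 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(scale=1.0, size=60)

# 30 training rows for 40 features: the model has enough freedom to memorize.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)
val_r2 = model.score(X_val, y_val)
gap = train_r2 - val_r2
print(f"train R^2={train_r2:.3f}  val R^2={val_r2:.3f}  gap={gap:.3f}")
```

A large positive gap like this is the pattern to watch for. The exact numbers depend on the random seed and the data.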
Regularization, explained simply
Most training objectives minimize a loss that measures error on the training data. Regularization adds a second term that penalizes complexity:
Loss_reg = Loss + λ · Ω(w)
Here, w denotes the model parameters, Ω(w) is the penalty, and λ ≥ 0 controls its strength. A larger λ usually means a simpler model with less variance, but potentially more bias.
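The formula can be written out directly. This is a minimal sketch that assumes mean squared error as the base loss and uses two example penalties (sum of absolute values and sum of squares). The function names are illustrative, not taken from any library.

```python
import numpy as np

def regularized_loss(w, X, y, lam, penalty):
    """Mean squared error on (X, y) plus lam * penalty(w)."""
    residuals = X @ w - y
    data_loss = np.mean(residuals ** 2)
    return data_loss + lam * penalty(w)

def l1_penalty(w):
    return np.sum(np.abs(w))

def l2_penalty(w):
    return np.sum(w ** 2)

# Toy data that w fits exactly, so the data loss is 0 and only the penalty remains.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, 2.0])

loss_l1 = regularized_loss(w, X, y, lam=0.1, penalty=l1_penalty)  # 0.1 * (1 + 2)
loss_l2 = regularized_loss(w, X, y, lam=0.1, penalty=l2_penalty)  # 0.1 * (1 + 4)
print(loss_l1, loss_l2)
```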
Bias-variance trade-off
Regularization does not make a model better by default. It changes the trade-off. You typically accept a small increase in bias in exchange for a larger decrease in variance, improving performance on unseen data.
That is why you rarely set λ by intuition alone. In practice, you tune it using a validation set or cross-validation.
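One way to see the trade-off is to sweep λ and record training and validation scores side by side. The sketch below does this with scikit-learn's validation_curve on synthetic data. The data and the grid of values are assumptions for illustration; scikit-learn names the strength alpha rather than λ.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

rng = np.random.default_rng(0)
# Hypothetical data: 80 samples, 30 features, 3 of which carry signal.
X = rng.normal(size=(80, 30))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=80)

alphas = np.logspace(-3, 3, 7)  # candidate regularization strengths
train_scores, val_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=alphas, cv=5
)

# Average over the 5 folds for each candidate strength.
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
best_alpha = alphas[np.argmax(mean_val)]

for a, tr, va in zip(alphas, mean_train, mean_val):
    print(f"alpha={a:g}  train R^2={tr:.3f}  val R^2={va:.3f}")
print("best alpha by validation score:", best_alpha)
```

Training score can only fall as the penalty grows, while validation score usually peaks somewhere in between. That peak is what tuning looks for.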
L1 Regularization (Lasso): sparse, selective models
L1 regularization uses the sum of absolute parameter values:
Ω(w) = Σ |w_j|
For many linear models trained with an L1 penalty, this tends to drive some coefficients to exactly zero. The result is a sparser model that effectively performs feature selection.
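To make the sparsity effect concrete, the sketch below fits scikit-learn's Lasso and an unpenalized LinearRegression to the same synthetic data and counts coefficients that are exactly zero. The data and the alpha value are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
# Hypothetical data: 20 features, only the first 3 influence the target.
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ols = LinearRegression().fit(X, y)

n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ols = int(np.sum(ols.coef_ == 0.0))
print("exact zeros with L1 penalty:", n_zero_lasso, "of", X.shape[1])
print("exact zeros without penalty:", n_zero_ols, "of", X.shape[1])
```

The unpenalized fit assigns small nonzero weights to the noise features, while the L1 fit removes most of them outright.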
When L1 is a good fit
- You suspect many features are irrelevant and want an automatic filter
- Interpretability matters, and you want fewer active signals
- You are working with high-dimensional data (many columns)
L1 can be especially helpful when you want a compact model that is easier to explain.
Trade-offs to know
L1 can be unstable when features are strongly correlated. It may keep one feature and drop another, even if both are meaningful. This is a tendency, not a guarantee, but it is common enough to plan for.
Because the penalty acts on raw coefficient magnitudes, and those depend on each feature's units, feature scaling (for example, standardization) is also important.
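The effect of scale can be shown with two features that contribute equally to the target but are recorded in very different units. This is a sketch on made-up data. The pipeline uses scikit-learn's StandardScaler, and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200
# Two equally important features on very different scales (hypothetical data).
x_wide = rng.normal(scale=1.0, size=n)     # needs a coefficient near 1
x_narrow = rng.normal(scale=0.01, size=n)  # needs a coefficient near 100
X = np.column_stack([x_wide, x_narrow])
y = 1.0 * x_wide + 100.0 * x_narrow + rng.normal(scale=0.1, size=n)

raw = Lasso(alpha=0.1).fit(X, y)
scaled = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)
scaled_coef = scaled.named_steps["lasso"].coef_

print("raw coefficients:   ", raw.coef_)
print("scaled coefficients:", scaled_coef)
```

Without scaling, the narrow feature needs a large coefficient, so the penalty removes it even though it matters as much as the other one. After standardization, both features are kept.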
L2 Regularization (Ridge): smooth shrinkage
L2 regularization uses the sum of squared parameters:
Ω(w) = Σ (w_j²)
Instead of zeroing weights, L2 typically shrinks them toward zero. This often produces models that are more stable when many features contribute small effects.
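A short sketch of this shrinkage, assuming scikit-learn's Ridge and synthetic data: as alpha grows, the coefficient norm falls, but individual coefficients do not land on exactly zero.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Hypothetical data in which all 10 features contribute.
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=100)

alphas = [0.01, 1.0, 100.0, 10000.0]
norms = []
zero_counts = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(float(np.linalg.norm(coef)))
    zero_counts.append(int(np.sum(coef == 0.0)))
    print(f"alpha={alpha:g}: ||w||={norms[-1]:.4f}, exact zeros={zero_counts[-1]}")
```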
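In deep learning, L2-style regularization is commonly referred to as weight decay. The two coincide for plain gradient descent, but they can differ under adaptive optimizers such as Adam, which is why decoupled variants like AdamW exist. Either way, the goal is to discourage very large weights during training.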
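The name comes from what the penalty does to a plain gradient-descent update: the gradient of λ · Σ w_j² is 2λw, which amounts to multiplying the weights by a factor slightly below one at every step. The sketch below checks that the two forms agree, assuming plain gradient descent with no momentum or adaptive scaling. The function names and numbers are illustrative.

```python
import numpy as np

def step_with_l2_penalty(w, grad_loss, lr, lam):
    """Gradient step on Loss + lam * sum(w**2); the penalty's gradient is 2 * lam * w."""
    return w - lr * (grad_loss + 2.0 * lam * w)

def step_with_weight_decay(w, grad_loss, lr, lam):
    """Shrink the weights by a constant factor, then apply the usual gradient step."""
    return (1.0 - 2.0 * lr * lam) * w - lr * grad_loss

w = np.array([1.0, -2.0, 0.5])
grad_loss = np.array([0.1, 0.0, -0.3])  # made-up gradient of the data loss

a = step_with_l2_penalty(w, grad_loss, lr=0.1, lam=0.01)
b = step_with_weight_decay(w, grad_loss, lr=0.1, lam=0.01)
same_update = bool(np.allclose(a, b))
print("updates match:", same_update)
```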
When L2 is a good fit
- Many features might matter, but you want to reduce sensitivity to noise
- You have correlated inputs and want the coefficients to share influence
- You prioritize stable predictions over sparse explanations
L2 is a strong default when you do not want feature elimination.
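The difference in how the two penalties treat correlated inputs shows up clearly in an extreme case. The sketch below duplicates a single feature and fits both scikit-learn's Ridge and Lasso. The data is made up, and the exact Lasso split can depend on solver details.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Hypothetical worst case: the second column is an exact copy of the first.
X = np.column_stack([x, x])
y = 2.0 * x + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("ridge coefficients:", ridge.coef_)  # the two copies share the weight
print("lasso coefficients:", lasso.coef_)  # typically one dominates, the other is near zero
```

Both models predict almost the same thing, but only Ridge gives the two copies equal credit.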
Trade-offs to know
L2 will not usually produce a compact set of features on its own. If you need feature selection, you will typically pair it with other techniques or use Elastic Net.
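Also note that if your features are on very different scales, L2 penalizes them unevenly: a feature with a small numeric range needs a large coefficient to have the same effect, so it is shrunk harder. Scaling helps here, too.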
Elastic Net: combining L1 and L2
Elastic Net mixes both penalties:
Loss_reg = Loss + λ1 · Σ|w_j| + λ2 · Σ(w_j²)
This can keep the sparsity benefits of L1 while improving stability in the presence of correlated features. It is often used when L1 feels too aggressive, but pure L2 does not simplify enough.
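Libraries often parameterize the two penalties differently from the formula above. As a sketch, scikit-learn's ElasticNet uses an overall strength alpha and a mix l1_ratio, and its documented objective is (1 / 2n) · ||y − Xw||² + alpha · l1_ratio · Σ|w_j| + 0.5 · alpha · (1 − l1_ratio) · Σ(w_j²). The helper below is hypothetical, not part of scikit-learn. It converts (λ1, λ2) into that parameterization, assuming the base loss is the same (1 / 2n)-scaled squared error.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def to_sklearn_params(lam1, lam2):
    """Map (lam1, lam2) to scikit-learn's (alpha, l1_ratio).

    Solves alpha * l1_ratio = lam1 and 0.5 * alpha * (1 - l1_ratio) = lam2.
    Requires lam1 + 2 * lam2 > 0.
    """
    alpha = lam1 + 2.0 * lam2
    l1_ratio = lam1 / alpha
    return alpha, l1_ratio

alpha, l1_ratio = to_sklearn_params(lam1=0.1, lam2=0.05)

rng = np.random.default_rng(0)
# Hypothetical data: 5 features, the first two carry signal.
X = rng.normal(size=(100, 5))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)

model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X, y)
print("alpha:", alpha, "l1_ratio:", l1_ratio)
print("coefficients:", model.coef_)
```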
How to choose between L1, L2, and Elastic Net
The right choice depends on what you want the model to do, not just its accuracy. Start with the simplest option that matches your goal, then validate.
Quick decision guide
- Need feature selection or a smaller model? Start with L1 or Elastic Net.
- Want stable predictions with many small effects? Start with L2.
- Have correlated features and still want sparsity? Prefer Elastic Net.
In all cases, treat λ (and the L1/L2 mix, if applicable) as hyperparameters. Tune them with cross-validation, then confirm results on a held-out test set.
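A minimal sketch of that workflow, assuming scikit-learn's ElasticNetCV and synthetic data: the strength and the mix are chosen by cross-validation on the training split, and the test split is scored once at the end. The candidate grids are arbitrary examples.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical data: 15 features, the first two carry signal.
X = rng.normal(size=(300, 15))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

# Hold out a test set that plays no part in tuning.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

alphas = [0.001, 0.01, 0.1, 1.0]  # candidate strengths
l1_ratios = [0.1, 0.5, 0.9]       # candidate L1/L2 mixes
search = ElasticNetCV(alphas=alphas, l1_ratio=l1_ratios, cv=5).fit(X_train, y_train)

# Score the held-out data once, after tuning is finished.
test_r2 = search.score(X_test, y_test)
print("chosen alpha:", search.alpha_, "chosen l1_ratio:", search.l1_ratio_)
print(f"held-out test R^2: {test_r2:.3f}")
```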
Practical tips that prevent common regularization mistakes
Regularization works best as part of a clean evaluation setup. These steps are simple, but they prevent misleading results.
- Scale your features before applying L1/L2 penalties (especially for linear models).
- Tune on validation data, not on the test set (avoid peeking).
- Report both training and validation metrics, so you can see the trade-off.
- Combine regularization with early stopping for iterative learners (including neural nets).
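These tips can be combined in one setup. The sketch below is one possible arrangement, assuming scikit-learn: scaling sits inside a pipeline so each cross-validation fold is scaled from its own training split, the search keeps training scores next to validation scores, and SGDRegressor uses its built-in early stopping. The data and the grid are made up.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data with features on different scales.
X = rng.normal(size=(300, 10)) * np.linspace(1.0, 50.0, 10)
y = X[:, 0] + 0.1 * X[:, 5] + rng.normal(size=300)

# Scaling is a pipeline step, so it is refit on the training part of every fold.
pipeline = make_pipeline(
    StandardScaler(),
    SGDRegressor(penalty="l2", early_stopping=True, validation_fraction=0.2,
                 n_iter_no_change=5, random_state=0),
)
grid = {"sgdregressor__alpha": [1e-4, 1e-2, 1.0]}
search = GridSearchCV(pipeline, grid, cv=5, return_train_score=True).fit(X, y)

# Report training and validation scores side by side for each strength.
results = search.cv_results_
for alpha, tr, va in zip(results["param_sgdregressor__alpha"],
                         results["mean_train_score"],
                         results["mean_test_score"]):
    print(f"alpha={alpha:g}  train R^2={tr:.3f}  validation R^2={va:.3f}")
print("best:", search.best_params_)
```

Here the "test" scores reported by GridSearchCV are validation scores from the folds. A separate held-out test set would still be needed for the final check.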
Small scikit-learn example
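from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import GridSearchCV

# Search a small grid of penalty strengths with 5-fold CV (scikit-learn calls λ "alpha").
ridge = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5)
lasso = GridSearchCV(Lasso(max_iter=10000), {"alpha": [0.001, 0.01, 0.1]}, cv=5)
# Next: call ridge.fit(X, y) and lasso.fit(X, y) on training data, then read best_params_.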
The key idea is that you do not pick a regularization strength by hand. You validate it.
Common pitfalls (and how to avoid them)
- Assuming a zero coefficient means a feature is useless. It may reflect collinearity or scaling issues.
- Regularizing as a substitute for good data. If labels are noisy or drifting, penalties cannot fix that alone.
- Forgetting to document λ (and feature scaling). Reproducibility matters when you retrain models.