What are overfitting and underfitting in ML?

Overfitting occurs when a model learns the training data too well, including noise, and fails to generalize to new data. Underfitting occurs when a model is too simple to capture the underlying patterns, resulting in poor performance on both training and validation data.

How do overfitting and underfitting work?

Overfitting happens when a model has high variance — it becomes too sensitive to the training data's idiosyncrasies. Underfitting happens when a model has high bias — it makes strong assumptions that don't hold. The loss curve (training vs. validation) reveals which is happening: diverging curves indicate overfitting; both high and flat indicate underfitting.

What are the best practices for diagnosing and fixing overfitting and underfitting?

Best practices include: always plot training vs. validation loss; for overfitting, use more data, dropout, L1/L2 regularization, early stopping, or simplify the model; for underfitting, increase model capacity, train longer, improve features, lower learning rate, or switch to a more suitable architecture.

How much does it cost to fix overfitting and underfitting?

Fixing overfitting and underfitting usually carries no direct monetary cost; it is a matter of adjusting your training process. Techniques like dropout, regularization, early stopping, and feature engineering are free to apply. The main costs are compute time for retraining, which is usually negligible for small to medium models, and the effort of collecting and labeling more data if you choose that fix.

Is learning to manage overfitting and underfitting worth it in 2026?

Understanding and managing overfitting and underfitting remains essential in 2025 and beyond. As models grow larger and datasets become more complex, the ability to diagnose and correct these issues directly impacts model performance, reliability, and deployment success. It's a foundational skill for any ML practitioner.

Overfitting vs Underfitting in ML: How to Diagnose & Fix in 2025

Overfitting and underfitting are the two fundamental failure modes of machine learning models. Overfitting happens when your model learns the training data too well, including its noise and quirks, and fails to generalize to new examples. Underfitting happens when your model is too simple to capture the patterns in the data at all. The training versus validation loss curve is how you tell which problem you have, and the fixes are almost entirely different.

Understanding these two failure modes is foundational. Every technique in ML — regularization, dropout, early stopping, data augmentation, architecture selection — is ultimately a tool for managing this tradeoff.

Overfitting: The Model Memorizes Instead of Learning

Overfitting is when a model fits the training data so closely that it captures noise rather than signal. The classic example: a model trained to classify images of cats and dogs that perfectly classifies every training image, but fails on new images it has never seen. It did not learn what makes a cat a cat; it memorized which pixel patterns corresponded to which labels in the training set.

Symptoms of overfitting:

Training loss continues to decrease as training progresses
Validation loss decreases initially, then starts increasing (or stops decreasing)
A large gap between training accuracy and validation accuracy (e.g., 98% training, 72% validation)
The model gives very confident predictions on training examples and uncertain or wrong predictions on new examples

What causes it: the model has more capacity (parameters) than the complexity of the data warrants. A 100-million-parameter model trained from scratch on 1,000 examples will overfit severely, because it can effectively memorize each training example.

A concrete example: fitting a polynomial to six data points. A polynomial with six parameters can pass through all six points exactly (zero training error). But the curve between the points oscillates wildly, making terrible predictions for new data. A simpler polynomial with three parameters misses some training points but captures the underlying trend much better.
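
The polynomial example can be reproduced in a few lines of NumPy. This is an illustrative sketch only: the quadratic trend in true_fn, the fixed alternating noise values, and the test grid are assumptions chosen to make the effect easy to see, not data from a real problem.

```python
import numpy as np

def true_fn(x):
    # Hypothetical underlying trend, chosen only for illustration
    return 0.5 + x - 0.8 * x**2

# Six noisy training points (the noise values are fixed so the demo is repeatable)
x_train = np.linspace(-1.0, 1.0, 6)
noise = 0.2 * np.array([1, -1, 1, -1, 1, -1])
y_train = true_fn(x_train) + noise

# Dense grid of "new" inputs, scored against the noise-free trend
x_test = np.linspace(-1.0, 1.0, 201)
y_test = true_fn(x_test)

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Degree 2 has three parameters, degree 5 has six
results = {degree: fit_and_score(degree) for degree in (2, 5)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree}: train MSE = {train_mse:.6f}, test MSE = {test_mse:.6f}")
```

The degree-5 fit passes through every noisy training point, so its training error is essentially zero, yet its error against the underlying trend is far larger than that of the three-parameter quadratic.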

Underfitting: The Model Is Too Simple

Underfitting happens when the model is not powerful enough to capture the patterns in the data. Both training and validation loss are high, and performance is poor on both.

Symptoms of underfitting:

High training loss that is not decreasing much with more training
Training and validation loss are similarly high (small gap, but both poor)
The model's predictions seem random or default to the most common class

What causes it: using a model that is too simple for the task (e.g., linear regression for a non-linear relationship), too few training epochs, too high a learning rate causing unstable training, or bad features that do not capture the relevant signal.
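
The first cause, a linear model on a non-linear relationship, can be sketched as follows. The target function and the two input grids are hypothetical and exist only to show the symptom: training and validation error that are both high and close together.

```python
import numpy as np

def target(x):
    # Hypothetical non-linear relationship used only for this demo
    return x**2

x_train = np.linspace(-1.0, 1.0, 20)
x_val = np.linspace(-0.95, 0.95, 20)
y_train, y_val = target(x_train), target(x_val)

def mse_for_degree(degree):
    """Fit a polynomial of the given degree; return (train MSE, validation MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    return train_mse, val_mse

linear_train, linear_val = mse_for_degree(1)        # straight line: too simple
quadratic_train, quadratic_val = mse_for_degree(2)  # enough capacity for x**2

print(f"linear:    train = {linear_train:.4f}, val = {linear_val:.4f}")
print(f"quadratic: train = {quadratic_train:.2e}, val = {quadratic_val:.2e}")
```

The straight line cannot bend, so no amount of extra fitting lowers its error; giving the model one more parameter removes the problem.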

Diagnosing Which You Have: The Loss Curve

Plot training loss and validation loss on the same chart, with training steps or epochs on the x-axis and loss on the y-axis.

Overfitting pattern: training loss decreases consistently, validation loss decreases initially then flattens or increases. The two curves diverge. The divergence point is where overfitting begins.

Underfitting pattern: both curves are high and decrease slowly or plateau at a high value. The curves are close together but both at an unacceptable level.

Good fit pattern: both curves decrease and converge to low values. Validation loss is usually slightly higher than training loss, since the model is optimized on the training set, but the gap is small and stable.

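Many training frameworks and experiment trackers can log these curves for you. In a plain PyTorch training loop, record the losses yourself and plot them with matplotlib: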

import matplotlib.pyplot as plt

# Assume train_losses and val_losses are lists collected during training
plt.figure(figsize=(10, 6))
plt.plot(train_losses, label="Training Loss")
plt.plot(val_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.title("Training vs Validation Loss")
plt.show()
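
The same three patterns can also be checked numerically from the recorded losses. The helper below is a rough heuristic, not a standard API: the function names, the acceptable_loss and max_gap thresholds, and the sample loss histories are all made up for illustration and would need tuning for a real task.

```python
def diagnose(train_losses, val_losses, acceptable_loss, max_gap):
    """Label a run from its loss history.

    acceptable_loss and max_gap are hypothetical, task-specific thresholds.
    """
    final_train, final_val = train_losses[-1], val_losses[-1]
    if final_train > acceptable_loss:
        return "underfitting"  # the model cannot even fit the training set
    if final_val - final_train > max_gap:
        return "overfitting"   # the curves have diverged
    return "good fit"

def divergence_epoch(val_losses):
    """Return the 0-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda epoch: val_losses[epoch])

# Made-up loss histories, one value per epoch
overfit_train = [1.0, 0.6, 0.3, 0.15, 0.05]
overfit_val = [1.1, 0.7, 0.5, 0.6, 0.8]
underfit_train = [1.0, 0.95, 0.92, 0.91, 0.90]
underfit_val = [1.05, 1.0, 0.96, 0.95, 0.94]

print(diagnose(overfit_train, overfit_val, acceptable_loss=0.4, max_gap=0.2))
print(diagnose(underfit_train, underfit_val, acceptable_loss=0.4, max_gap=0.2))
print("validation loss was lowest at epoch", divergence_epoch(overfit_val))
```

A check like this is a complement to the plot, not a replacement: the shape of the curves often shows things a single threshold misses.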

Fixes for Overfitting

More Data

The most reliable fix. More training examples give the model more patterns to learn and make it harder to memorize noise. If doubling your dataset is feasible, try this first before any regularization technique.

Dropout

Dropout randomly sets a fraction of neuron outputs to zero during each training step. A common setting is 0.3 to 0.5 (30 to 50 percent of neurons disabled each step). This prevents any single neuron from becoming critical to the model's predictions, forcing the network to distribute knowledge across many neurons.

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.dropout = nn.Dropout(0.3)  # 30% dropout
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # applied during training only
        x = self.fc2(x)
        return x

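Dropout is only disabled at inference if the model is in evaluation mode: call model.eval() before validating or predicting, and model.train() before resuming training.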
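
To make the mechanism concrete, here is a minimal NumPy sketch of inverted dropout, the variant in which surviving activations are scaled by 1 / (1 - p) during training so that no rescaling is needed at inference. The dropout function and its arguments are hypothetical names for this illustration, not part of PyTorch.

```python
import numpy as np

def dropout(activations, p, training, rng):
    """Inverted dropout: zero each activation with probability p while training."""
    if not training or p == 0.0:
        return activations  # inference: pass values through unchanged
    keep_mask = rng.random(activations.shape) >= p  # True with probability 1 - p
    # Scale the survivors so the expected activation stays the same
    return activations * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
x = np.ones((4, 8))

train_out = dropout(x, p=0.3, training=True, rng=rng)
eval_out = dropout(x, p=0.3, training=False, rng=rng)

print("training mode, first row:  ", train_out[0])
print("evaluation mode, first row:", eval_out[0])
```

In training mode each value is either zero or scaled up; in evaluation mode the input comes back untouched.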

L1 and L2 Regularization (Weight Decay)

Regularization adds a penalty to the loss function that discourages large weights. Large weights mean the model is very sensitive to specific input patterns, which is a sign of memorization.

L2 regularization (weight decay) adds the sum of squared weights to the loss, pushing all weights toward smaller values without eliminating them. L1 regularization adds the sum of absolute weights, which tends to drive some weights to exactly zero, creating a sparse model.

In most modern deep learning frameworks, L2 regularization is exposed as a weight decay setting on the optimizer. In PyTorch's Adam, weight_decay is added to the gradient as an L2 penalty:

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
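
The difference between the two penalties is easier to see on a small weight vector. The sketch below uses NumPy and hypothetical helper names. The L1 update shown is a soft-thresholding (proximal) step, used here because it makes the "exactly zero" behavior visible in one step; it is not what an optimizer's weight decay setting does.

```python
import numpy as np

def l1_penalty(weights, lam):
    """lam * sum of absolute weights."""
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    """lam * sum of squared weights."""
    return lam * np.sum(weights ** 2)

def l2_decay_step(weights, rate):
    # Gradient step on the L2 penalty: every weight shrinks by the same factor
    return weights * (1.0 - rate)

def l1_soft_threshold_step(weights, threshold):
    # Proximal step on the L1 penalty: every weight moves toward zero by the
    # same amount, and weights smaller than the threshold become exactly zero
    return np.sign(weights) * np.maximum(np.abs(weights) - threshold, 0.0)

weights = np.array([0.05, -2.0, 1.0])

print("L1 penalty:", l1_penalty(weights, lam=0.1))
print("L2 penalty:", l2_penalty(weights, lam=0.1))

after_l2 = l2_decay_step(weights, rate=0.1)
after_l1 = l1_soft_threshold_step(weights, threshold=0.1)
print("after L2 step:", after_l2)  # small weight shrinks but stays non-zero
print("after L1 step:", after_l1)  # small weight is set to zero
```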

Early Stopping

Monitor validation loss during training. When it stops improving (for a patience period of, say, 10 epochs), stop training. Save the model checkpoint from the epoch with the best validation loss.

This prevents the model from continuing to train after it starts overfitting, even if training loss would keep decreasing.
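
A minimal sketch of the patience logic is shown below, in plain Python so it can sit inside any training loop. The EarlyStopping class, its min_delta parameter, and the validation losses are hypothetical; in a real loop the marked line is where you would save a checkpoint.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            # Save the model checkpoint here (omitted in this sketch)
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

# Made-up validation losses: improving, then rising as overfitting sets in
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(epoch, val_loss):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best epoch was {stopper.best_epoch}")
```

With a patience of 3, training stops three epochs after the best validation loss, and the checkpoint to keep is the one from the best epoch, not the last one.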

Simpler Model Architecture

If regularization is insufficient, reduce model capacity: use fewer layers, fewer neurons per layer, or fewer parameters overall. A smaller model has less capacity to memorize noise.

Fixes for Underfitting

More Model Capacity

Add layers or neurons. An underfitting model is constrained by its architecture. Increasing the number of parameters gives it more capacity to represent complex patterns.

More Training Time

Train for more epochs. Underfitting is often simply a matter of not running training long enough for the model to converge. Check the loss curve: if both curves are still decreasing, more training will help.

Better Features

If the features (inputs) do not capture the relevant signal, no amount of model capacity will help. Domain knowledge about which features matter can be more valuable than architectural changes.

Lower Learning Rate

If the learning rate is too high, training oscillates and fails to converge. Reducing the learning rate by a factor of 10 is often the first thing to try when training is unstable.
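
The effect is easy to reproduce on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2. The function, the starting point, and the two learning rates are assumptions picked to show the behavior, not recommended values.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return every visited weight."""
    w = start
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
        history.append(w)
    return history

too_high = gradient_descent(learning_rate=1.1)
ten_times_lower = gradient_descent(learning_rate=0.11)

print("lr = 1.1,  first steps:", [round(w, 3) for w in too_high[:4]])
print("lr = 1.1,  final |w|:  ", abs(too_high[-1]))
print("lr = 0.11, final |w|:  ", abs(ten_times_lower[-1]))
```

With the high learning rate each step overshoots the minimum, so the weight flips sign and grows; with the rate divided by 10 the same loop converges toward zero.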

Better Architecture

For some data types, certain architectures are significantly better suited than others. For images, convolutional neural networks (CNNs) typically outperform fully connected networks. For sequential data, transformers or recurrent networks typically outperform fully connected ones. Using the wrong architecture type can cause persistent underfitting regardless of training time or capacity.

The Bias-Variance Tradeoff

The bias-variance tradeoff is the theoretical framework underlying overfitting and underfitting.

Bias is the error from incorrect assumptions in the learning algorithm. High bias means the model consistently misses the target (underfitting). Variance is the error from sensitivity to fluctuations in the training data. High variance means the model's predictions change dramatically with different training samples (overfitting).

Every design decision in ML involves this tradeoff. More complex models tend to have lower bias but higher variance. Simpler models tend to have higher bias but lower variance. Regularization reduces variance at the cost of slightly higher bias. The goal is to find the model complexity that minimizes total error on new data, which for squared-error loss is the sum of squared bias, variance, and irreducible noise.
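
For squared-error loss, the standard decomposition can be written out explicitly. The notation is an assumption introduced here: the data follow y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted to a random training set, and the expectations are taken over training sets and noise.

```latex
\mathbb{E}\left[\left(y - \hat{f}(x)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The noise term is a floor that no model can remove; model complexity and regularization only trade the first two terms against each other.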

The practical approach: start simple. Check train vs. validation loss. If you are underfitting, increase complexity. If you are overfitting, apply regularization or get more data. Iterate.

Keep Reading

Neural Networks Explained: A Visual Guide for Software Developers — The foundational mechanics before tackling training failures
When Not to Use Machine Learning: Simpler Solutions That Actually Work — How to avoid the overfitting problem entirely for simple tasks
Building a RAG System From Scratch: A Complete Implementation Guide — A practical ML system that sidesteps training entirely


Mahmudul Haque Qudrati
