Dimensionality Reduction: PCA, t-SNE, and UMAP Explained

High-dimensional data is hard to work with. PCA, t-SNE, and UMAP each reduce it differently. Here is when to use each and how to avoid the curse of dimensionality.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 18, 2026

10 min read

// tags

#dimensionality-reduction #pca #t-sne #umap #machine-learning #visualization

Modern datasets often have hundreds or thousands of features. A dataset of customer behavior might have 500 numeric features. A dataset of text documents represented as word counts might have 50,000 features. An image dataset, even at modest resolution, has thousands of pixel features per image.

High-dimensional data creates problems: visualization becomes impossible, distance metrics become unreliable, algorithms slow down, and overfitting becomes more likely. Dimensionality reduction addresses these problems by finding a lower-dimensional representation that preserves the structure that matters.

The Curse of Dimensionality

Before the algorithms, the problem they solve: in high-dimensional spaces, our geometric intuitions break down completely.

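In two dimensions, most random points inside a unit square sit away from the edge: only about 19% fall within 0.05 of a side. In 100 dimensions, more than 99.99% of points lie within 0.05 of at least one face, because a point is interior only if every one of its coordinates is. Volume concentrates in shells, not cores.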
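The shell effect follows from a one-line formula. For a point drawn uniformly from the unit hypercube, the probability that at least one coordinate lies within a margin ε of a face is 1 - (1 - 2ε)^d. The sketch below evaluates it; the function name and the margin of 0.05 are arbitrary choices for illustration.

```python
def edge_fraction(dim: int, margin: float = 0.05) -> float:
    """Probability that a uniform point in the unit hypercube lies
    within `margin` of at least one face."""
    # A point is interior only if every coordinate falls in [margin, 1 - margin].
    interior = (1.0 - 2.0 * margin) ** dim
    return 1.0 - interior


for dim in (2, 10, 100):
    print(f"d={dim:>3}: {edge_fraction(dim):.5f}")
```

The fraction climbs from roughly a fifth in 2 dimensions to essentially all points in 100.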

Distance metrics suffer: for randomly distributed points in high dimensions, the nearest and farthest neighbors of a point end up at nearly the same distance, so the ratio of maximum to minimum distance approaches 1. Everything is approximately equidistant from everything else. This undermines algorithms that rely on distance-based similarity (k-nearest neighbors, k-means clustering, kernel SVMs) because "near" and "far" lose meaning.
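Distance concentration is easy to observe empirically. The following sketch, which assumes uniformly random data and uses a hypothetical helper named relative_contrast, measures how much farther the farthest point is than the nearest one from a random query point.

```python
import numpy as np


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max - min) / min over Euclidean distances from one query point
    to `n_points` uniformly random points in the unit hypercube."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


for dim in (2, 10, 100, 1000):
    print(f"d={dim:>4}: relative contrast = {relative_contrast(1000, dim):.2f}")
```

A large value means "near" and "far" are clearly distinguishable; the value shrinks toward zero as the dimension grows. Real datasets with strong structure usually concentrate less severely than uniform noise.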

A related problem: with 1,000 features and 1,000 training examples, even a plain linear model has as many free parameters as there are data points and can fit the training set exactly, noise included. Overfitting is nearly guaranteed without heavy regularization.
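A small experiment illustrates the overfitting risk. This sketch uses a scaled-down version of the scenario (200 features and 200 examples rather than 1,000 each, an assumption made only to keep it fast) and fits an unregularized linear regression to pure noise, where there is nothing real to learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_samples, n_features = 200, 200  # scaled-down stand-in for the 1,000 x 1,000 case

# Features and targets are independent noise, so any fit is memorization.
X_train = rng.normal(size=(n_samples, n_features))
y_train = rng.normal(size=n_samples)
X_test = rng.normal(size=(n_samples, n_features))
y_test = rng.normal(size=n_samples)

model = LinearRegression().fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

The training score is essentially perfect while the test score shows no predictive power. Neither regularization nor fewer input dimensions is applied here, and those are the two usual remedies.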

Dimensionality reduction addresses all of these by finding a low-dimensional representation (typically 2, 3, 10, or 50 dimensions) that preserves the structure relevant to your task.

Principal Component Analysis (PCA): Linear Projection

PCA is the simplest and most widely used dimensionality reduction method. It finds the directions in feature space that explain the most variance in the data, then projects the data onto those directions.

Concretely: the first principal component (PC1) is the direction along which the data varies most. PC2 is the direction of second-highest variance that is orthogonal (perpendicular) to PC1. And so on.
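To make the definition concrete, here is a minimal from-scratch sketch of PCA using the singular value decomposition, cross-checked against scikit-learn. The toy data is hypothetical, and this is one standard way to compute principal components, not necessarily the exact routine any particular library uses internally.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical toy data: 300 samples, 5 correlated features.
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))

# 1. Center the data; principal directions are defined relative to the mean.
X_centered = X - X.mean(axis=0)

# 2. The right singular vectors of the centered data are the principal
#    directions, ordered by decreasing singular value (decreasing variance).
_, singular_values, components = np.linalg.svd(X_centered, full_matrices=False)
explained_variance = singular_values**2 / (len(X) - 1)

# 3. Projecting onto the first two directions gives the 2D representation.
X_2d = X_centered @ components[:2].T

# Cross-check: component signs are arbitrary, so compare variances instead.
pca = PCA(n_components=2).fit(X)
print(explained_variance[:2])
print(pca.explained_variance_)
```

The rows of components are mutually orthogonal unit vectors, which is the "orthogonal to PC1" condition in matrix form.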

from sklearn.decomposition import PCA

# X is assumed to be an array of shape (n_samples, n_features)
pca = PCA(n_components=50)  # Reduce to 50 dimensions
X_reduced = pca.fit_transform(X)

# How much variance is explained?
print(pca.explained_variance_ratio_.cumsum())
# Illustrative output: [0.12, 0.23, 0.33, ..., 0.95]
# In this example, 50 components explain 95% of the variance

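The cumulative explained variance ratio tells you how much of the data's variance you retained. If 50 components explain 95% of the variance in a 500-feature dataset, you have cut the feature count by 90% while retaining 95% of the variance. Variance is a proxy for information, not a guarantee: a low-variance direction can still matter for your task.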
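The cumulative ratio can also be used to pick the number of components rather than guessing it. The sketch below assumes scikit-learn's bundled digits dataset (64 pixel features) purely as a stand-in; the 95% target mirrors the figure used above.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 1,797 samples, 64 pixel features

# Fit with all components so the full variance spectrum is available.
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches the target.
target = 0.95
n_needed = int(np.searchsorted(cumulative, target) + 1)
print(f"{n_needed} of {X.shape[1]} components reach {target:.0%} of variance")
```

Plotting cumulative against the component index gives the familiar elbow-style curve if you prefer to choose the cutoff visually.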

PCA use cases:

Preprocessing before feeding data to another algorithm (reduces dimensions, speeds up training, reduces overfitting)
Visualizing data in 2D or 3D (plot PC1 vs. PC2 to see structure)
Noise reduction (low-variance components often capture noise; dropping them removes noise)
Feature compression for storage or transmission
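The noise-reduction use case can be sketched with synthetic data where the clean signal is known. Everything here is an assumption for illustration: a rank-3 signal in 50 features, isotropic Gaussian noise, and a component count that matches the true rank.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: a rank-3 signal embedded in 50 features, plus noise.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 50))
X_clean = latent @ mixing
X_noisy = X_clean + rng.normal(scale=0.5, size=X_clean.shape)

# Keep only the 3 highest-variance components, then map back to 50 features.
pca = PCA(n_components=3).fit(X_noisy)
X_denoised = pca.inverse_transform(pca.transform(X_noisy))

mse_noisy = np.mean((X_noisy - X_clean) ** 2)
mse_denoised = np.mean((X_denoised - X_clean) ** 2)
print(f"MSE before: {mse_noisy:.3f}, after: {mse_denoised:.3f}")
```

The reconstruction sits closer to the clean signal because most of the noise lives in the discarded directions. On real data the true rank is unknown, so the component count becomes a tuning decision.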

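PCA limitations: PCA is linear, so it can only find linear structure. If your data lies on a curved manifold in high-dimensional space, PCA cannot unfold it. Two clusters that can only be separated by a non-linear boundary (one ring nested inside another, for example) will look mixed in a PCA projection. PCA is also sensitive to feature scale, so standardize features measured in different units before fitting.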
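A classic illustration of the linearity limit is a pair of concentric rings. This sketch uses scikit-learn's make_circles generator (a choice made here, not part of the original discussion) and compares a one-component PCA projection with a simple non-linear feature.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA

# Two concentric rings: class 0 is the outer ring, class 1 the inner ring.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.03, random_state=0)

# Project to a single principal component.
pc1 = PCA(n_components=1).fit_transform(X).ravel()
inner, outer = pc1[y == 1], pc1[y == 0]
print(f"inner ring PC1 range: [{inner.min():.2f}, {inner.max():.2f}]")
print(f"outer ring PC1 range: [{outer.min():.2f}, {outer.max():.2f}]")

# A non-linear feature (distance from the origin) separates the rings.
radius = np.linalg.norm(X, axis=1)
print(f"largest inner radius:  {radius[y == 1].max():.2f}")
print(f"smallest outer radius: {radius[y == 0].min():.2f}")
```

On the principal component the inner ring's range sits entirely inside the outer ring's range, so no threshold separates them, while the radius splits the two classes with a gap.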

When to use PCA:

General-purpose preprocessing for ML pipelines
When you need a fast, deterministic, reproducible reduction
When linear relationships are sufficient for your downstream task
When interpretability matters (principal components are linear combinations of original features)

t-SNE: Non-Linear Visualization

t-SNE (t-distributed Stochastic Neighbor Embedding) is the standard tool for visualizing high-dimensional data in 2D or 3D. It captures non-linear structure that PCA misses.

The algorithm works in two stages. First, it computes pairwise similarities between points in the high-dimensional space: nearby points get high similarity, distant points get near-zero similarity. Second, it places the points in 2D (at random, or from a PCA projection, which recent scikit-learn versions use by default) and iteratively moves them until the low-dimensional similarities, measured with a heavy-tailed Student-t distribution, match the high-dimensional ones as closely as possible.
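The quantity being matched can be written down in a few lines. The following is a simplified sketch of the t-SNE objective, not a full implementation: it uses one fixed Gaussian bandwidth for every point (real t-SNE tunes a bandwidth per point from the perplexity and symmetrizes the result) and it omits the gradient descent loop. All function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance between every pair of rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff**2, axis=-1)


def high_dim_similarities(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian similarities, normalized to sum to 1 (fixed sigma: a simplification)."""
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma**2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum()


def low_dim_similarities(Y: np.ndarray) -> np.ndarray:
    """Student-t (one degree of freedom) similarities, normalized to sum to 1."""
    Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()


def kl_divergence(P: np.ndarray, Q: np.ndarray, eps: float = 1e-12) -> float:
    """Mismatch between the two similarity structures; t-SNE minimizes this."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))             # hypothetical high-dimensional data
Y = rng.normal(scale=1e-2, size=(50, 2))  # random starting layout in 2D

P = high_dim_similarities(X)
Q = low_dim_similarities(Y)
print(f"initial KL(P || Q) = {kl_divergence(P, Q):.3f}")
```

The optimizer repeatedly nudges the rows of Y to lower this divergence. The heavy tails of the Student-t kernel let dissimilar points sit far apart in 2D without a large penalty, which is what produces the well-separated clusters.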

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# X: feature array; labels: one class label per row (both assumed to exist)
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_2d = tsne.fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, cmap='tab10')
plt.title('t-SNE visualization')
plt.show()

t-SNE excels at revealing cluster structure. If your data has natural groupings, t-SNE will make them visually obvious. This is why t-SNE plots are ubiquitous in ML papers for showing that learned representations capture semantically meaningful structure.

t-SNE limitations:

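Slow for large datasets: exact t-SNE is quadratic in the number of points, and even the Barnes-Hut approximation (O(n log n), scikit-learn's default) can take minutes for 10,000 points and hours for 100,000, depending on hardware
Stochastic: different random seeds produce different layouts, so fix random_state when you need a reproducible plot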
Perplexity hyperparameter (typically 5-50) strongly affects the output
Does not preserve global structure well: clusters may be positioned arbitrarily relative to each other
Not suitable for preprocessing ML pipelines (non-parametric, cannot transform new data without re-running on the full dataset)
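Two of these limitations can be checked directly in scikit-learn. The sketch below assumes a 300-sample subset of the bundled digits dataset (chosen only so it runs quickly) and uses random initialization so that the seed is the only thing that changes between runs.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X = X[:300]  # small subset so the example runs quickly

# Same data, same settings, different seeds: the layouts differ.
emb_a = TSNE(n_components=2, perplexity=30, init="random", random_state=0).fit_transform(X)
emb_b = TSNE(n_components=2, perplexity=30, init="random", random_state=1).fit_transform(X)
print("identical layouts:", np.allclose(emb_a, emb_b))

# scikit-learn's TSNE offers fit and fit_transform only; there is no
# transform method for embedding points that were not in the fit.
print("has transform():", hasattr(TSNE, "transform"))
```

Because there is no transform step, a t-SNE embedding cannot sit in the middle of a train/test pipeline the way PCA can.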

When to use t-SNE:

Visualizing clusters and structure in data for exploration
Creating diagnostic plots in papers or presentations
Datasets up to ~50,000 points with compute budget for minutes of runtime

UMAP: Faster and Better Global Structure

UMAP (Uniform Manifold Approximation and Projection) is a newer method that addresses t-SNE's main weaknesses: speed and global structure preservation.

import umap

reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
X_2d = reducer.fit_transform(X)

# A fitted reducer can embed new points without re-running on the full dataset (X_new assumed to exist)
X_new_2d = reducer.transform(X_new)

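UMAP is typically much faster than t-SNE on large datasets (the speedup is often quoted as 10-100x, though it depends on the implementation and the data) because it builds its neighbor graph with approximate nearest-neighbor search and optimizes the layout with sampled stochastic gradient descent rather than working from all pairwise distances. It also tends to preserve global structure better: clusters that are genuinely far apart in high-dimensional space are more likely to stay far apart in the UMAP projection, although distances between clusters should still be read with caution.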

UMAP key hyperparameters:

n_neighbors: controls local vs. global structure balance. Small (5-15): focuses on local structure, tighter clusters. Large (50-200): preserves more global structure.
min_dist: controls how tightly points cluster in the projection. Small (0.0-0.1): tight clusters. Large (0.5-1.0): spread-out, more uniform distribution.

UMAP advantages over t-SNE:

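Often 10-100x faster on large datasets, and scales to millions of points
Better global structure preservation
Can transform new data without re-running (the standard algorithm is not parametric; a neural-network variant, ParametricUMAP, ships separately in umap-learn)
More reproducible (though still has randomness)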
Also useful for preprocessing (reduce to 50 dimensions before clustering)

When to use UMAP:

Large datasets where t-SNE is too slow
When you need to transform new data after fitting (production use cases)
When global structure matters (understanding relationships between clusters)
As a preprocessing step before clustering (HDBSCAN + UMAP is a powerful combination)

Practical Decision Framework

For preprocessing before ML algorithms: Use PCA. It is fast, deterministic, preserves linear structure, and compresses data efficiently. Reduce to a number of components that explain 90-95% of variance.

For 2D/3D visualization of small datasets (under ~50K): Use t-SNE. It often produces the most visually distinct cluster separations. Tune perplexity between 5 and 50.

For 2D/3D visualization of large datasets: Use UMAP. Faster and preserves global structure better.

For preprocessing before clustering: Use UMAP with n_components=10-50. Running HDBSCAN or k-means on UMAP-reduced data often produces better clusters than running directly on high-dimensional data.

For interpretability: Use PCA. Principal components are linear combinations of original features and can be inspected to understand what patterns they capture.

scikit-learn Practical Examples

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

# PCA as preprocessing in a pipeline (X_train and y_train are assumed to be defined)
pipeline = Pipeline([
    ('pca', PCA(n_components=50)),
    ('classifier', RandomForestClassifier(n_estimators=100))
])
pipeline.fit(X_train, y_train)
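For a version that runs end to end, the sketch below applies the same pattern to scikit-learn's bundled digits dataset. Three things are additions rather than part of the pipeline above: the dataset itself, a StandardScaler step (PCA is variance-based, so feature scale affects it), and a fractional n_components that asks PCA to keep enough components for 95% of the variance.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("scale", StandardScaler()),  # put features on a common scale before PCA
    ("pca", PCA(n_components=0.95, svd_solver="full")),  # keep 95% of variance
    ("classifier", RandomForestClassifier(n_estimators=100, random_state=0)),
])
# Scaler and PCA are fitted on the training split only, avoiding leakage.
pipeline.fit(X_train, y_train)

n_kept = pipeline.named_steps["pca"].n_components_
accuracy = pipeline.score(X_test, y_test)
print(f"kept {n_kept} of {X.shape[1]} components, test accuracy = {accuracy:.3f}")
```

Wrapping the reduction in a Pipeline means the same fitted projection is applied to test data and to any future data, which is the property t-SNE lacks.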

Dimensionality reduction is not always necessary. If your dataset has 20 features and 100,000 examples, you are unlikely to benefit from it. As a rough rule of thumb, the curse of dimensionality becomes practically relevant once you have on the order of a hundred features or more, or when your feature count exceeds your sample count. In these situations, PCA as a preprocessing step is almost always worth trying.

Keep Reading

Machine Learning Complete Guide for Software Developers -- how dimensionality reduction fits in the broader ML pipeline
Anomaly Detection Practical Guide -- UMAP is commonly used to visualize anomaly detection results
Feature Engineering Practical Guide -- dimensionality reduction complements feature engineering for high-dimensional problems

