Anomaly Detection: Finding Outliers Without Labels

Anomaly detection finds rare events without labeled examples. Here is how Isolation Forest, One-Class SVM, and Autoencoders work -- and why accuracy is the wrong metric.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 18, 2026

11 min read

// tags

#anomaly-detection #isolation-forest #autoencoder #unsupervised-learning #fraud-detection


Most machine learning problems assume you have labeled examples of what you are trying to detect. Anomaly detection is different: you often cannot label anomalies in advance because you do not know what form they will take. You cannot collect examples of "all possible fraud patterns" before fraud happens. You cannot label "all possible equipment failure modes" before equipment fails. Anomaly detection finds what is unusual without being told what unusual looks like.

This unsupervised nature makes anomaly detection both powerful and tricky to evaluate correctly. The evaluation problem alone -- why accuracy is useless and what to use instead -- is where most practitioners go wrong.

What Makes Something an Anomaly

An anomaly is a data point that is significantly different from the rest of the data. This can mean:

Point anomalies: A single instance that is unusual in isolation. A transaction for $50,000 when the average is $85. A server responding in 30 seconds when the average is 200ms.

Contextual anomalies: An instance that is unusual given its context but not unusual in isolation. A temperature of 80 degrees Fahrenheit is normal in summer but anomalous in January. A transaction of $200 is normal on weekdays but unusual at 3am from a foreign IP.

Collective anomalies: A sequence of instances that is collectively unusual even though individual instances are not. A series of small transactions (each under $10) from a single card in rapid succession -- individually normal, collectively suspicious.

Different algorithms are better suited to different types of anomalies. Understanding which type you are detecting shapes your algorithm choice.
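The difference between a point anomaly and a contextual anomaly can be sketched with the temperature example. This is a minimal illustration using only the Python standard library; the readings, the `zscore` helper, and the cutoff of 3 standard deviations are assumptions made for the example.

```python
from statistics import mean, stdev

# Hypothetical daily high temperatures (Fahrenheit), grouped by month.
history = {
    "January": [30, 34, 28, 35, 31, 33, 29, 32],
    "July": [82, 85, 79, 88, 84, 81, 86, 83],
}

def zscore(value, reference):
    """Number of sample standard deviations `value` lies from the mean of `reference`."""
    return (value - mean(reference)) / stdev(reference)

reading = 80
all_readings = history["January"] + history["July"]

global_z = zscore(reading, all_readings)          # judged against all data
january_z = zscore(reading, history["January"])   # judged against its context
july_z = zscore(reading, history["July"])

CUTOFF = 3  # illustrative threshold, not a universal rule
print("anomalous overall:", abs(global_z) > CUTOFF)
print("anomalous for January:", abs(january_z) > CUTOFF)
print("anomalous for July:", abs(july_z) > CUTOFF)
```

Judged against all readings, 80 degrees is unremarkable. Only the comparison against January readings exposes it, so a detector for contextual anomalies needs the context (here, the month) as an input.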

Isolation Forest

Isolation Forest is one of the most widely used general-purpose anomaly detection algorithms. The intuition is elegant: anomalies are rare and different, so they are easier to isolate than normal points.

The algorithm builds many random decision trees, each on a random subsample of the data. At each node, it picks a feature at random and a split value at random between that feature's minimum and maximum. It partitions the data into left and right branches and repeats recursively. Normal points (which are dense and cluster with similar points) require many splits to isolate. Anomalous points (which are sparse and different from everything else) are isolated in very few splits.

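The anomaly score for each point is derived from its average path length across all trees: the number of splits needed to isolate it, normalized by the path length expected for a sample of that size. Short average path = anomalous. Long average path = normal.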
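The isolation idea can be demonstrated without any library. The sketch below is a simplified, one-dimensional illustration and not scikit-learn's implementation: it skips subsampling, feature selection, and score normalization, and the helper names `path_length` and `average_path_length` are made up for this example.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=20):
    """Count random splits needed to isolate `point` within `data` (capped at max_depth)."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains the point.
    if point < split:
        side = [v for v in data if v < split]
    else:
        side = [v for v in data if v >= split]
    return path_length(point, side, rng, depth + 1, max_depth)

def average_path_length(point, data, n_trees=200, seed=0):
    """Average isolation depth over many independent random 'trees'."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees

rng = random.Random(42)
normal = [rng.gauss(85, 10) for _ in range(255)]  # dense cluster near $85
data = normal + [50_000.0]                        # one extreme transaction

outlier_depth = average_path_length(50_000.0, data)
inlier_depth = average_path_length(normal[0], data)
print(f"outlier: {outlier_depth:.2f} splits, inlier: {inlier_depth:.2f} splits")
```

The extreme transaction is almost always cut off by the very first split, while a point inside the cluster needs many more splits before it stands alone.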

from sklearn.ensemble import IsolationForest

model = IsolationForest(contamination=0.01, random_state=42)
# contamination: expected fraction of anomalies in the data
model.fit(X_train)
scores = model.decision_function(X_test)  # lower = more anomalous; negative = flagged as anomaly
predictions = model.predict(X_test)  # -1 = anomaly, 1 = normal

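The contamination parameter sets the decision threshold: it is the fraction of training points you expect to be anomalous. With 0.01, the threshold is placed so that the 1% most anomalous training points are flagged; the fraction flagged on new data can differ. Setting it well requires domain knowledge about how often anomalies really occur.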
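Conceptually, contamination turns a continuous score into a yes/no decision by choosing a cutoff at a quantile of the training scores. The following is a rough sketch of that idea in plain Python, not a copy of scikit-learn's internals; `threshold_for_contamination` and the simulated scores are hypothetical.

```python
import random

def threshold_for_contamination(train_scores, contamination):
    """Cutoff such that roughly `contamination` of training scores fall at or below it.

    Lower score = more anomalous, matching the convention in the snippet above.
    """
    ordered = sorted(train_scores)
    n_flagged = max(1, round(contamination * len(ordered)))
    return ordered[n_flagged - 1]

rng = random.Random(0)
train_scores = [rng.gauss(0.1, 0.05) for _ in range(1000)]  # hypothetical scores

cutoff = threshold_for_contamination(train_scores, contamination=0.01)
flagged = [s for s in train_scores if s <= cutoff]
print(f"cutoff={cutoff:.4f}, flagged {len(flagged)} of {len(train_scores)}")
```

Note that the cutoff flags 1% of the training set by construction, whether or not 1% of it is truly anomalous. That is why a poorly chosen contamination value produces either a flood of false alarms or missed anomalies.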

Isolation Forest is fast (linear time complexity with a small constant, since each tree is built on a small random subsample), scales to large datasets, handles high-dimensional data reasonably well, and requires no labeled anomalies. It is a strong first choice for most anomaly detection tasks.

One-Class SVM

One-Class SVM learns a boundary around the normal data in a high-dimensional feature space (using the kernel trick). Points outside this boundary are flagged as anomalies.

It learns to answer: "does this point look like it came from the same distribution as the training data?" If not, it is an anomaly.

from sklearn.svm import OneClassSVM

model = OneClassSVM(kernel='rbf', gamma='scale', nu=0.01)
# nu: upper bound on the fraction of training points treated as outliers
# (and a lower bound on the fraction of support vectors)
model.fit(X_train)
predictions = model.predict(X_test)  # -1 = anomaly, 1 = normal

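One-Class SVM is effective when anomalies genuinely occupy a different region of feature space from normal points. It scales worse than Isolation Forest: kernel SVM training time grows at least quadratically with the number of samples, so it becomes impractical on large datasets. The kernel, gamma, and nu all require careful tuning, and because the RBF kernel is distance-based, features should be scaled before fitting.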

Autoencoder Reconstruction Error

Autoencoders are neural networks trained to compress their input into a low-dimensional representation (the bottleneck) and then reconstruct the original input from that compressed representation.

The key insight: if you train an autoencoder only on normal data, it learns to compress and reconstruct normal patterns efficiently. When you feed it an anomalous example, the autoencoder cannot reconstruct it well because it has never learned the patterns that produce that anomaly. The reconstruction error (difference between input and reconstructed output) is high for anomalies and low for normal examples.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim, bottleneck_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, bottleneck_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 64),
            nn.ReLU(),
            nn.Linear(64, input_dim)
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

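# Reconstruction error as anomaly score.
# Assumes `model` has already been trained on normal data only; the sizes
# and the random batch below are placeholders for illustration.
model = Autoencoder(input_dim=20, bottleneck_dim=4)
model.eval()
x = torch.randn(8, 20)  # batch of 8 examples with 20 features each

with torch.no_grad():  # no gradients needed when scoring
    reconstruction = model(x)
error = torch.mean((x - reconstruction) ** 2, dim=1)  # one MSE score per example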

Autoencoders work particularly well for complex, high-dimensional data where traditional algorithms struggle. They are commonly used for image anomaly detection (detecting manufacturing defects in product photos) and sequence anomaly detection (detecting unusual network traffic patterns).

The bottleneck size is a critical hyperparameter: too small and the autoencoder cannot reconstruct even normal data; too large and it can reconstruct anomalies too (because the bottleneck is not restrictive enough to force compression of only common patterns).

Why Accuracy is Useless: Use Precision-Recall

If 1% of your data is anomalous, a model that predicts "normal" for everything achieves 99% accuracy. This is the same imbalanced classification problem described in the metrics guide, and it is even more extreme for anomaly detection because anomaly rates are often 0.1% or lower.
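The arithmetic is easy to check directly. This short sketch builds a 1%-anomalous label set and scores a "model" that always predicts normal; the data is synthetic.

```python
# 1,000 points, 1% anomalous (label 1 = anomaly, 0 = normal).
labels = [1] * 10 + [0] * 990
predictions = [0] * 1000  # a "model" that always predicts normal

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = caught / sum(labels)

print(f"accuracy={accuracy:.2%}, anomalies caught={caught}, recall={recall:.0%}")
```

The accuracy looks excellent while the detector finds nothing, which is exactly the failure that precision and recall expose.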

The right metrics for anomaly detection:

Precision: Of the points flagged as anomalies, what fraction are actually anomalous? High precision means few of the alerts are false alarms.

Recall: Of all actual anomalies, what fraction were flagged? High recall means few missed anomalies.

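Precision-Recall AUC: The area under the precision-recall curve across all thresholds. It summarizes ranking quality in a single number and is a common headline metric for anomaly detection.

Average Precision (AP): A weighted mean of the precision at each threshold, where each weight is the increase in recall from the previous threshold. It summarizes the same curve as PR-AUC, but as a step-wise sum rather than a trapezoidal interpolation between points, which can make the interpolated area look too optimistic.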
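These definitions can be sketched in a few lines of plain Python. The scores and labels are made up, the helper names are hypothetical, and the code assumes no tied scores and that a higher score means more anomalous (the opposite sign of `decision_function` above, so negate those scores first).

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every point with score >= threshold is flagged."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    true_positives = sum(flagged)
    # Convention: precision is 1.0 when nothing is flagged.
    precision = true_positives / len(flagged) if flagged else 1.0
    recall = true_positives / sum(labels)
    return precision, recall

def average_precision(scores, labels):
    """Sum of precision at each true anomaly, weighted by the increase in recall."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    true_positives = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Recall rises by 1/total_positives here; precision at this rank is TP/rank.
            ap += (true_positives / rank) / total_positives
    return ap

# Hypothetical anomaly scores (higher = more anomalous) and ground-truth labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall_at(scores, labels, threshold=0.75)
ap = average_precision(scores, labels)
print(f"precision={precision:.3f}, recall={recall:.3f}, AP={ap:.3f}")
```

With the cutoff at 0.75, three points are flagged and two are real anomalies, so precision and recall are both 2/3. AP rewards the ranking as a whole: the third anomaly sits at rank 7, which drags the average down.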

When you have labeled anomalies for evaluation (even if you cannot use them for training), use precision-recall metrics. When you have no labels at all, you must evaluate the system qualitatively: are the flagged anomalies actually interesting to human reviewers?

Use Case: Fraud Detection

Credit card fraud detection is the canonical anomaly detection application. Labels may exist historically, but new fraud patterns emerge constantly that no historical label covers. Anomaly detection identifies transactions that are unusual relative to that user's normal behavior, regardless of whether that specific fraud pattern was seen before.

Features for fraud detection: transaction amount relative to user's history, time of day, merchant category, geographic location relative to recent activity, frequency of transactions in a time window, velocity of spending.
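A few of these features can be sketched for a single card using only the standard library. The `fraud_features` function, its field names, the one-hour window, and the transaction history are all hypothetical; a production system would compute them from a feature store or streaming pipeline.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

def fraud_features(history, new_txn, window=timedelta(hours=1)):
    """Build features for `new_txn` relative to one user's past transactions.

    `history` is a list of (timestamp, amount) tuples; `new_txn` is one such tuple.
    """
    timestamp, amount = new_txn
    past_amounts = [a for _, a in history]
    # Past transactions that fall inside the look-back window.
    recent = [a for t, a in history if timedelta(0) <= timestamp - t <= window]
    return {
        "amount_zscore": (amount - mean(past_amounts)) / stdev(past_amounts),
        "hour_of_day": timestamp.hour,
        "txn_count_in_window": len(recent),
        "spend_in_window": sum(recent),
    }

# Hypothetical history: five ordinary daytime purchases...
base = datetime(2026, 5, 1, 12, 0)
history = [(base + timedelta(days=d), amt)
           for d, amt in enumerate([80.0, 95.0, 70.0, 90.0, 85.0])]
# ...followed by three small charges in quick succession at 3am.
burst_start = datetime(2026, 5, 10, 3, 0)
history += [(burst_start + timedelta(minutes=m), 9.0) for m in (0, 2, 4)]

features = fraud_features(history, (burst_start + timedelta(minutes=6), 9.5))
print(features)
```

No single value here is alarming, but the combination of an unusual hour and a burst of small charges is the kind of pattern an anomaly detector can pick up once these features are part of its input.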

Both Isolation Forest (for initial suspicious flagging) and autoencoders (for detecting novel fraud patterns) are commonly deployed in production fraud systems, often alongside supervised classifiers trained on historical labeled fraud.

Use Case: System Monitoring

Detecting unusual system behavior (CPU spikes, memory leaks, unusual request patterns, latency anomalies) is a strong fit for time series anomaly detection. Normal system behavior has regular patterns (daily traffic cycles, weekly patterns). Anomalies are deviations from these patterns.

For time series, reconstruction-error-based methods (LSTM autoencoders) or statistical methods (tracking a rolling mean and standard deviation, and flagging any new value more than 3 sigma from the mean in either direction) are common. The "seasonal decomposition" approach (separate trend, seasonality, and residual components; flag unusual residuals) works well when the series has stable, regular seasonality.
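The rolling-statistics approach is simple enough to sketch in full. This is a minimal illustration using the standard library; the function name, the window of 20 points, and the synthetic latency series are assumptions for the example.

```python
from collections import deque
from statistics import mean, stdev

def rolling_sigma_anomalies(values, window=20, n_sigma=3.0):
    """Indices whose value is more than n_sigma std devs from the trailing-window mean."""
    recent = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(recent) == window:  # wait until the window is full
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > n_sigma * sigma:
                anomalies.append(i)
        recent.append(value)
    return anomalies

# Hypothetical latency series in milliseconds with one spike at index 30.
latencies = [200 + (i % 5) for i in range(40)]
latencies[30] = 30_000

spikes = rolling_sigma_anomalies(latencies)
print("anomalous indices:", spikes)
```

One weakness worth knowing: once the spike enters the window it inflates both the mean and the standard deviation, so a second anomaly shortly afterwards could be masked. Variants based on the median and median absolute deviation are less sensitive to this.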

Use Case: Quality Control

Manufacturing defect detection in images is a prime autoencoder use case. Train on thousands of images of defect-free products. Flag products whose reconstruction error exceeds a threshold as potentially defective. This works even for defect types never seen before, which is the key advantage over supervised classification.

Anomaly detection is not a replacement for supervised classification when you have labels -- supervised methods will typically achieve better performance when you have sufficient labeled examples. But for rare events, novel patterns, and cases where you cannot anticipate all failure modes in advance, anomaly detection fills a gap that supervised learning cannot.

Keep Reading

ML Model Evaluation Metrics Guide -- precision-recall tradeoffs explained in full detail
Machine Learning Complete Guide for Software Developers -- the broader ML landscape to contextualize where anomaly detection fits
Overfitting and Underfitting: How to Fix Them -- autoencoders overfit too, and the same remedies apply

