ONNX: Export Any ML Model and Run It Anywhere

ONNX (Open Neural Network Exchange) is an open, framework-neutral model format: export from PyTorch, scikit-learn, or HuggingFace and run inference with ONNX Runtime on CPU or GPU, often around 3x faster than in the original framework.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 19, 2026

7 min read

// tags

#onnx #model-export #inference #cross-platform #optimization

Exporting from Scikit-learn

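from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Example data (an assumption for illustration; substitute your own X_train, y_train)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Declare one float32 input with a dynamic batch size (None) and a fixed feature count
initial_type = [("float_input", FloatTensorType([None, X_train.shape[1]]))]
onnx_model = convert_sklearn(model, initial_types=initial_type)

# Serialize the ONNX protobuf to disk
with open("sklearn_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())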
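
The export above declares a float32 input with a dynamic batch dimension, while scikit-learn usually works with float64 arrays. Here is a minimal sketch of an input-preparation step, assuming NumPy is available. The helper name `to_onnx_input` is hypothetical and not part of skl2onnx or ONNX Runtime, and the sample is a made-up 4-feature row.

```python
import numpy as np

def to_onnx_input(X, n_features):
    """Cast features to the float32, 2-D, C-contiguous layout the exported graph declares."""
    arr = np.asarray(X, dtype=np.float32)
    if arr.ndim == 1:
        # Treat a single sample as a batch of one
        arr = arr.reshape(1, -1)
    if arr.ndim != 2 or arr.shape[1] != n_features:
        raise ValueError(f"expected shape (batch, {n_features}), got {arr.shape}")
    return np.ascontiguousarray(arr)

batch = to_onnx_input([[5.1, 3.5, 1.4, 0.2]], n_features=4)
print(batch.dtype, batch.shape)  # float32 (1, 4)
```

The result could then be passed to an ONNX Runtime session as `{"float_input": batch}`. Casting float64 to float32 shifts values slightly, which may occasionally change a tree's decision near a split threshold, so comparing the exported model's predictions against the original on held-out data is a sensible check.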

Exporting HuggingFace Transformers with Optimum

pip install "optimum[onnxruntime]"
optimum-cli export onnx --model bert-base-uncased --task text-classification ./bert_onnx/

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = ORTModelForSequenceClassification.from_pretrained("./bert_onnx/")

# Note: bert-base-uncased has no fine-tuned classification head, so these
# logits are placeholders; use a fine-tuned checkpoint for real predictions.
inputs = tokenizer("This is great!", return_tensors="pt")
outputs = model(**inputs)
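
The model call returns raw logits rather than class probabilities. The sketch below shows one way to turn logits into a predicted class with a softmax, assuming the logits have already been converted to a NumPy array. The function name `logits_to_probs` and the example values are illustrative and do not come from Optimum.

```python
import numpy as np

def logits_to_probs(logits):
    """Numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtracting the row maximum avoids overflow in exp without changing the result
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up logits for one sentence and two classes (illustrative values only)
example_logits = np.array([[-1.2, 2.3]])
probs = logits_to_probs(example_logits)
predicted_class = int(probs.argmax(axis=-1)[0])
print(predicted_class)  # 1
```

With `return_tensors="pt"`, `outputs.logits` should be a PyTorch tensor, so something like `outputs.logits.detach().cpu().numpy()` would be needed before calling the helper.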

ONNX Runtime for Fast Inference

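import onnxruntime as ort
import numpy as np

# Option A: CPU inference session
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Option B: GPU inference (CUDA); requires the onnxruntime-gpu package.
# Providers are tried in order, so CPU acts as the fallback.
# This assignment replaces the CPU session above; keep only the one you need.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

# Run inference
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# Dummy batch; the shape assumes an image model that takes 1x3x224x224 float32 input
result = session.run(
    [output_name],
    {input_name: np.random.randn(1, 3, 224, 224).astype(np.float32)}
)[0]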
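
Because the provider list is an ordered preference, it can help to filter it against what the installed build actually supports. The following is a small sketch of that selection logic. `pick_providers` is a hypothetical helper, written as a pure function so it runs without ONNX Runtime installed.

```python
def pick_providers(available, preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are actually available, in preference order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"none of {list(preferred)} found in {list(available)}")
    return chosen

# Simulates a CPU-only install, where CUDA is not among the available providers
print(pick_providers(["CPUExecutionProvider"]))  # ['CPUExecutionProvider']
```

In a real script, the available list would come from `ort.get_available_providers()`, and the result would be passed as the `providers` argument of `ort.InferenceSession`.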

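ONNX Runtime is often 2-5x faster than eager PyTorch for CPU inference, thanks to graph optimizations and kernel fusion. The actual gain depends on the model, hardware, and batch size, so measure it on your own workload.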
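
Speedup figures like this are easiest to trust when measured on your own model. Below is a minimal timing sketch using only the standard library. `median_latency_ms` is a hypothetical helper, and the lambda is a stand-in workload, not a real inference call.

```python
import statistics
import time

def median_latency_ms(fn, warmup=5, runs=50):
    """Call fn() repeatedly and return the median wall-clock latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls let caches and lazy initialization settle
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

# Stand-in workload; swap in a call to your ONNX Runtime session or PyTorch model
latency = median_latency_ms(lambda: sum(range(10_000)))
print(f"median latency: {latency:.3f} ms")
```

Timing both runtimes with the same inputs and batch size, and comparing medians rather than single runs, gives a fairer picture than a one-off measurement.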

Quantization for Smaller, Faster Models

from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    "model.onnx",
    "model_quantized.onnx",
    weight_type=QuantType.QInt8,
)

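# Typical result (model-dependent): ~75% smaller file because float32 weights
# are stored as int8, 2-4x faster CPU inference, <1% accuracy loss.
# Validate accuracy and latency on your own data before shipping.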
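
The size reduction follows from storing each weight in 1 byte (int8) instead of 4 bytes (float32). The sketch below illustrates the idea with a simplified symmetric, per-tensor scheme in NumPy. It is an illustration under that assumption, not ONNX Runtime's exact quantization algorithm, and the function names are hypothetical.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map float32 weights to int8 plus one scale."""
    max_abs = float(np.abs(weights).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # random stand-in weights
q, scale = quantize_int8(w)

size_ratio = q.nbytes / w.nbytes
max_error = float(np.abs(w - dequantize(q, scale)).max())
print(size_ratio)  # 0.25
```

The rounding error per weight is bounded by roughly half the scale, which is why accuracy usually degrades only slightly. Models with a few very large outlier weights get a coarser scale and can lose more.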

ONNX vs TorchScript vs TensorFlow SavedModel

Feature	ONNX	TorchScript	TF SavedModel
Framework support	All major	PyTorch only	TensorFlow only
Deployment targets	Universal	PyTorch ecosystem	TF ecosystem
Mobile support	Yes (ONNX Runtime Mobile)	Yes (PyTorch Mobile, iOS and Android)	Yes (TF Lite)
Quantization	Excellent	Limited	Good (TF Lite)
Ecosystem	Growing fast	Stable	Mature

Resources: ONNX, ONNX Runtime, Optimum.
