Why Experiment Tracking Matters
By experiment 50, you cannot remember which hyperparameters gave the best F1 score. By experiment 200, your team has run experiments you do not know about. By experiment 1,000, you need a model registry with stage transitions to know what is in production.
Neptune.ai covers that full tracking lifecycle, from logging a first experiment to recording which model version is in production.
Starting a Run
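import joblib
import lightgbm as lgb
import neptune
from sklearn.model_selection import cross_val_score

# X_train and y_train are assumed to be loaded already (feature matrix and labels).
run = neptune.init_run(
    project="my-team/fraud-detection",
    api_token="YOUR_API_TOKEN",  # or omit and set the NEPTUNE_API_TOKEN environment variable
    name="lgbm-experiment-47",
    tags=["lightgbm", "v2-features"],
)

# Log hyperparameters
params = {
    "n_estimators": 500,
    "num_leaves": 63,
    "learning_rate": 0.05,
    "feature_set": "v2",
}
run["parameters"] = params

# "feature_set" is tracking metadata, not a LightGBM parameter, so drop it before training
model_params = {k: v for k, v in params.items() if k != "feature_set"}

# Evaluate with 5-fold cross-validation
model = lgb.LGBMClassifier(**model_params)
scores = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")

# Log metrics
run["metrics/cv_auc_mean"] = scores.mean()
run["metrics/cv_auc_std"] = scores.std()

# cross_val_score only fits clones of the model, so fit it before saving
model.fit(X_train, y_train)

# Log artifacts
joblib.dump(model, "model.pkl")
run["model_file"].upload("model.pkl")

run.stop()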
Logging During Training
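import lightgbm as lgb
import neptune
from neptune.integrations.lightgbm import NeptuneCallback  # requires the neptune-lightgbm package

run = neptune.init_run(project="my-team/fraud-detection")

# LightGBM integration: logs train/val metrics at every boosting iteration
neptune_callback = NeptuneCallback(run=run, base_namespace="training")

# params is a dict of LightGBM training parameters; train_data and val_data
# are lgb.Dataset objects. All three are assumed to be defined earlier.
model = lgb.train(
    params,
    train_data,
    valid_sets=[train_data, val_data],
    valid_names=["train", "val"],
    callbacks=[neptune_callback, lgb.early_stopping(50)],
)

run.stop()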
Neptune integrations exist for PyTorch Lightning, Keras, scikit-learn, XGBoost, and Optuna.
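To make concrete what a per-iteration logging callback does, here is a minimal sketch that runs without LightGBM or Neptune. It is not Neptune's actual implementation. `InMemoryMetricLogger` and `FakeEnv` are hypothetical names, and the `training/<dataset>/<metric>` key layout is an assumption chosen for illustration. The only fact taken from LightGBM is that a callback receives an environment object whose `evaluation_result_list` holds tuples starting with the dataset name, metric name, and value.

```python
from collections import defaultdict, namedtuple

# Stand-in for the environment object LightGBM passes to callbacks; only the
# two fields used below are modeled here.
FakeEnv = namedtuple("FakeEnv", ["iteration", "evaluation_result_list"])


class InMemoryMetricLogger:
    """Hypothetical callback that collects per-iteration metrics in a dict."""

    def __init__(self, base_namespace="training"):
        self.base_namespace = base_namespace
        self.series = defaultdict(list)

    def __call__(self, env):
        # Each entry starts with (dataset_name, metric_name, value, ...).
        for dataset_name, metric_name, value, *_ in env.evaluation_result_list:
            key = f"{self.base_namespace}/{dataset_name}/{metric_name}"
            self.series[key].append(value)


# Simulate two boosting iterations with made-up metric values.
logger = InMemoryMetricLogger()
logger(FakeEnv(0, [("train", "auc", 0.91, True), ("val", "auc", 0.88, True)]))
logger(FakeEnv(1, [("train", "auc", 0.93, True), ("val", "auc", 0.89, True)]))
print(dict(logger.series))
```

With a real run, the list append would presumably become an append to a Neptune series field such as `run[key].append(value)`, which is roughly the bookkeeping an integration callback saves you from writing.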
Model Registry with Stage Transitions
import neptune

# Register a new version of an existing model
model_version = neptune.init_model_version(
    model="FRA-MOD",  # model ID created in UI
    project="my-team/fraud-detection",
)
model_version["model"].upload("model.pkl")
model_version["metrics/auc"] = 0.943
model_version.change_stage("staging")

# After validation, promote to production
model_version.change_stage("production")
model_version.stop()
Stage transitions create an audit trail: who promoted which version, and when.
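Promotions are often gated on a validation metric rather than done by hand. The sketch below shows one possible gate in plain Python. The `next_stage` helper, the `ALLOWED_PROMOTIONS` map, and the thresholds are all hypothetical, and it assumes a new version starts in a stage named "none".

```python
# Assumed promotion path: a version moves one step at a time.
ALLOWED_PROMOTIONS = {
    "none": "staging",
    "staging": "production",
}


def next_stage(current_stage: str, auc: float, min_auc: float) -> str:
    """Return the stage to move to, or the current stage if nothing should change."""
    target = ALLOWED_PROMOTIONS.get(current_stage)
    if target is None or auc < min_auc:
        # Either there is no further promotion, or the metric is too low.
        return current_stage
    return target


print(next_stage("staging", auc=0.943, min_auc=0.93))
print(next_stage("staging", auc=0.943, min_auc=0.95))
```

When the returned stage differs from the current one, it can be passed to `model_version.change_stage(...)` as in the snippet above. Keeping the decision in a pure function makes it easy to unit-test separately from the registry call.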
Comparing Runs
Neptune's web UI lets you compare any two runs side by side: hyperparameters, metrics, artifacts, and even images (confusion matrices, SHAP plots). You can filter runs by tags, metric ranges, or custom metadata.
# Fetch run data programmatically
import neptune

# Reopen an existing run by ID without modifying it
run = neptune.init_run(
    project="my-team/fraud-detection",
    with_id="FRA-47",
    mode="read-only",
)
print(run["metrics/cv_auc_mean"].fetch())
print(run["parameters"].fetch())
run.stop()
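Once parameters have been fetched as dictionaries, a side-by-side comparison can also be done in code. The following is a minimal sketch, not a Neptune feature. `diff_params` is a hypothetical helper, and the two dictionaries are made-up stand-ins for the `parameters` namespaces of two fetched runs.

```python
def diff_params(params_a: dict, params_b: dict) -> dict:
    """Map each parameter that differs to its (value_in_a, value_in_b) pair.

    A parameter missing from one side shows up as None on that side.
    """
    keys = sorted(set(params_a) | set(params_b))
    return {
        key: (params_a.get(key), params_b.get(key))
        for key in keys
        if params_a.get(key) != params_b.get(key)
    }


# Hypothetical values standing in for two fetched "parameters" namespaces.
run_46 = {"n_estimators": 500, "num_leaves": 31, "learning_rate": 0.05, "feature_set": "v1"}
run_47 = {"n_estimators": 500, "num_leaves": 63, "learning_rate": 0.05, "feature_set": "v2"}

print(diff_params(run_46, run_47))
```

In practice, each dictionary would come from a call like `run["parameters"].fetch()` on a run opened in read-only mode, as in the snippet above.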
Neptune vs W&B vs MLflow
| | Neptune | W&B | MLflow |
|---|---|---|---|
| Best for | Team collaboration, private cloud | Deep learning, research | Self-hosted, enterprise |
| Free tier | 200GB, unlimited users | 100GB | Self-hosted free |
| Private cloud | Yes | Enterprise | Yes (self-host) |
| UI quality | Excellent | Excellent | Good |
| Model registry | Yes | Yes | Yes |
Resources: Neptune.ai, docs, integrations.