A feature store is a centralized system for storing, computing, and serving machine learning features. The concept sounds straightforward, but the problem it solves is subtle, and most teams adopt feature stores either too early (before the problem exists) or too late (after training-serving skew has caused silent model failures). This guide gives you the honest picture.
The Problem: Training-Serving Skew
Training-serving skew is when the features used to train a model are computed differently from the features computed at inference time. It is one of the most common sources of unexplained model degradation in production ML.
Here is how it happens. During training, a data scientist computes the feature "average purchase value in the last 30 days" from a historical transactions table in the data warehouse using a SQL query in a Jupyter notebook. At serving time, an engineer computes the same feature from a live database using a different code path. The logic seems identical, but:
- The data scientist averaged over completed orders; the engineer included pending orders
- The data scientist excluded refunds; the engineer did not
- The data scientist used UTC timestamps; the engineer used local time
At serving time the model sees feature values that differ systematically from the ones it was trained on. No error is raised, so performance degrades silently.
The feature store's core promise is: the same feature definition computes the same values for both training and serving.
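The divergence described above can be reproduced in a few lines. The following is a minimal sketch, not anyone's production code: the order records, field names, and both function names are hypothetical, chosen only to show how two "identical" definitions of average purchase value drift apart once one of them includes pending orders and refunds.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical order records; field names are illustrative assumptions.
ORDERS = [
    {"amount": 100.0, "status": "completed", "is_refund": False,
     "ts": datetime(2024, 3, 10, tzinfo=timezone.utc)},
    {"amount": 40.0, "status": "pending", "is_refund": False,
     "ts": datetime(2024, 3, 12, tzinfo=timezone.utc)},
    {"amount": -100.0, "status": "completed", "is_refund": True,
     "ts": datetime(2024, 3, 14, tzinfo=timezone.utc)},
    {"amount": 60.0, "status": "completed", "is_refund": False,
     "ts": datetime(2024, 3, 15, tzinfo=timezone.utc)},
]


def _last_30_days(orders, now):
    """Rows whose timestamp falls in the 30 days before `now`."""
    start = now - timedelta(days=30)
    return [o for o in orders if start <= o["ts"] < now]


def avg_purchase_training(orders, now):
    """Notebook-style version: completed orders only, refunds excluded."""
    rows = [o for o in _last_30_days(orders, now)
            if o["status"] == "completed" and not o["is_refund"]]
    return sum(o["amount"] for o in rows) / len(rows) if rows else 0.0


def avg_purchase_serving(orders, now):
    """Service-style version: every row in the window, pending and refunds included."""
    rows = _last_30_days(orders, now)
    return sum(o["amount"] for o in rows) / len(rows) if rows else 0.0


now = datetime(2024, 3, 20, tzinfo=timezone.utc)
train_value = avg_purchase_training(ORDERS, now)  # mean of 100 and 60
serve_value = avg_purchase_serving(ORDERS, now)   # mean of 100, 40, -100, 60
print(train_value, serve_value)
```

On this toy data the same user gets 80.0 from the training path and 25.0 from the serving path, which is the kind of gap that never raises an exception.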
Components of a Feature Store
Feature registry. A catalog of all features: name, description, data type, owner, which models use it, when it was last updated. The registry solves the discovery problem -- when a new team starts a project, they can find features that already exist rather than recomputing them.
Offline store. Historical feature values stored in a data warehouse or object storage (Parquet files). Used for generating training datasets. You query the offline store with a list of entity IDs and timestamps; it returns the feature values that would have been available at each timestamp (point-in-time correct lookups). This is more complex than it sounds and is one of the main reasons feature stores exist.
Online store. A low-latency key-value store (Redis, DynamoDB, Bigtable) that serves the most recent feature values at inference time. Query: "give me the features for user ID 12345 right now." For real-time ML serving, the lookup usually has to fit inside a tight latency budget; under 10 ms is a common target.
Feature computation layer. The code that computes features from raw data and writes to both online and offline stores. Most feature stores let you define feature computation in Python or SQL, then handle the materialization to both stores.
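How the computation layer relates to the two stores can be sketched with in-memory stand-ins. This is an illustration under stated assumptions, not how any particular feature store is implemented: `OfflineStore`, `OnlineStore`, and `materialize` are hypothetical names, and real systems back them with a warehouse and a key-value database.

```python
from datetime import datetime, timezone


class OfflineStore:
    """Append-only history: every computed value is kept with its timestamp."""

    def __init__(self):
        self.rows = []

    def write(self, entity_id, feature, value, ts):
        self.rows.append((entity_id, feature, value, ts))


class OnlineStore:
    """Only the latest value per (entity, feature), for fast lookups."""

    def __init__(self):
        self.latest = {}

    def write(self, entity_id, feature, value, ts):
        key = (entity_id, feature)
        current = self.latest.get(key)
        # Ignore late-arriving writes that are older than what is stored.
        if current is None or ts >= current[1]:
            self.latest[key] = (value, ts)

    def read(self, entity_id, feature):
        entry = self.latest.get((entity_id, feature))
        return entry[0] if entry else None


def materialize(compute_fn, feature, raw_by_entity, ts, offline, online):
    """Run one feature definition and write the result to both stores."""
    for entity_id, raw in raw_by_entity.items():
        value = compute_fn(raw)
        offline.write(entity_id, feature, value, ts)
        online.write(entity_id, feature, value, ts)


def mean_amount(amounts):
    return sum(amounts) / len(amounts) if amounts else 0.0


offline, online = OfflineStore(), OnlineStore()
t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
materialize(mean_amount, "avg_purchase", {"u1": [10.0, 30.0], "u2": [5.0]}, t1, offline, online)
materialize(mean_amount, "avg_purchase", {"u1": [10.0, 30.0, 50.0]}, t2, offline, online)
```

After the second run the online store holds only the newest value for `u1`, while the offline store still holds the value from `t1`, which is what training on historical examples needs.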
Tools
Feast (open source) is the most widely used open-source feature store. It runs on your existing infrastructure (offline store: BigQuery, Snowflake, or files; online store: Redis, DynamoDB, SQLite for testing). You define features in Python, run materialization jobs to load feature values from the offline store into the online store, and query via the Feast SDK.
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Get training data (historical, point-in-time correct)
training_df = store.get_historical_features(
    entity_df=entity_df_with_timestamps,
    features=["user_features:avg_purchase_30d", "user_features:purchase_count_7d"],
).to_df()

# Get online features for serving
online_features = store.get_online_features(
    features=["user_features:avg_purchase_30d"],
    entity_rows=[{"user_id": 1001}, {"user_id": 1002}],
).to_dict()
Tecton is a fully managed feature store with enterprise support. It handles feature computation, scheduling, monitoring, and serving as a service. Higher cost but significantly less operational burden than running Feast yourself.
Vertex AI Feature Store (Google Cloud) integrates natively with BigQuery and Vertex AI Pipelines. Use it if you are already committed to the Google Cloud ML stack.
Hopsworks is open source with a managed cloud option. Strong support for streaming features.
Point-in-Time Correct Lookups
This is the feature that separates a proper feature store from a naive database. When building a training dataset, you need the feature values that would have been available at the time of each training example -- not the current values.
Suppose you are building a fraud detection model. Each training example is a transaction. For each transaction, you want the feature "number of transactions by this user in the last hour." But "the last hour" means the hour before the transaction, not the current time. If you join the transactions table naively, you will include data from after the fraud event occurred (data leakage), and your model will appear to perform far better than it actually does.
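The fraud example can be made concrete with a small sketch. The transaction tuples and both function names below are hypothetical, and the "naive" version is just one assumed way the leak shows up: the window is anchored on the time the dataset is built rather than on the transaction being labeled.

```python
from datetime import datetime, timedelta

# Hypothetical (user_id, timestamp) transaction log.
TXNS = [
    ("u1", datetime(2024, 5, 1, 9, 0)),
    ("u1", datetime(2024, 5, 1, 9, 30)),
    ("u1", datetime(2024, 5, 1, 9, 50)),   # the training example
    ("u1", datetime(2024, 5, 1, 10, 5)),   # happens after the example
    ("u1", datetime(2024, 5, 1, 10, 10)),  # happens after the example
]


def count_last_hour(txns, user_id, window_end):
    """Transactions by `user_id` in the hour strictly before `window_end`."""
    start = window_end - timedelta(hours=1)
    return sum(1 for uid, ts in txns if uid == user_id and start <= ts < window_end)


event_time = datetime(2024, 5, 1, 9, 50)
build_time = datetime(2024, 5, 1, 10, 15)  # when the training set is assembled

correct_count = count_last_hour(TXNS, "u1", event_time)  # sees 9:00 and 9:30 only
leaky_count = count_last_hour(TXNS, "u1", build_time)    # also sees 9:50, 10:05, 10:10
print(correct_count, leaky_count)
```

The correct count is 2 and the leaky count is 4. The leaky value includes the labeled transaction itself plus two that had not happened yet, so a model trained on it learns from information it will never have at serving time.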
Point-in-time correct lookups take an entity (user, transaction, product) and a timestamp, and return the feature values computed using only data available before that timestamp. This is hard to implement correctly in SQL and is a core feature of all mature feature stores.
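For features that are already materialized with timestamps, one way to sketch such a lookup is pandas `merge_asof`, which picks for each training row the latest feature row at or before a given time. This is an illustration on made-up data, not how any specific feature store implements it; the column names are assumptions, and `allow_exact_matches=False` is used to match the "before that timestamp" wording above.

```python
import pandas as pd

# Hypothetical history of materialized feature values.
feature_history = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "feature_ts": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-20", "2024-01-05"]),
    "avg_purchase_30d": [20.0, 35.0, 50.0, 12.0],
})

# Training examples: one row per (entity, event timestamp).
examples = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "event_ts": pd.to_datetime(["2024-01-15", "2024-01-10", "2024-01-03", "2024-01-06"]),
})

# merge_asof requires both frames to be sorted by their time key.
training = pd.merge_asof(
    examples.sort_values("event_ts"),
    feature_history.sort_values("feature_ts"),
    left_on="event_ts",
    right_on="feature_ts",
    by="user_id",               # match only rows for the same entity
    direction="backward",       # look only into the past
    allow_exact_matches=False,  # strictly before the event timestamp
)
print(training)
```

User 2's example on 2024-01-03 gets a missing value because no feature existed yet, and user 1's example on 2024-01-10 gets the value from 2024-01-01 rather than the one written at the same instant. Neither row picks up the later 2024-01-20 value.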
When You Need a Feature Store
You need a feature store when all of the following are true:
- You have multiple models sharing features. If every model computes its own version of "user engagement in the last 30 days," you have inconsistency and wasted computation. A feature store centralizes this.
- You need online-offline consistency. If your models serve in real time and training-serving skew has caused unexplained performance drops, a feature store enforces consistency.
- Features are expensive to compute. Aggregating 90 days of transaction history for 10 million users on every training run is wasteful. A feature store materializes features once and serves them repeatedly.
- Multiple teams are building ML systems. A feature registry lets teams discover and reuse each other's work instead of recomputing.
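The discovery case in the last item can be sketched as a tiny in-memory registry. The class names, fields, and example entries here are hypothetical; the fields simply mirror the metadata listed for the feature registry earlier (name, description, data type, owner, consuming models, last update).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FeatureEntry:
    name: str
    description: str
    dtype: str
    owner: str
    used_by: list = field(default_factory=list)
    last_updated: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class FeatureRegistry:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Refuse duplicates so two teams cannot silently redefine a feature.
        if entry.name in self._entries:
            raise ValueError(f"feature already registered: {entry.name}")
        self._entries[entry.name] = entry

    def search(self, keyword):
        """Case-insensitive match on feature name or description."""
        kw = keyword.lower()
        return [e for e in self._entries.values()
                if kw in e.name.lower() or kw in e.description.lower()]


registry = FeatureRegistry()
registry.register(FeatureEntry(
    name="user_engagement_30d",
    description="Sessions per user over the last 30 days",
    dtype="float",
    owner="growth-team",
    used_by=["churn_model"],
))
registry.register(FeatureEntry(
    name="avg_purchase_30d",
    description="Average completed purchase value, last 30 days",
    dtype="float",
    owner="payments-team",
))

matches = registry.search("engagement")
print([m.name for m in matches])
```

A second team searching for "engagement" finds the existing definition and its owner before writing a new one, and a duplicate registration fails loudly instead of creating a second, slightly different feature.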
When You Do Not Need a Feature Store
You do not need a feature store when:
- You have a single model or a small number of models, each with simple, non-shared features
- Your models are batch-only (no real-time inference) -- training-serving skew cannot occur if serving uses the same pipeline as training
- Your team has fewer than 5 data scientists -- the operational overhead of a feature store is not justified
- Features are simple transformations (log, scale, one-hot encode) that are trivially applied identically in both training and serving
For most early-stage companies and small teams, a disciplined approach to feature computation (shared utility functions, good documentation, code review) is sufficient. Introduce a feature store when the discipline breaks down at scale.
A Practical Migration Path
If you are experiencing training-serving skew but are not ready for a full feature store, an intermediate solution is to centralize feature computation in shared Python modules that are imported by both your training pipeline and your serving code.
# features/user_features.py -- shared by training and serving
from datetime import datetime, timedelta

import pandas as pd


def compute_avg_purchase_30d(transactions_df: pd.DataFrame, cutoff_time: datetime) -> float:
    # Completed orders in the 30 days strictly before cutoff_time.
    recent = transactions_df[
        (transactions_df["completed_at"] < cutoff_time) &
        (transactions_df["completed_at"] >= cutoff_time - timedelta(days=30)) &
        (transactions_df["status"] == "completed")
    ]
    # Return 0.0 rather than NaN when the user has no qualifying orders.
    return float(recent["amount_usd"].mean()) if len(recent) > 0 else 0.0
This does not give you an online store or materialization, but it eliminates the most common source of skew: divergent implementations.
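A rough sketch of how both code paths might call the shared function follows. The function body is repeated from the module above only so the example runs on its own; the sample transactions, `label_times`, and `request_time` are made-up values. The point is that training passes each example's historical timestamp as the cutoff, while serving passes the request time.

```python
from datetime import datetime, timedelta

import pandas as pd


def compute_avg_purchase_30d(transactions_df: pd.DataFrame, cutoff_time: datetime) -> float:
    # Repeated from the shared module so this sketch is self-contained.
    recent = transactions_df[
        (transactions_df["completed_at"] < cutoff_time) &
        (transactions_df["completed_at"] >= cutoff_time - timedelta(days=30)) &
        (transactions_df["status"] == "completed")
    ]
    return float(recent["amount_usd"].mean()) if len(recent) > 0 else 0.0


# Hypothetical transactions for a single user.
transactions = pd.DataFrame({
    "completed_at": pd.to_datetime(["2024-02-01", "2024-02-20", "2024-03-05"]),
    "status": ["completed", "completed", "pending"],
    "amount_usd": [30.0, 50.0, 999.0],
})

# Training path: one cutoff per historical label timestamp.
label_times = [datetime(2024, 2, 10), datetime(2024, 3, 1)]
training_values = [compute_avg_purchase_30d(transactions, t) for t in label_times]

# Serving path: the cutoff is the time of the request.
request_time = datetime(2024, 3, 1)
serving_value = compute_avg_purchase_30d(transactions, request_time)

print(training_values, serving_value)
```

For the same cutoff, the training and serving values match by construction (40.0 here), because there is only one implementation to diverge from.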