The Problem a Feature Store Solves
Three problems plague ML feature pipelines without a feature store:
Training-serving skew. Features are computed differently at training time (batch job) and serving time (real-time API). The model was never trained on what it sees in production.
Duplicate feature computation. Ten models compute the same "user purchase count last 7 days" in ten different pipelines.
Point-in-time correctness. When generating training data, you need the feature values as they were at prediction time — not today's values retroactively applied to historical labels.
Feast, an open-source feature store, is designed to address all three.
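To make training-serving skew concrete, here is a minimal sketch in plain Python (no Feast involved). The purchase timestamps and both function names are hypothetical. It shows two plausible implementations of "purchase count last 7 days", one calendar-day based as a nightly batch job might write it and one rolling-window based as a real-time API might write it, returning different values for the same user at the same moment.

```python
from datetime import datetime, timedelta

# Hypothetical purchase timestamps for one user.
purchases = [
    datetime(2025, 3, 1, 23, 30),
    datetime(2025, 3, 5, 9, 0),
    datetime(2025, 3, 8, 8, 0),
]


def purchase_count_7d_batch(events, as_of):
    """Batch-style definition: purchases on the 7 full calendar days before as_of's date."""
    end_day = as_of.date()
    start_day = end_day - timedelta(days=7)
    return sum(start_day <= e.date() < end_day for e in events)


def purchase_count_7d_online(events, as_of):
    """Serving-style definition: purchases in the rolling 168 hours up to as_of."""
    window_start = as_of - timedelta(days=7)
    return sum(window_start < e <= as_of for e in events)


as_of = datetime(2025, 3, 8, 12, 0)
batch_value = purchase_count_7d_batch(purchases, as_of)
online_value = purchase_count_7d_online(purchases, as_of)
print(batch_value, online_value)  # 2 3
```

Both functions are reasonable readings of the feature name, yet the model would train on 2 and be served 3. Defining the feature once and reading it from one store for both paths removes that ambiguity.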
Core Concepts
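Entity: the object your features describe (a user, an order), identified by a join key such as user_id.
Feature View: a named group of features read from one data source and keyed by one or more entities.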
Offline Store: historical feature data for training (Parquet files, BigQuery, Snowflake, Redshift).
Online Store: low-latency feature data for serving (Redis, DynamoDB, SQLite for dev).
Feature Service: a named set of features consumed by a model.
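The split between the offline and online store can be illustrated with a toy sketch. This is not how Feast is implemented; the rows, the `materialize` helper, and the dict standing in for Redis are all hypothetical. The idea it shows is that the offline store keeps the full history, while the online store keeps only the newest row per entity so a lookup by key is cheap.

```python
from datetime import datetime

# Toy offline store: full history, one row per (entity, timestamp).
offline_rows = [
    {"user_id": 1001, "event_timestamp": datetime(2025, 3, 1), "purchase_count_7d": 2},
    {"user_id": 1001, "event_timestamp": datetime(2025, 3, 8), "purchase_count_7d": 5},
    {"user_id": 1002, "event_timestamp": datetime(2025, 3, 4), "purchase_count_7d": 1},
]


def materialize(rows, end_date):
    """Keep the newest row per user_id whose timestamp is at or before end_date."""
    online = {}
    for row in rows:
        if row["event_timestamp"] > end_date:
            continue  # ignore rows newer than the requested cutoff
        current = online.get(row["user_id"])
        if current is None or row["event_timestamp"] > current["event_timestamp"]:
            online[row["user_id"]] = row
    return online


# A plain dict stands in for the key-value online store.
online_store = materialize(offline_rows, end_date=datetime(2025, 3, 10))
print(online_store[1001]["purchase_count_7d"])  # 5
```

Training reads from the history; serving reads one row per key from the compact copy.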
Defining Feature Views
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float64, Int64, String
# Define the entity (what your features describe)
user = Entity(name="user", join_keys=["user_id"])
# Define a data source
user_stats_source = FileSource(
    path="data/user_stats.parquet",
    timestamp_field="event_timestamp",
)
# Define the feature view
user_stats_fv = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=7),
    schema=[
        Field(name="purchase_count_7d", dtype=Int64),
        Field(name="avg_order_value_30d", dtype=Float64),
        Field(name="days_since_last_login", dtype=Int64),
        Field(name="country", dtype=String),
    ],
    online=True,
    source=user_stats_source,
)
Getting Historical Features for Training
from feast import FeatureStore
import pandas as pd
store = FeatureStore(repo_path=".")
# Entity DataFrame: who you want features for and at what time
entity_df = pd.DataFrame({
    "user_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime(["2025-03-01", "2025-03-15", "2025-04-01"]),
    "label": [1, 0, 1],  # the target variable
})
# Point-in-time correct feature retrieval
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "user_stats:purchase_count_7d",
        "user_stats:avg_order_value_30d",
        "user_stats:country",
    ],
).to_df()
print(training_df.head())
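get_historical_features() joins each row of the entity DataFrame to the most recent feature values recorded at or before that row's event_timestamp, and no older than the feature view's ttl. Feature values from after the prediction time never reach the training set, which removes this source of data leakage.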
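The lookup rule behind a point-in-time join can be sketched in plain Python. This is an illustration of the idea under stated assumptions, not Feast's implementation: the history values and the `point_in_time_value` helper are hypothetical, and the exact boundary handling in Feast may differ. For each label timestamp it picks the newest feature row at or before that time, and returns nothing if that row is older than the ttl.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical feature history for one user, sorted by timestamp.
history = [
    (datetime(2025, 2, 20), 1),
    (datetime(2025, 2, 27), 4),
    (datetime(2025, 3, 28), 9),
]


def point_in_time_value(history, event_timestamp, ttl):
    """Newest value with a timestamp in [event_timestamp - ttl, event_timestamp], else None."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_timestamp) - 1
    if idx < 0:
        return None  # no feature row exists yet at this time
    ts, value = history[idx]
    if event_timestamp - ts > ttl:
        return None  # the newest eligible row is staler than the ttl allows
    return value


ttl = timedelta(days=7)
a = point_in_time_value(history, datetime(2025, 3, 1), ttl)
b = point_in_time_value(history, datetime(2025, 3, 15), ttl)
c = point_in_time_value(history, datetime(2025, 4, 1), ttl)
print(a, b, c)  # 4 None 9
```

The row for 2025-03-15 gets no value rather than the one written on 2025-03-28. A naive join on the latest row would have filled it with 9, a value that did not exist at prediction time.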
Materializing to Online Store and Serving
# Load the latest feature values into the online store (Redis/DynamoDB).
# Run this in a shell; -u emits the end time in UTC.
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
# In your serving API (Python)
from feast import FeatureStore
store = FeatureStore(repo_path=".")
# `model` is assumed to be a trained classifier loaded elsewhere at startup.
def predict(user_id: int) -> float:
    features = store.get_online_features(
        features=["user_stats:purchase_count_7d", "user_stats:avg_order_value_30d"],
        entity_rows=[{"user_id": user_id}],
    ).to_dict()
    # to_dict() maps each feature name to a list with one value per entity row.
    X = [[features["purchase_count_7d"][0], features["avg_order_value_30d"][0]]]
    return float(model.predict_proba(X)[0][1])
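Online retrieval from Redis typically takes 1-5 ms, though the figure depends on network distance, the number of features requested, and the number of entity rows per call.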
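One detail the predict function above glosses over: a user with no materialized row, or whose row has expired, has no feature values to return. The sketch below assumes such values arrive as None in the dictionary returned by to_dict(); the `build_feature_vector` helper, the sample response, and the default values are hypothetical. It builds the model input row and substitutes a default for each missing feature instead of passing None to the model.

```python
from typing import Any, Dict, List


def build_feature_vector(
    online_features: Dict[str, List[Any]],
    defaults: Dict[str, float],
) -> List[float]:
    """Build one model input row, using the default when a value is None or absent."""
    row = []
    for name, default in defaults.items():
        values = online_features.get(name) or [None]
        value = values[0]  # first (and only) entity row
        row.append(default if value is None else float(value))
    return row


# Hypothetical response shaped like the to_dict() output used in predict().
response = {"user_id": [1001], "purchase_count_7d": [3], "avg_order_value_30d": [None]}
DEFAULTS = {"purchase_count_7d": 0.0, "avg_order_value_30d": 0.0}

vector = build_feature_vector(response, DEFAULTS)
print(vector)  # [3.0, 0.0]
```

Whatever defaults you choose should match how missing values were handled when the training set was built; otherwise the fallback itself becomes a source of training-serving skew.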
Feast vs Tecton
Feast is fully open source and free to use, but self-managed: you run and pay for the offline and online stores yourself. Tecton is a managed feature platform with a SaaS model, a more polished UI, and enterprise support. For teams starting out, Feast covers the fundamentals. For organizations with 50+ ML engineers and SLA requirements, Tecton's managed infrastructure is worth evaluating.
Resources: Feast GitHub, docs.