Feature Engineering: The Practical Guide to Transforming Raw Data into ML Inputs

Feature engineering is where most ML project time actually goes. Here is how to do log transforms, one-hot encoding, cyclical encoding, and interaction features that move the needle.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 18, 2026

10 min read

One-Hot Encoding: Handling Categorical Variables

Some tree-based libraries (LightGBM and CatBoost, for example) can consume raw categorical columns directly, but most ML algorithms require numeric inputs. The standard approach for nominal categories (categories with no meaningful order) is one-hot encoding: create a binary column for each category value.

"Country: [USA, UK, Canada]" becomes three binary columns: is_usa, is_uk, is_canada. Each row has exactly one "1" and the rest are "0".

import pandas as pd
df_encoded = pd.get_dummies(df, columns=['country'], drop_first=True)  # df: existing DataFrame with a 'country' column

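The drop_first=True parameter drops one category column (the first in sorted order), so the "Country" example yields two columns instead of three. The dropped category is implied when all the others are 0, and keeping it would create perfect multicollinearity (the "dummy variable trap"). This matters for linear models with an intercept, especially unregularized ones; tree models are unaffected.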
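A minimal sketch of the difference drop_first=True makes, using a toy three-country DataFrame invented for illustration. pandas builds the dummy columns from the sorted category values, so here the dropped baseline is Canada.

```python
import pandas as pd

# Toy data, invented for illustration
df = pd.DataFrame({'country': ['USA', 'UK', 'Canada', 'UK']})

full = pd.get_dummies(df, columns=['country'])                      # one column per category
reduced = pd.get_dummies(df, columns=['country'], drop_first=True)  # first (sorted) category dropped

print(list(full.columns))     # ['country_Canada', 'country_UK', 'country_USA']
print(list(reduced.columns))  # ['country_UK', 'country_USA']
```

In `full`, every row has exactly one True; in `reduced`, a Canada row is all False, which is how the baseline is implied.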

Pitfall: high-cardinality categoricals (user IDs, zip codes, product SKUs with thousands of unique values) produce enormous, sparse feature matrices, and most of the resulting columns are too rare for the model to learn anything useful from. For high-cardinality categoricals, prefer target encoding (replace each category with the mean target value for that category, computed on training data only to avoid leakage) or embedding layers in neural networks.
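Below is a minimal target-encoding sketch, assuming a binary churned target and a zip column (both invented for illustration); target_encode is a hypothetical helper, not a library function. The category means are learned from the training rows only and then mapped onto the test rows, with unseen categories falling back to the global training mean.

```python
import pandas as pd

def target_encode(train, test, col, target):
    """Hypothetical helper: replace each category with its mean target, learned from `train` only.

    Categories that never appear in `train` fall back to the global training mean.
    """
    global_mean = train[target].mean()
    category_means = train.groupby(col)[target].mean()  # fitted on training rows only
    train_enc = train[col].map(category_means)
    test_enc = test[col].map(category_means).fillna(global_mean)  # unseen category -> global mean
    return train_enc, test_enc

# Toy data, invented for illustration
train = pd.DataFrame({'zip': ['10001', '10001', '94103', '94103', '94103'],
                      'churned': [1, 0, 1, 1, 0]})
test = pd.DataFrame({'zip': ['10001', '60601']})  # '60601' never appears in train

train_enc, test_enc = target_encode(train, test, 'zip', 'churned')
```

Here test_enc comes out as 0.5 for the zip code seen in training and 0.6 (the global mean) for the unseen one. Encoding the training rows with in-sample means still lets each row see its own label, so a common refinement is to compute the training-set encoding out-of-fold; scikit-learn 1.3+ ships `TargetEncoder`, which does this cross-fitting internally in `fit_transform`.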

Cyclical Encoding: Handling Time-Based Features

Hour of day, day of week, month of year, and angle measurements are cyclical: hour 23 is adjacent to hour 0, not far from it. One-hot encoding avoids a false ordering (it treats each hour as an unrelated category) but also discards the adjacency: hour 23 looks no closer to hour 0 than hour 12 does. Raw numeric encoding (hour as a number 0-23) is worse: it tells the model that hour 23 is far from hour 0.

Cyclical encoding using sine and cosine preserves the cyclical structure:

import numpy as np
# df is an existing DataFrame with an integer 'hour' column (0-23)
df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)

Now the model receives two features that together place each hour on a circle. Both are needed: sine alone gives hours 3 and 9 the same value, and cosine alone gives hours 3 and 21 the same value. In this representation, hour 23 is exactly as close to hour 0 as hour 1 is.

Apply cyclical encoding to: hour of day (period 24), day of week (period 7), day of year (period 365), month (period 12), compass bearing (period 360).
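A small sketch generalizing the two lines above to any period, with a check that the encoding behaves as described. add_cyclical_features and circle_distance are hypothetical helpers written for this example.

```python
import numpy as np
import pandas as pd

def add_cyclical_features(df, col, period):
    """Add sin/cos columns that place `col` on a circle with the given period."""
    angle = 2 * np.pi * df[col] / period
    df[f'{col}_sin'] = np.sin(angle)
    df[f'{col}_cos'] = np.cos(angle)
    return df

df = pd.DataFrame({'hour': [0, 1, 12, 23]})
df = add_cyclical_features(df, 'hour', period=24)

def circle_distance(a, b):
    """Euclidean distance between rows a and b in (sin, cos) space."""
    return np.hypot(df.loc[a, 'hour_sin'] - df.loc[b, 'hour_sin'],
                    df.loc[a, 'hour_cos'] - df.loc[b, 'hour_cos'])

print(circle_distance(0, 3))  # hour 0 vs hour 23
print(circle_distance(0, 1))  # hour 0 vs hour 1: same as above, up to float rounding
print(circle_distance(0, 2))  # hour 0 vs hour 12: opposite side of the circle
```

The same helper covers the other cases in the list by changing `period`, e.g. `add_cyclical_features(df, 'day_of_week', period=7)`.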

Interaction Features: Encoding Multiplicative Relationships

Many real-world relationships are multiplicative rather than additive. Revenue = price * quantity. Click rate = clicks / impressions. Churn risk might depend on the interaction between tenure and usage level, not either alone.

Interaction features encode these relationships explicitly:

df['revenue'] = df['price'] * df['quantity']
df['ctr'] = df['clicks'] / (df['impressions'] + 1)  # +1 avoids division by zero
df['tenure_x_usage'] = df['tenure_months'] * df['weekly_active_days']

Tree models can discover interactions by splitting on both features in sequence, but explicit interaction features reduce the tree depth needed to capture them. They matter even more for linear models, which cannot represent an interaction at all unless it is supplied as a feature.

The challenge: with 50 features, there are 1,225 possible pairwise interactions. You cannot try them all blindly. Use domain knowledge to identify which interactions are likely meaningful, or use feature importance scores from a tree model to identify which features are worth interacting.
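The 1,225 figure is simply "50 choose 2". A minimal sketch of the targeted alternative: generate products only for a short, hand-picked list of columns. The shortlist and the toy values are hypothetical, standing in for whatever domain knowledge or an importance ranking suggests, and add_pairwise_products is a helper written for this example.

```python
from itertools import combinations
from math import comb

import pandas as pd

print(comb(50, 2))  # 1225 pairwise interactions for 50 features

def add_pairwise_products(df, cols):
    """Add a product column for every pair in `cols` (keep `cols` short)."""
    for a, b in combinations(cols, 2):
        df[f'{a}_x_{b}'] = df[a] * df[b]
    return df

# Hypothetical shortlist, e.g. the top features from a tree model's importance ranking
shortlist = ['tenure_months', 'weekly_active_days', 'price']
df = pd.DataFrame({'tenure_months': [3, 24],
                   'weekly_active_days': [1, 5],
                   'price': [9.99, 19.99]})
df = add_pairwise_products(df, shortlist)
print(df.columns.tolist())
```

Three shortlisted columns yield only three new features, which is small enough to evaluate one by one against a validation set.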

Polynomial Features and Binning

For features with non-linear relationships to the target, polynomial features can help linear models: add x^2, x^3 alongside the original x.

Binning (discretizing) converts a continuous feature into buckets: age 0-17, 18-34, 35-54, 55+. It is most useful for linear models, which otherwise fit a single slope across the whole range. Tree models already choose their own split points, so for them binning pays off mainly when the bin edges encode domain knowledge: the relationship between age and insurance risk is not smooth, it has distinct thresholds.
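The age buckets above map directly onto `pandas.cut`. A minimal sketch, assuming integer ages; `right=False` makes each bin include its left edge, so 18 lands in 18-34 rather than 0-17.

```python
import numpy as np
import pandas as pd

ages = pd.Series([5, 17, 18, 34, 35, 54, 55, 80])  # toy values

# Bins are [0, 18), [18, 35), [35, 55), [55, inf)
age_band = pd.cut(ages, bins=[0, 18, 35, 55, np.inf], right=False,
                  labels=['0-17', '18-34', '35-54', '55+'])
print(age_band.tolist())
```

For a linear model, the resulting bands can then be one-hot encoded with `pd.get_dummies` like any other categorical.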

Feature Importance: Finding What Actually Matters

After fitting any tree-based model, you can extract feature importance scores that indicate which features contributed most to the model's predictions.

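import lightgbm as lgb
import pandas as pd

# X_train is assumed to be a pandas DataFrame, y_train the matching labels
model = lgb.LGBMClassifier().fit(X_train, y_train)
# LightGBM's default importance_type='split' counts how often each feature is used in a split
importance = pd.Series(model.feature_importances_, index=X_train.columns)
importance.sort_values(ascending=False).head(20).plot(kind='barh')  # plotting requires matplotlib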

This serves two purposes: it tells you which features to invest more engineering effort on, and it tells you which features are near-useless (candidates for removal to reduce model complexity and training time).

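Permutation importance (shuffle one feature at a time and measure how much a validation metric drops) is more reliable than the built-in scores tree libraries report by default (split counts or impurity/gain reductions), which are computed on training data and tend to favor high-cardinality features. Use sklearn.inspection.permutation_importance on held-out data for a more honest estimate.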
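A minimal sketch of permutation importance with scikit-learn, on synthetic data built so the answer is known: column 0 determines the label and column 1 is pure noise. It uses a random forest only to keep the example to scikit-learn; a fitted LightGBM model exposes the same estimator API and can be passed in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))  # column 0 = signal, column 1 = pure noise
y = (X[:, 0] > 0).astype(int)   # label depends only on column 0

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each column of the held-out set in turn and record the drop in accuracy
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
for name, mean, std in zip(['signal', 'noise'], result.importances_mean, result.importances_std):
    print(f'{name}: {mean:.3f} +/- {std:.3f}')
```

Expect the signal column's mean drop to be large (shuffling it leaves the forest guessing) and the noise column's to sit near zero. Because scoring happens on X_val, a feature the model merely memorized during training cannot earn importance here.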

Feature Leakage: The Silent Killer

Feature leakage means training on information that would not be available at prediction time in production. It is the most dangerous mistake in feature engineering because the resulting models look excellent during development and then fail, often badly, in production.

Common leakage examples: using the transaction timestamp to predict fraud when that timestamp is only assigned after the transaction is processed; using a customer's total lifetime value to predict whether they will churn (you would not know this value for a customer who has not churned yet); using future data to create time-series features (for example, a rolling average whose window includes the period being predicted).

Rule: for every feature, ask "would I have this information at the moment I need to make this prediction in production?" If not, drop the feature.
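To make the time-series case concrete, here is a minimal sketch assuming we predict each day's daily_sales (a hypothetical column) from earlier days. A plain rolling mean includes the current row, so it leaks the very value being predicted; shifting by one row first keeps only past values. add_past_rolling_mean is a hypothetical helper, and the frame is assumed to hold a single series sorted by time.

```python
import pandas as pd

def add_past_rolling_mean(df, value_col, window, out_col):
    """Rolling mean over the previous `window` rows, excluding the current row.

    Assumes df holds one time series, sorted by time.
    """
    df[out_col] = df[value_col].shift(1).rolling(window, min_periods=1).mean()
    return df

df = pd.DataFrame({'daily_sales': [10, 20, 30, 40]})  # toy values

# Leaky: the window includes today's sales, i.e. the target itself
df['leaky_mean'] = df['daily_sales'].rolling(2).mean()
# Safe: only yesterday and the day before contribute
df = add_past_rolling_mean(df, 'daily_sales', window=2, out_col='past_mean_2d')
print(df)
```

For data with many entities (customers, stores), apply the same shift-then-roll within `df.groupby(entity_col)` so that one entity's history does not bleed into another's; `entity_col` is a placeholder name.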

The 80/20 Reality

As a rule of thumb, simple, well-understood features -- properly cleaned, properly encoded, with obvious transformations applied -- account for roughly 80% of achievable model performance. The last 20% comes from clever feature engineering, hyperparameter tuning, and ensemble methods.

Start with the basics: fix skewed distributions with log transforms, encode categoricals properly, handle missing values explicitly, and add obvious derived features that encode domain knowledge. Only pursue exotic feature engineering after you have the basics right and have established a baseline.

Keep Reading

Decision Trees and Random Forests Explained -- tree-based feature importance is a practical starting point for guiding feature engineering work
ML Model Evaluation Metrics Guide -- knowing which metric to optimize tells you how to prioritize feature engineering efforts
Machine Learning Complete Guide for Software Developers -- the full pipeline that feature engineering fits into

Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace -- chat, projects, time tracking, AI meeting summaries, and invoicing -- in one tool. Try it free.
