dbt: SQL-Based Data Transformation That Brings Software Engineering to Analytics
dbt (data build tool) lets analysts transform data in the warehouse using SELECT statements, with built-in testing, documentation, and dependency tracking - no more unmaintainable SQL scripts.
Before dbt, data transformation meant a patchwork of SQL scripts, Python ETL jobs, stored procedures, and manually maintained documentation. No tests, no dependency graph, no version control workflow. dbt fixes this by treating SQL transformations like software.
The philosophy: transform data in the warehouse, not before it. Load raw data first (using tools like Fivetran, Airbyte, or custom scripts), then transform it with dbt inside the warehouse.
Models: SQL SELECT Statements as First-Class Objects
Every dbt model is a .sql file containing a single SELECT statement. dbt handles creating the table or view.
-- models/staging/stg_orders.sql
SELECT
id AS order_id,
user_id,
created_at,
status,
CAST(total_cents AS FLOAT) / 100 AS total_usd
FROM {{ source('raw', 'orders') }}
WHERE created_at >= '2024-01-01'
-- models/marts/revenue_by_region.sql
WITH orders AS (
SELECT * FROM {{ ref('stg_orders') }} -- dependency tracked automatically
),
users AS (
SELECT * FROM {{ ref('stg_users') }}
)
SELECT
u.region,
DATE_TRUNC('month', o.created_at) AS month,
SUM(o.total_usd) AS revenue
FROM orders o
JOIN users u ON o.user_id = u.id
GROUP BY 1, 2
The ref() function is dbt's dependency system. dbt builds a DAG from all ref() calls and runs models in the right order.
Team workspace
Ship faster with chat, meetings, and projects in one place — Zlyqor.
dbt test # runs all tests, fails loudly if data quality issues found
Running dbt
dbt run # build all models
dbt run --select stg_orders # build one model
dbt run --select tag:staging # build all staging models
dbt build # run + test in one command
dbt docs generate && dbt docs serve # generate and serve documentation
Snapshots track how rows change over time - essential for SCD type 2 dimensions.
dbt Cloud vs dbt Core
dbt Core (open source): CLI tool, runs anywhere, connects to your warehouse. Free forever.
dbt Cloud: Managed platform with a web IDE, scheduled runs, CI/CD, and a semantic layer. Free tier exists; paid plans for teams.
dbt vs Spark/Airflow
dbt is complementary, not a replacement. Airflow orchestrates jobs (including running dbt). Spark processes data at scale before it reaches the warehouse. dbt transforms data inside the warehouse with SQL. Most modern stacks use all three.
Practical deep-dives on LLMs, developer tools, and AI engineering. No filler. Unsubscribe any time.
// written byFIG. AUTH-01
530
Mahmudul Haque Qudrati
CEO & ML Engineer
CEO and ML Engineer at Pristren. Builds AI-powered software for teams and writes about machine learning, LLMs, developer tools, and practical AI applications.