SQL for Data Scientists: Window Functions, CTEs, and Query Optimization

SQL is the most important data science tool that data scientists often undervalue. Window functions alone replace hundreds of lines of pandas code.

Mahmudul Haque Qudrati

CEO & ML Engineer

May 18, 2026

10 min read

// tags

#sql#window-functions#cte#data-science#query-optimization

FIG. ART-29

10 min read

“

SQL for Data Scientists: Window Functions, CTEs, and Query Optimization

// reading plan

sections

1,140

words

min read

// Developer Tools

PostgreSQL Guide for Developers: Why Postgres Wins in 2026

Postgres offers reliability, JSONB, full-text search, row-level security, and powerful extensions. Here is what every developer needs to know.

9 min read

// Developer Tools

Drizzle ORM Guide: TypeScript-First SQL That Stays Close to the Database

SQL is the most important data science tool you are probably underusing. Most data scientists learn pandas first and treat SQL as a secondary skill. This is backwards. The vast majority of data in real organizations lives in databases. SQL is the language those databases speak, and the compute power of a modern data warehouse is enormous. Learning to use SQL effectively means pushing expensive transformations down to the database instead of pulling gigabytes of raw data into Python.

This guide focuses on the SQL features that matter most for data science: window functions, CTEs, aggregations, and query optimization.

Window Functions: The Feature That Changes Everything

Window functions perform calculations across a set of rows that are related to the current row, without collapsing those rows into a single output row. They eliminate enormous amounts of pandas code.

ROW_NUMBER, RANK, DENSE_RANK

-- Rank employees by salary within each department
SELECT
    employee_id,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;

Difference: if two employees have the same salary, RANK skips the next number (1, 2, 2, 4), DENSE_RANK does not (1, 2, 2, 3), and ROW_NUMBER assigns arbitrary unique numbers.

Use ROW_NUMBER to get the top-N rows per group (a classic pattern):

-- Top 3 earners per department
SELECT * FROM (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) ranked
WHERE rn <= 3;

LAG and LEAD: Comparing to Adjacent Rows

-- Month-over-month revenue change
SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS revenue_change,
    ROUND(100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY month))
          / LAG(revenue, 1) OVER (ORDER BY month), 2) AS pct_change
FROM monthly_revenue;

LAG(column, n) returns the value from n rows before. LEAD(column, n) returns the value from n rows after. Essential for time series analysis.

Running Totals and Moving Averages

-- Running total of revenue
SELECT
    date,
    daily_revenue,
    SUM(daily_revenue) OVER (ORDER BY date) AS cumulative_revenue,
    AVG(daily_revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7day_avg
FROM daily_sales;

The ROWS BETWEEN clause controls which rows are included in the window. ROWS BETWEEN 6 PRECEDING AND CURRENT ROW gives you the last 7 days (current + 6 previous).

CTEs: Writing Readable Complex Queries

Common Table Expressions let you name intermediate results and reference them like tables. They transform unreadable nested subqueries into step-by-step logic.

-- Without CTE: hard to read
SELECT department, avg_salary
FROM (
    SELECT department, AVG(salary) AS avg_salary
    FROM (
        SELECT * FROM employees WHERE hire_date > '2020-01-01'
    ) recent_hires
    GROUP BY department
) dept_averages
WHERE avg_salary > 80000;

-- With CTEs: readable and maintainable
WITH recent_hires AS (
    SELECT * FROM employees
    WHERE hire_date > '2020-01-01'
),
dept_averages AS (
    SELECT department, AVG(salary) AS avg_salary
    FROM recent_hires
    GROUP BY department
)
SELECT department, avg_salary
FROM dept_averages
WHERE avg_salary > 80000;

CTEs can reference previous CTEs. Build complex logic incrementally. In most databases, CTEs are equivalent in performance to subqueries (the optimizer handles them the same way).

Recursive CTEs are a separate feature for traversing hierarchical data (org charts, graph structures). They are worth knowing but beyond everyday use.

SQL for Data Scientists: Window Functions, CTEs, and Query Optimization

Related Articles

PostgreSQL Guide for Developers: Why Postgres Wins in 2026

Drizzle ORM Guide: TypeScript-First SQL That Stays Close to the Database

Window Functions: The Feature That Changes Everything

ROW_NUMBER, RANK, DENSE_RANK

LAG and LEAD: Comparing to Adjacent Rows

Running Totals and Moving Averages

CTEs: Writing Readable Complex Queries

Aggregations and GROUP BY

JOINs: When to Use Each Type

Query Optimization: The Practical Rules

SQL vs Pandas: The Decision Rule

Keep Reading

The workspace your team
actually needs

AI & ML insights, weekly

Mahmudul Haque Qudrati

How Product Teams Can Work Effectively With Machine Learning

SQL for Data Scientists: Window Functions, CTEs, and Query Optimization

Related Articles

PostgreSQL Guide for Developers: Why Postgres Wins in 2026

Drizzle ORM Guide: TypeScript-First SQL That Stays Close to the Database

Window Functions: The Feature That Changes Everything

ROW_NUMBER, RANK, DENSE_RANK

LAG and LEAD: Comparing to Adjacent Rows

Running Totals and Moving Averages

CTEs: Writing Readable Complex Queries

Aggregations and GROUP BY

JOINs: When to Use Each Type

Query Optimization: The Practical Rules

SQL vs Pandas: The Decision Rule

Keep Reading

The workspace your teamactually needs

AI & ML insights, weekly

Mahmudul Haque Qudrati

How Product Teams Can Work Effectively With Machine Learning

The workspace your team
actually needs