Data visualization in Python has a fragmented tool landscape, and choosing the wrong library adds friction to your work. The good news: each major library occupies a distinct niche. Once you understand where each one fits, the choice becomes straightforward.
The Tool Hierarchy
Matplotlib is the foundation. Seaborn and pandas' built-in plotting are built on top of it, though Plotly and Altair are not: they render through their own JavaScript-based engines. Matplotlib gives you complete control over every element of a plot, which makes it powerful but verbose. Use matplotlib when you need a custom chart that no higher-level API supports, when you are building a publication-quality figure with precise layout requirements, or when you need to embed plots in a GUI application.
Seaborn is a statistical visualization library built on matplotlib. It produces polished statistical plots with significantly less code. Use seaborn for exploratory data analysis, statistical relationships, and distribution plots. It integrates directly with pandas DataFrames.
Plotly produces interactive charts that run in a browser. Hover tooltips, zoom, pan, and filter are built in. Use plotly when your audience is non-technical (they can explore the data themselves), when you are building a dashboard or web application, or when interactivity adds genuine value. Plotly Express (px) is the high-level API that makes most charts one-liners.
Altair takes a declarative approach based on the Vega-Lite grammar of graphics. You describe what you want (encode data columns to visual channels like x, y, color, size) and Altair figures out how to render it. Use altair when you think in terms of data encodings, when you want to layer and compose charts, or when you are working in a Jupyter environment and want interactive charts without the Plotly overhead.
Bar Charts: Categorical Comparison
Bar charts compare discrete categories. Use them when you have a categorical variable and want to compare values across categories.
import seaborn as sns
import matplotlib.pyplot as plt
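# Seaborn: total sales by region (string estimators and errorbar= need seaborn >= 0.12)
# errorbar=None turns off the bootstrap interval, which is not meaningful for a plain total
sns.barplot(data=df, x="region", y="sales", estimator="sum", errorbar=None)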
plt.title("Total Sales by Region")
plt.show()
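# Plotly Express: same chart, interactive
import plotly.express as px
# px.bar does not aggregate; without this step each row is drawn as its own stacked segment
totals = df.groupby("region", as_index=False)["sales"].sum()
fig = px.bar(totals, x="region", y="sales", title="Total Sales by Region")
fig.show()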
Common mistakes with bar charts: using a bar chart for a continuous variable (use a histogram instead), starting the y-axis at a non-zero value to exaggerate differences, and using 3D bars (they distort perception).
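As a rough sketch of avoiding the truncated-axis mistake in plain matplotlib, the snippet below aggregates first and then pins the y-axis baseline at zero. The region names and sales figures are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical illustration data
df = pd.DataFrame({
    "region": ["North", "South", "North", "West", "South", "West"],
    "sales": [120, 95, 80, 110, 60, 70],
})

# One total per category, largest first
totals = df.groupby("region")["sales"].sum().sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(totals.index, totals.values)
ax.set_ylim(bottom=0)  # baseline at zero keeps bar heights proportional to values
ax.set_xlabel("Region")
ax.set_ylabel("Total sales")
ax.set_title("Total Sales by Region")
```

Matplotlib already starts bar charts at zero by default; the explicit set_ylim matters mostly when a shared style or an earlier call has changed the limits.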
Line Charts: Time Series
Line charts show how a value changes over time. They require a meaningful ordering on the x-axis.
# Matplotlib: revenue over time
plt.figure(figsize=(12, 5))
plt.plot(df["date"], df["revenue"], linewidth=2)
plt.xlabel("Date")
plt.ylabel("Revenue ($)")
plt.title("Monthly Revenue")
plt.grid(True, alpha=0.3)
plt.show()
# Plotly: multiple lines with interactivity
fig = px.line(df, x="date", y="revenue", color="product",
              title="Revenue by Product Over Time")
fig.show()
Use multiple lines to compare trends across groups. Keep to 5-6 lines per chart; beyond that the chart becomes unreadable. Add confidence intervals or bands for forecasts.
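One way to draw a band around a forecast is matplotlib's fill_between. The sketch below uses invented forecast values and an invented interval that widens with the horizon; in practice the lower and upper bounds would come from your forecasting model.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical illustration values: six forecast periods
periods = np.arange(1, 7)
forecast = np.array([100, 104, 109, 113, 118, 124], dtype=float)
half_width = np.linspace(3, 15, num=len(periods))  # uncertainty grows with the horizon
lower = forecast - half_width
upper = forecast + half_width

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(periods, forecast, linewidth=2, label="Forecast")
# Shade the region between the lower and upper bounds
ax.fill_between(periods, lower, upper, alpha=0.2, label="Interval (illustrative)")
ax.set_xlabel("Months ahead")
ax.set_ylabel("Revenue ($)")
ax.set_title("Revenue Forecast with Uncertainty Band")
ax.legend()
```

A low alpha keeps the band visible without competing with the line itself.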
Scatter Plots: Relationships Between Variables
Scatter plots reveal relationships (correlations) between two continuous variables.
# Seaborn with regression line
sns.regplot(data=df, x="marketing_spend", y="revenue")
plt.title("Marketing Spend vs Revenue")
plt.show()
# Seaborn with color and size encoding a third and fourth variable
sns.scatterplot(data=df, x="experience", y="salary",
                hue="department", size="team_size")
plt.show()
Color, size, and shape can encode additional dimensions, but each extra encoding makes the chart harder to read. Two variables (x, y) plus one encoding (color) is usually the comfortable limit; add a second encoding such as size only when readers genuinely need it.
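It often helps to put a number next to the picture. The sketch below computes a Pearson correlation with pandas and builds a title string from it; the data is invented, and the column names simply mirror the regplot example.

```python
import pandas as pd

# Hypothetical illustration data
df = pd.DataFrame({
    "marketing_spend": [10, 20, 30, 40, 50, 60],
    "revenue": [110, 135, 190, 205, 260, 270],
})

# Pearson correlation: +1 is a perfect increasing line, 0 means no linear relationship
r = df["marketing_spend"].corr(df["revenue"])

# A title that reports the strength of the relationship alongside the plot
label = f"Marketing Spend vs Revenue (r = {r:.2f})"
```

Pearson's r only measures linear association, so it complements the scatter plot rather than replacing it: a strong curved relationship can still produce an r near zero.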
Histograms: Distributions
Histograms show the distribution of a single continuous variable.
# Seaborn with kernel density estimate
sns.histplot(df["age"], kde=True, bins=30)
plt.title("Age Distribution")
plt.show()
# Multiple distributions overlaid
sns.histplot(data=df, x="salary", hue="department", kde=True)
plt.show()
Choose bin count carefully. Too few bins hide structure; too many bins create noise. Matplotlib's default of 10 bins is rarely right, and seaborn's histplot picks a count automatically when bins is omitted. Start with 30 and adjust.
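To see how much the choice matters, you can compare a fixed bin count against data-driven rules using NumPy's histogram_bin_edges. This is a small sketch on synthetic, randomly generated ages, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)
ages = rng.normal(loc=40, scale=12, size=1000)  # synthetic data for illustration

# Compare a fixed count with two data-driven rules
for rule in (30, "sturges", "fd"):
    edges = np.histogram_bin_edges(ages, bins=rule)
    print(f"{rule!s:>8}: {len(edges) - 1} bins, width {edges[1] - edges[0]:.2f}")

# The edges for the fixed 30-bin choice: 31 boundaries around 30 bins
fixed_edges = np.histogram_bin_edges(ages, bins=30)
```

Rule names such as "fd" (Freedman-Diaconis) and "sturges" can also be passed as the bins argument to plt.hist, which is a quick way to try them on a real plot.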
Heatmaps: Correlation Matrices and Grid Data
Heatmaps encode a third variable (value) using color across a two-dimensional grid.
# Correlation matrix heatmap
import numpy as np
corr = df.select_dtypes(include="number").corr()
mask = np.triu(np.ones_like(corr, dtype=bool))  # Hide the upper triangle and the diagonal
plt.figure(figsize=(10, 8))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="coolwarm", center=0, vmin=-1, vmax=1)
plt.title("Feature Correlation Matrix")
plt.show()
Use diverging color maps (like coolwarm) when values range from negative to positive. Use sequential color maps (like Blues) when values are all positive.
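The rule above can be written down as a tiny helper. pick_cmap is a hypothetical function made up for this sketch, not part of seaborn or matplotlib; it only returns keyword arguments that sns.heatmap already accepts (cmap, center, vmin, vmax).

```python
import numpy as np

def pick_cmap(values):
    """Hypothetical helper: suggest heatmap color settings from the data's sign."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.nanmin(values), np.nanmax(values)
    if lo < 0 < hi:
        # Values straddle zero: diverging map with symmetric limits,
        # so zero lands on the neutral midpoint color
        limit = max(abs(lo), abs(hi))
        return {"cmap": "coolwarm", "center": 0, "vmin": -limit, "vmax": limit}
    # Values share one sign: sequential map over the observed range
    return {"cmap": "Blues", "vmin": lo, "vmax": hi}

corr_like = pick_cmap([[1.0, -0.4], [-0.4, 1.0]])   # mixed signs
counts_like = pick_cmap([[3, 12], [7, 0]])          # non-negative only
```

With a helper like this, a call could look like sns.heatmap(data, **pick_cmap(data)). The symmetric limits are the important part: a diverging map with lopsided limits makes equal magnitudes of opposite sign look unequal.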
Box Plots: Distribution and Outliers
Box plots show the median, interquartile range, and outliers in a single chart. They are better than bar charts when comparing distributions (not just means) across categories.
# Box plot: salary distribution by department
sns.boxplot(data=df, x="department", y="salary")
plt.xticks(rotation=45)
plt.title("Salary Distribution by Department")
plt.show()
# Violin plot: box plot + kernel density
sns.violinplot(data=df, x="department", y="salary")
plt.show()
Violin plots combine a box plot with a kernel density estimate, showing the full distribution shape. They are more informative than box plots but take more space and are less familiar to non-technical audiences.
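If you want to know what a box plot is actually drawing, the numbers are easy to compute by hand. The sketch below uses made-up values and the conventional 1.5 x IQR rule for flagging outliers, which is the default whisker setting in both matplotlib and seaborn.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)  # made-up values

# The box spans the first to third quartile, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"median={median}, IQR={iqr}, outliers={outliers.tolist()}")
```

Here the single value of 50 falls outside the upper fence, so a box plot of this data would show it as an isolated point above the whisker.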
Visualization Principles
Choose the right chart type. This is the single most important decision. A bar chart for a continuous variable, a pie chart for more than 5 categories, and a line chart for unordered categories are all category errors that mislead the reader.
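Avoid chartjunk. Remove heavy gridlines, background colors, borders, and 3D effects that add visual noise without adding information. Faint gridlines are fine when readers need to look up values. The goal is to maximize the data-ink ratio: the share of a chart's ink that actually presents data.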
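In matplotlib, most of this cleanup is a few lines. The following is a minimal sketch on made-up values, showing one reasonable set of choices rather than a definitive style.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3, 4], [10, 14, 13, 18], linewidth=2)  # made-up values

# Drop the frame lines that carry no data
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Keep only faint horizontal guides, drawn behind the data
ax.grid(axis="y", alpha=0.3)
ax.set_axisbelow(True)
ax.set_facecolor("white")
```

If you are already using seaborn, sns.despine() removes the top and right spines in one call.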
Label axes and titles clearly. Every chart needs a title describing what it shows, axis labels with units, and a legend if multiple groups are encoded. "Figure 1" is not a title.
Use color purposefully. Color should encode data, not decorate. Use colorblind-friendly palettes (seaborn's colorblind palette, or viridis/plasma for sequential data).
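Switching to a colorblind-friendly palette in seaborn is a short change. This sketch loads the built-in "colorblind" palette and makes it the default for later plots.

```python
import seaborn as sns

# Built-in palette chosen to stay distinguishable under common color-vision deficiencies
palette = sns.color_palette("colorblind")
hex_codes = palette.as_hex()  # handy for reusing the same colors in other tools

# Make it the default color cycle for subsequent plots
sns.set_palette("colorblind")
```

For continuous data, pass a perceptually uniform colormap by name instead, for example cmap="viridis" in sns.heatmap.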
Context matters. A chart for EDA (you exploring your data) can be rough and quick. A chart for a stakeholder presentation needs polished labels, a clear title, and a takeaway message.