Monitoring Your Application with Prometheus and Grafana
Prometheus scrapes metrics from your app; Grafana visualizes them. Here is how to instrument a Node.js app, build dashboards, and set up alerts that matter.
Prometheus collects metrics from your application by scraping a /metrics HTTP endpoint. Grafana visualizes those metrics in dashboards and fires alerts when thresholds are crossed. Together they form one of the most widely used open-source monitoring stacks for production applications, and understanding them will make you a better operator of any backend system.
What You Are Trying to Monitor
Before setting up tools, know what you are measuring. The RED method defines the three signals that matter most for any request-driven service:
Rate: how many requests per second is your service handling? A sudden drop is as alarming as a sudden spike.
Errors: what percentage of requests are failing? Track 4xx and 5xx responses separately, because 4xx responses usually indicate client errors while 5xx responses usually indicate your bugs.
Duration: how long are requests taking? Track percentiles, not averages. P50 (the median) shows the typical request, while P95 and P99 show how slow the slowest 5% and 1% of requests are, which is where users notice problems.
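To make the three signals concrete, here is a minimal TypeScript sketch that computes them from a window of completed requests. The RequestRecord shape and the summarizeRed and percentile functions are hypothetical names for illustration, and the percentiles use the simple nearest-rank method; in a Prometheus setup you would usually record a histogram and let histogram_quantile() estimate them instead.

```typescript
// Hypothetical record of one completed request, e.g. captured by middleware.
interface RequestRecord {
  status: number; // HTTP status code
  durationMs: number; // time from request start to response end
}

interface RedSummary {
  ratePerSec: number; // Rate
  clientErrorPct: number; // Errors: 4xx share of all requests
  serverErrorPct: number; // Errors: 5xx share of all requests
  p50Ms: number; // Duration percentiles
  p95Ms: number;
  p99Ms: number;
}

// Nearest-rank percentile of an ascending-sorted, non-empty array:
// the smallest value with at least p% of samples at or below it.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize the RED signals for requests observed during `windowSec` seconds.
function summarizeRed(records: RequestRecord[], windowSec: number): RedSummary {
  if (records.length === 0) {
    throw new Error("no requests in window");
  }
  const total = records.length;
  const clientErrors = records.filter((r) => r.status >= 400 && r.status < 500).length;
  const serverErrors = records.filter((r) => r.status >= 500 && r.status < 600).length;
  const durations = records.map((r) => r.durationMs).sort((a, b) => a - b);
  return {
    ratePerSec: total / windowSec,
    clientErrorPct: (100 * clientErrors) / total,
    serverErrorPct: (100 * serverErrors) / total,
    p50Ms: percentile(durations, 50),
    p95Ms: percentile(durations, 95),
    p99Ms: percentile(durations, 99),
  };
}
```

For example, with 95 requests at 20 ms and 5 failed (503) requests at 2,000 ms over a 10-second window, summarizeRed reports 10 requests per second, a 5% server error rate, a P50 and P95 of 20 ms, and a P99 of 2,000 ms. The mean, 119 ms, describes none of those requests.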
For infrastructure-level monitoring (not covered by RED), track CPU utilization, memory usage, disk I/O, and network throughput.
What Prometheus Is
Prometheus is a time-series database and metric collection system. It works on a pull model: you configure Prometheus with a list of targets (your app instances' /metrics endpoints), and Prometheus scrapes those endpoints on a regular interval (typically every 15-30 seconds) and stores the metrics.
This pull model has an important implication: your application does not need to know about your monitoring system. You expose a /metrics endpoint, and Prometheus scrapes it once the endpoint is listed as a target, either statically in its configuration or through service discovery. Adding a new metric to your app does not require any coordination with the Prometheus server.
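What Prometheus actually scrapes is plain text in the Prometheus exposition format. The dependency-free sketch below uses a hypothetical LabeledCounter class (not the prom-client API, which is what you would normally use in Node.js) to show what a labeled counter looks like on the wire, and why the app needs no knowledge of Prometheus: it only renders text when asked.

```typescript
// Escape a label value as the text exposition format requires:
// backslash, double quote, and line feed.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Dependency-free counter that renders itself in the Prometheus text
// exposition format. Illustration only: real apps would use prom-client.
class LabeledCounter {
  name: string;
  help: string;
  labelNames: string[];
  // One entry per series, keyed by its rendered label set.
  private values = new Map<string, number>();

  constructor(name: string, help: string, labelNames: string[]) {
    this.name = name;
    this.help = help;
    this.labelNames = labelNames;
  }

  // Increment the series identified by this combination of label values.
  inc(labels: Record<string, string>, by = 1): void {
    const key = this.labelNames
      .map((l) => `${l}="${escapeLabelValue(labels[l] ?? "")}"`)
      .join(",");
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  // What a scrape of /metrics would return for this metric.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.values.forEach((value, key) => {
      lines.push(key ? `${this.name}{${key}} ${value}` : `${this.name} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}
```

Creating new LabeledCounter("http_requests_total", "Total HTTP requests.", ["method", "status"]), calling inc({ method: "GET", status: "200" }) twice and inc({ method: "POST", status: "500" }) once renders the HELP and TYPE lines followed by http_requests_total{method="GET",status="200"} 2 and http_requests_total{method="POST",status="500"} 1. Served with the Content-Type text/plain; version=0.0.4, that text is all a Prometheus server needs to scrape the endpoint.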
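Prometheus stores data in its own time-series database on disk, optimized for fast aggregation queries over time ranges. It handles large numbers of series efficiently, but every unique combination of label values becomes a separate series, so keep labels bounded: HTTP method and status code are fine, while user IDs or raw URL paths can make the series count explode.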
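Because each unique combination of label values is stored as its own series, the series count multiplies with every label you add. A small worked calculation with hypothetical label counts (maxSeries is an illustrative helper, not a Prometheus API):

```typescript
// Upper bound on the series one metric can create: every unique combination
// of label values is stored as a separate time series.
function maxSeries(distinctValues: Record<string, number>): number {
  return Object.keys(distinctValues).reduce((acc, label) => acc * distinctValues[label], 1);
}

// Hypothetical label counts for an http_requests_total metric.
const bounded = maxSeries({ method: 4, route: 20, status: 5 });
const withUserId = maxSeries({ method: 4, route: 20, status: 5, user_id: 10000 });

console.log(bounded); // 400
console.log(withUserId); // 4000000
```

Adding a single user_id label with 10,000 possible values turns one metric with 400 series into one with up to four million.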
Instrumenting Next.js API Routes
Next.js does not give API routes an Express-style middleware chain, but you can still add Prometheus metrics with the same prom-client library. Create a /api/metrics route that returns metrics in the Prometheus exposition format, and add timing logic either to individual routes or to a shared wrapper function.
For App Router, the instrumentation.ts file (Next.js's built-in instrumentation hook, whose register function runs once when a server instance starts) is a natural place to initialize Prometheus collectors.
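A shared wrapper is one way to add timing logic to many routes without repeating it. This is a sketch under assumptions: withTiming and RecordFn are hypothetical names, the handler types are structural so the example does not depend on Next.js typings, and the record callback stands in for whatever metric you actually update.

```typescript
// Anything with an HTTP status code; a Web Response (what App Router route
// handlers return) satisfies this, so no Next.js types are needed here.
interface HasStatus {
  status: number;
}

type Handler<Req, Res extends HasStatus> = (req: Req) => Promise<Res>;

// Hypothetical sink for one observation. In a real app this would call a
// prom-client Histogram, e.g. histogram.observe({ route, status }, seconds).
type RecordFn = (route: string, status: number, seconds: number) => void;

// Wrap a route handler so every call records its duration and status code.
function withTiming<Req, Res extends HasStatus>(
  route: string,
  handler: Handler<Req, Res>,
  record: RecordFn,
): Handler<Req, Res> {
  return async (req: Req): Promise<Res> => {
    const start = Date.now(); // wall-clock milliseconds; fine for a sketch
    let res: Res;
    try {
      res = await handler(req);
    } catch (err) {
      // Count handlers that throw as 500s, then let the error propagate.
      record(route, 500, (Date.now() - start) / 1000);
      throw err;
    }
    record(route, res.status, (Date.now() - start) / 1000);
    return res;
  };
}
```

In an App Router file such as app/api/users/route.ts you might write export const GET = withTiming("/api/users", handler, record), where record observes a prom-client Histogram labeled by route and status. The /api/metrics route would then return the string from prom-client's register.metrics() (which returns a Promise) with register.contentType as the Content-Type header.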
What Grafana Is
Grafana is a visualization platform that connects to data sources (Prometheus, Loki for logs, Tempo for traces, and many others) and lets you build dashboards. Dashboards consist of panels: graphs, stat displays, gauges, tables, and more.
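Grafana queries Prometheus using PromQL, Prometheus's own query language. For example, assuming your app exports a request counter named http_requests_total, the per-second request rate averaged over the last 5 minutes is rate(http_requests_total[5m]).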
Grafana's dashboard builder is visual: you write PromQL queries in the panel editor and see the graph render live. Dashboards can be exported as JSON and committed to version control.
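To see what a per-second request rate over a window means for a counter, here is a simplified TypeScript version of what PromQL's rate() function computes: it sums the increases between consecutive samples, treats any drop in value as a counter reset (for example after a process restart), and divides by the elapsed time. The real rate() also extrapolates to the edges of the window, so its results differ slightly; simpleRate is a hypothetical helper meant as an explanation, not a reimplementation.

```typescript
// One scraped sample of a counter: Unix time in seconds and cumulative value.
interface Sample {
  t: number;
  v: number;
}

// Simplified take on PromQL rate(): total increase across the window divided
// by the elapsed time, treating any drop in value as a counter reset.
// Unlike the real rate(), it does not extrapolate to the window edges.
function simpleRate(samples: Sample[]): number {
  if (samples.length < 2) {
    return NaN; // rate() likewise needs at least two samples to return anything
  }
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].v - samples[i - 1].v;
    // After a reset the counter restarts from zero, so the new value is the increase.
    increase += delta >= 0 ? delta : samples[i].v;
  }
  const elapsedSec = samples[samples.length - 1].t - samples[0].t;
  return increase / elapsedSec;
}

// Scrapes every 15 s; the process restarted between t=30 and t=45.
const perSecond = simpleRate([
  { t: 0, v: 100 },
  { t: 15, v: 130 },
  { t: 30, v: 160 },
  { t: 45, v: 10 },
  { t: 60, v: 40 },
]);
console.log(perSecond.toFixed(2)); // 1.67
```

The increase is 30 + 30 + 10 + 30 = 100 requests over 60 seconds. Without the reset handling, the drop from 160 to 10 would be read as a negative rate.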
Alerting
Grafana Alerting lets you define rules that fire when a metric crosses a threshold. The rule evaluates a PromQL query on a schedule and sends notifications via Slack, email, PagerDuty, or webhooks.
Principles for good alerting:
Alert on symptoms, not causes. Alert when error rate is high (symptom), not when CPU is high (cause). High CPU does not always mean users are affected. High error rate always means users are affected.
Set meaningful thresholds. "Error rate > 1% for 5 minutes" is a meaningful alert. "Any error ever" is noise. "CPU > 80%" by itself is noise.
Alert on what you would wake up for. If an alert fires and you look at it and decide nothing needs to be done, the alert should not exist. Alert fatigue kills monitoring programs.
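The "Error rate > 1% for 5 minutes" rule relies on a pending period: the condition has to hold on every evaluation for the whole duration before the alert fires, which filters out one-off spikes. Below is a simplified sketch of that state machine using Prometheus's inactive, pending, and firing state names. ThresholdAlert is a hypothetical class, not Grafana's or Prometheus's implementation.

```typescript
type AlertState = "inactive" | "pending" | "firing";

// Simplified threshold rule with a pending period ("for" duration):
// the condition must hold on every evaluation for forSec seconds before firing.
class ThresholdAlert {
  threshold: number;
  forSec: number;
  private state: AlertState = "inactive";
  private breachedSince: number | null = null; // time of first breaching evaluation

  constructor(threshold: number, forSec: number) {
    this.threshold = threshold;
    this.forSec = forSec;
  }

  // Call on each scheduled evaluation with the query result at time nowSec.
  evaluate(nowSec: number, value: number): AlertState {
    if (value <= this.threshold) {
      // Any healthy evaluation resets the rule.
      this.breachedSince = null;
      this.state = "inactive";
    } else {
      if (this.breachedSince === null) {
        this.breachedSince = nowSec;
      }
      this.state = nowSec - this.breachedSince >= this.forSec ? "firing" : "pending";
    }
    return this.state;
  }
}
```

With new ThresholdAlert(0.01, 300) evaluated every minute, a single bad evaluation only reaches pending and resets on the next healthy one, while five continuous minutes above 1% fire the alert. The value fed to evaluate would come from a PromQL query such as sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])), assuming that counter carries a status label. Grafana calls the duration the pending period; Prometheus alerting rules call it for.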
Hosted Monitoring Alternatives
Grafana Cloud: hosted Prometheus + Grafana, with a free tier (10,000 metric series, 50GB logs, 14-day retention). The easiest way to run this stack without self-hosting.
Datadog: the most comprehensive commercial monitoring platform, with APM, metrics, logs, traces, synthetics, and security all in one. Expensive ($15+/host/month), but a best-in-class experience for organizations that can afford it.
New Relic: similar to Datadog, competitive on pricing for certain tiers.
Better Uptime / UptimeRobot: simpler uptime monitoring (HTTP ping checks, status pages). Not a replacement for Prometheus, but they solve the "is my site up?" problem on their free tiers.
When Self-Hosted Monitoring Makes Sense
Self-hosted Prometheus + Grafana on the same VPS as your application adds no subscription cost (it only consumes some of the VPS's CPU, memory, and disk) and gives you full control over retention and data privacy. For small teams on a budget, this is the pragmatic choice.
Managed monitoring (Grafana Cloud, Datadog) makes sense when: your team does not want to manage infrastructure, you need long-term metric retention, or the time saved on operations is worth the monthly cost.
Mahmudul Haque Qudrati
CEO and ML Engineer at Pristren. Builds AI-powered software for teams and writes about machine learning, LLMs, developer tools, and practical AI applications.