Promptfoo: Test and Red-Team Your LLM Prompts Before Shipping

Promptfoo runs your prompts against multiple models, checks outputs with assertion functions, and red-teams for jailbreaks and PII leakage - all from a YAML config.

Mahmudul Haque Qudrati

CEO & ML Engineer

April 8, 2026

6 min read

// tags

#promptfoo #red-teaming #prompt-testing #safety #ci/cd


Basic YAML Test Config

# promptfooconfig.yaml
providers:
  - openai:gpt-4o-mini
  - openai:gpt-4o
  - ollama:llama3.1:8b

prompts:
  - "Summarize the following text in one sentence: {{text}}"

tests:
  - vars:
      text: "PagedAttention is a memory management technique for LLM inference that treats the KV cache like virtual memory pages."
    assert:
      - type: contains
        value: "KV cache"
      - type: javascript
        value: "output.length < 200"
      - type: llm-rubric
        value: "The summary is factually accurate and concise"

Run with:

promptfoo eval
promptfoo view  # Open browser UI with comparison table
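To make the two deterministic assertions in the config above concrete, here is a minimal sketch of what they check, written as plain JavaScript. The helper names `checkContains` and `checkLength` and the sample output are made up for illustration; they are not promptfoo internals. The `llm-rubric` assertion is left out because it is graded by a model, not by a fixed rule.

```javascript
// Rough stand-ins for the `contains` and `javascript` assertions.
// ASSUMPTION: helper names and sample text are hypothetical.

// `contains`: case-sensitive substring match on the model output.
function checkContains(output, value) {
  return output.includes(value);
}

// `javascript`: the expression from the config, with `output` in scope.
function checkLength(output) {
  return output.length < 200;
}

const sampleOutput =
  "PagedAttention manages the KV cache in fixed-size pages, like virtual memory.";

console.log(checkContains(sampleOutput, "KV cache")); // true
console.log(checkContains(sampleOutput, "kv cache")); // false: the match is case-sensitive
console.log(checkLength(sampleOutput)); // true
```

A summary that paraphrases "KV cache" as "key-value cache" would fail the first check, so exact-match assertions are best reserved for terms the output must reproduce verbatim.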

Model Comparison in PR Checks

Add the --ci flag and write the results to a JSON file with --output so that a GitHub Actions step can consume them:

promptfoo eval --ci --output results.json

Parse results.json in a workflow step and post the score differences as a PR comment, so reviewers see exactly which model and which test case regressed.
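The parsing step could look roughly like the sketch below. The JSON layout it reads (`results.results[]` with `success`, `score`, `provider.id`, and `vars`) is an assumption about the export shape, and `summarizeFailures` is a hypothetical helper; inspect your own results.json and adjust the field names before relying on it.

```javascript
// Sketch: turn an eval results object into a short PR comment body.
// ASSUMPTION: the field names below are a guess at the export layout.
function summarizeFailures(report) {
  const rows = (report.results && report.results.results) || [];
  const failures = rows.filter((row) => row.success === false);

  const lines = failures.map((row) => {
    const provider = (row.provider && row.provider.id) || "unknown provider";
    const score = typeof row.score === "number" ? row.score.toFixed(2) : "n/a";
    return `- ${provider}: score ${score}, vars ${JSON.stringify(row.vars || {})}`;
  });

  return {
    failed: failures.length,
    total: rows.length,
    body: lines.length ? lines.join("\n") : "All test cases passed.",
  };
}

// Hypothetical sample, inlined so the sketch runs without a results file.
const sampleReport = {
  results: {
    results: [
      { provider: { id: "openai:gpt-4o" }, success: true, score: 1, vars: { text: "a" } },
      { provider: { id: "openai:gpt-4o-mini" }, success: false, score: 0.33, vars: { text: "a" } },
    ],
  },
};

const summary = summarizeFailures(sampleReport);
console.log(`${summary.failed}/${summary.total} failed`);
console.log(summary.body);
```

In a workflow you would replace `sampleReport` with the parsed contents of results.json and pass `summary.body` to whatever step posts the PR comment.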

Custom JavaScript Assertions

For complex validation logic, write a JS function:

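// assertions/check-structured.js
// Passes only when the output is JSON with a non-empty name and a positive age.
module.exports = (output) => {
  let parsed;
  try {
    parsed = JSON.parse(output);
  } catch {
    return { pass: false, score: 0, reason: "Not valid JSON" };
  }
  // Coerce to a real boolean; `parsed.name && ...` alone can yield a string or undefined.
  const pass = Boolean(parsed && parsed.name) && parsed.age > 0;
  return {
    pass,
    score: pass ? 1 : 0, // keep the score consistent with the pass flag
    reason: pass ? "Valid structured output" : "Missing name or non-positive age",
  };
};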

Reference it in YAML:

assert:
  - type: javascript
    value: file://assertions/check-structured.js
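Before wiring an assertion file into the config, it can help to smoke-test it locally with a few hand-written outputs. The sketch below inlines a stand-in named `checkStructured` so it runs on its own; in a real project you would load the function from assertions/check-structured.js instead. The sample cases are made up for illustration.

```javascript
// Sketch: smoke-test an assertion function with known-good and known-bad outputs.
// ASSUMPTION: `checkStructured` is an inline stand-in for the assertion file.
function checkStructured(output) {
  try {
    const parsed = JSON.parse(output);
    const pass = Boolean(parsed && parsed.name) && parsed.age > 0;
    return {
      pass,
      score: pass ? 1 : 0,
      reason: pass ? "Valid structured output" : "Missing name or invalid age",
    };
  } catch {
    return { pass: false, score: 0, reason: "Not valid JSON" };
  }
}

const cases = [
  { output: '{"name": "Ada", "age": 36}', expectPass: true },
  { output: '{"name": "", "age": 36}', expectPass: false },
  { output: "not json", expectPass: false },
];

// Collect every case where the assertion disagrees with the expectation.
const mismatches = cases.filter(
  (c) => checkStructured(c.output).pass !== c.expectPass
);
console.log(mismatches.length === 0 ? "assertion behaves as expected" : mismatches);
```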

Red-Teaming

promptfoo redteam init   # Generates redteam config from your system prompt
promptfoo redteam run    # Runs attack probes
promptfoo redteam report # View results

Red-team plugins and strategies include:

prompt-injection: attempts to override system prompt
jailbreak: social engineering and role-play attacks
pii: probes for PII leakage in RAG responses
sql-injection: for LLMs with DB tool access
excessive-agency: checks if the model takes unauthorised actions
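The generated probes are graded by promptfoo itself, but a crude leak check can also be written by hand in the custom-assertion shape shown earlier. The sketch below is only an illustration: the two regular expressions and the `checkNoPii` name are assumptions, they will miss many PII formats, and they are not how the pii plugin evaluates responses.

```javascript
// Sketch: flag outputs that contain obvious PII-like patterns.
// ASSUMPTION: both patterns are illustrative and far from exhaustive.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;
const US_SSN = /\b\d{3}-\d{2}-\d{4}\b/;

function checkNoPii(output) {
  const hits = [];
  if (EMAIL.test(output)) hits.push("email address");
  if (US_SSN.test(output)) hits.push("SSN-like number");

  const pass = hits.length === 0;
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass ? "No PII patterns found" : `Found: ${hits.join(", ")}`,
  };
}

console.log(checkNoPii("Contact me at jane.doe@example.com"));
console.log(checkNoPii("I can't share customer records."));
```

A pattern check like this only catches literal leaks; it complements the generated probes and does not replace them.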

GitHub Actions Integration

name: Prompt Quality Check
on: [pull_request]
jobs:
  promptfoo:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx promptfoo@latest eval --ci
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Full documentation at promptfoo.dev.
