What Promptfoo Does
Promptfoo is a CLI tool for testing LLM prompts systematically. You write test cases in YAML, define assertions, and Promptfoo runs every prompt against every model in your config, producing a comparison matrix and a pass/fail report. It also ships a red-teaming engine that automatically probes for safety vulnerabilities.
Installation
npm install -g promptfoo
# or
npx promptfoo@latest
Basic YAML Test Config
# promptfooconfig.yaml
providers:
  - openai:gpt-4o-mini
  - openai:gpt-4o
  - ollama:llama3.1:8b
prompts:
  - "Summarize the following text in one sentence: {{text}}"
tests:
  - vars:
      text: "PagedAttention is a memory management technique for LLM inference that treats the KV cache like virtual memory pages."
    assert:
      - type: contains
        value: "KV cache"
      - type: javascript
        value: "output.length < 200"
      - type: llm-rubric
        value: "The summary is factually accurate and concise"
Run with:
promptfoo eval
promptfoo view # Open browser UI with comparison table
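To make the "every prompt against every model" behaviour concrete, here is a minimal sketch of how the config above expands into an evaluation grid. It is an illustration only, not Promptfoo's implementation: `expandGrid` is a hypothetical helper, and the regex substitution is a simplified stand-in for the real template engine.

```javascript
// Sketch only: enumerates the prompt x provider x test grid.
// `expandGrid` is a hypothetical helper, not part of Promptfoo.
function expandGrid(config) {
  const cells = [];
  for (const prompt of config.prompts) {
    for (const provider of config.providers) {
      for (const test of config.tests) {
        // Simplified {{var}} substitution; unknown placeholders are left as-is.
        const rendered = prompt.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
          name in test.vars ? String(test.vars[name]) : match
        );
        cells.push({ provider, prompt: rendered, assertions: test.assert.length });
      }
    }
  }
  return cells;
}

// Mirrors the YAML config above as a plain object.
const config = {
  providers: ['openai:gpt-4o-mini', 'openai:gpt-4o', 'ollama:llama3.1:8b'],
  prompts: ['Summarize the following text in one sentence: {{text}}'],
  tests: [
    {
      vars: { text: 'PagedAttention is a memory management technique for LLM inference.' },
      assert: [{ type: 'contains' }, { type: 'javascript' }, { type: 'llm-rubric' }],
    },
  ],
};

const grid = expandGrid(config);
for (const cell of grid) {
  console.log(`${cell.provider}: ${cell.assertions} assertions to check`);
}
```

Each added provider, prompt, or test multiplies the number of model calls, which is worth keeping in mind for cost when the suite grows.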
Model Comparison in PR Checks
Use the --output flag to write JSON results that a GitHub Actions step can consume:
promptfoo eval --output results.json
Parse results.json in a workflow step to comment score diffs on the PR, so your reviewers see exactly which model and which test case regressed.
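The parsing step could look roughly like the sketch below, which tallies passes and failures per provider and renders a Markdown table for a PR comment. The row shape (`provider.id`, `success`) and the helper names are assumptions made for illustration; inspect your own results.json before relying on them.

```javascript
// Assumed row shape: { provider: { id }, success }. Verify against your results.json.
function summarizeByProvider(rows) {
  const summary = new Map();
  for (const row of rows) {
    const id = row.provider?.id ?? 'unknown';
    const entry = summary.get(id) ?? { passed: 0, failed: 0 };
    if (row.success) {
      entry.passed += 1;
    } else {
      entry.failed += 1;
    }
    summary.set(id, entry);
  }
  return summary;
}

// Renders the per-provider tally as a Markdown table for a PR comment body.
function toMarkdownTable(summary) {
  const lines = ['| Provider | Passed | Failed |', '| --- | --- | --- |'];
  for (const [id, { passed, failed }] of summary) {
    lines.push(`| ${id} | ${passed} | ${failed} |`);
  }
  return lines.join('\n');
}

// Hypothetical sample rows standing in for the parsed file contents.
const sampleRows = [
  { provider: { id: 'openai:gpt-4o' }, success: true },
  { provider: { id: 'openai:gpt-4o' }, success: false },
  { provider: { id: 'openai:gpt-4o-mini' }, success: true },
];

console.log(toMarkdownTable(summarizeByProvider(sampleRows)));
```

In a workflow, the sample rows would be replaced by rows loaded with `JSON.parse(fs.readFileSync('results.json', 'utf8'))`; where exactly the per-test rows sit inside that file is something to confirm for your Promptfoo version.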
Custom JavaScript Assertions
For complex validation logic, write a JS function:
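// assertions/check-structured.js
// Passes only if the output is JSON with a non-empty name and a positive age.
module.exports = (output) => {
  let parsed;
  try {
    parsed = JSON.parse(output);
  } catch {
    return { pass: false, score: 0, reason: "Not valid JSON" };
  }
  // Coerce to a real boolean; `parsed.name && ...` alone can yield a string or undefined.
  const pass = Boolean(parsed && parsed.name) && parsed.age > 0;
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass ? "Valid structured output" : "Missing name or non-positive age",
  };
};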
Reference in YAML:
assert:
  - type: javascript
    value: file://assertions/check-structured.js
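Because the assertion is a plain function, it can be smoke-tested with Node before it is wired into an eval. The sketch below uses an inline stand-in named `checkStructured` (a hypothetical name) so that it runs on its own; in a real project you would `require()` the assertion file instead.

```javascript
// Hypothetical stand-in for assertions/check-structured.js, inlined so the sketch is runnable.
const checkStructured = (output) => {
  try {
    const parsed = JSON.parse(output);
    const pass = Boolean(parsed && parsed.name) && parsed.age > 0;
    return {
      pass,
      score: pass ? 1 : 0,
      reason: pass ? 'Valid structured output' : 'Missing name or non-positive age',
    };
  } catch {
    return { pass: false, score: 0, reason: 'Not valid JSON' };
  }
};

// Made-up model outputs covering the pass, fail, and parse-error paths.
const samples = ['{"name":"Ada","age":36}', '{"name":"","age":36}', 'not json'];

for (const sample of samples) {
  const result = checkStructured(sample);
  console.log(`${sample} -> pass=${result.pass} (${result.reason})`);
}
```

Running a few hand-written outputs through the function this way catches logic slips without spending API calls.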
Red-Teaming
promptfoo redteam init # Generates redteam config from your system prompt
promptfoo redteam run # Runs attack probes
promptfoo redteam report # View results
Red-team plugins and attack strategies include:
- prompt-injection: attempts to override the system prompt
- jailbreak: social-engineering and role-play attacks
- pii: probes for PII leakage, for example in RAG responses
- sql-injection: for LLMs with database tool access
- excessive-agency: checks whether the model takes unauthorised actions
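A selection like the one above might be expressed in a JavaScript config along these lines. This is a sketch under assumptions: the `redteam` field names, the file name, and the grouping of prompt-injection and jailbreak under `strategies` rather than `plugins` should all be checked against the Promptfoo documentation for your version, and the `purpose` text is invented for illustration.

```javascript
// promptfooconfig.js (hypothetical file name for a JavaScript config)
// Assumption: the redteam section accepts `plugins` and `strategies` lists.
const config = {
  // The model under test; reuses a provider ID from the eval config.
  targets: ['openai:gpt-4o-mini'],
  redteam: {
    // Hypothetical description of the application being probed.
    purpose: 'Customer support assistant with database tool access',
    // What to probe for.
    plugins: ['pii', 'sql-injection', 'excessive-agency'],
    // How the attack prompts are delivered (assumed to be strategies, not plugins).
    strategies: ['prompt-injection', 'jailbreak'],
  },
};

module.exports = config;
```

Starting with a short list of checks keeps the first run cheap; more can be added once the report shows which areas need deeper probing.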
GitHub Actions Integration
name: Prompt Quality Check
on: [pull_request]
jobs:
  promptfoo:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx promptfoo@latest eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
Full documentation at promptfoo.dev.