Promptfoo
A test framework for evaluating and red-teaming LLM applications. Promptfoo lets you test prompts systematically across different models, compare outputs side by side, and catch regressions. It supports custom assertions, automated grading, and CI/CD integration for prompt testing pipelines.
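As an illustration of that workflow, here is a minimal promptfoo-style YAML config sketch. The prompt text, the `ticket` variable, the model IDs, and the assertion values are placeholder assumptions for illustration, not details drawn from this entry.

```yaml
# promptfooconfig.yaml -- illustrative sketch with placeholder prompt, variable, and models
prompts:
  - "Summarize the following support ticket in one sentence: {{ticket}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      ticket: "My invoice from March was charged twice."
    assert:
      - type: contains        # deterministic substring check
        value: "invoice"
      - type: regex           # pattern check
        value: "charged (twice|two times)"
      - type: llm-rubric      # model-graded assertion
        value: "Is a single accurate sentence that mentions the duplicate charge"
```

Running the eval (for example with `npx promptfoo@latest eval`) executes each test case against each provider and reports pass/fail per assertion, which is where the side-by-side comparison comes from.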
Implements
Concepts this tool claims to implement:
- Benchmark (primary)
Define test cases with inputs and expected outputs. Compare prompts across multiple providers (OpenAI, Anthropic, local models). Assertion types include contains, regex, LLM-graded, and custom.
- Red Teaming (primary)
Built-in red team plugin for automated adversarial testing. Tests for prompt injection, jailbreaks, PII leakage, and harmful outputs. Generates attack prompts and evaluates model robustness (see the config sketch after this list).
- Guardrails (secondary)
Test guardrail effectiveness by running adversarial inputs. Verify safety behaviors hold across edge cases.
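The red-teaming and guardrail checks above are typically driven from the same config file. The sketch below shows the general shape of a `redteam` block; the `purpose` text is a placeholder, and the plugin and strategy names are examples of the kind promptfoo ships rather than a verified list, so check them against the current docs.

```yaml
# Illustrative redteam block; it sits alongside the providers/targets of the main config.
redteam:
  purpose: "Customer support assistant for a billing product"  # placeholder description of the system under test
  numTests: 5              # attack prompts generated per plugin
  plugins:
    - pii                  # probe for PII leakage
    - harmful              # probe harmful-output categories
  strategies:
    - jailbreak            # wrap generated attacks in jailbreak framings
    - prompt-injection     # embed attacks inside otherwise benign-looking input
```

The CLI's redteam subcommands generate the attack prompts and evaluate the target against them; filtering results by plugin shows which adversarial categories the guardrails actually withstand.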
Integration Surfaces
Details
- Vendor: Promptfoo (open source)
- License: MIT
- Runs On: local, cloud
- Used By: human, system
Links
Notes
Promptfoo is excellent for prompt engineering workflows and regression testing. The red-teaming capabilities were added later but are now a major feature. It is lightweight compared to full observability platforms.
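For the regression-testing use mentioned above, a common pattern is to run the eval in CI on every pull request. The following GitHub Actions sketch assumes a `promptfooconfig.yaml` at the repo root and assumes that `promptfoo eval` signals assertion failures through a non-zero exit code; the workflow name, Node version, and secret name are placeholders.

```yaml
# .github/workflows/prompt-regression.yml -- illustrative sketch
name: prompt-regression
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Assumption: a failing assertion makes this step exit non-zero and fail the job.
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```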