Promptfoo

tool · active · open-source

A test framework for evaluating and red-teaming LLM applications. Promptfoo enables systematic testing of prompts across different models, side-by-side comparison of outputs, and regression detection. It supports custom assertions, automated grading, and CI/CD integration for prompt-testing pipelines.
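
As a minimal illustration of the config-driven workflow (the model IDs and expected values below are placeholders; provider and assertion names should be checked against the current docs):

```yaml
# promptfooconfig.yaml -- compare one prompt across two providers
prompts:
  - "Translate the following text to French: {{text}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      text: "Hello, world"
    assert:
      - type: contains        # simple substring check
        value: "Bonjour"
      - type: llm-rubric      # LLM-graded assertion
        value: "Is a fluent, natural French translation"
```

Running `npx promptfoo@latest eval` executes the test matrix, and `promptfoo view` opens the side-by-side results UI.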

Implements

Concepts this tool claims to implement:

  • Benchmark primary

    Define test cases with inputs and expected outputs. Compare prompts across multiple providers (OpenAI, Anthropic, local models). Assertion types include contains, regex, LLM-graded, and custom (illustrated in the configuration sketch above).

  • Red Teaming primary

    Built-in red team plugin for automated adversarial testing. Tests for prompt injection, jailbreaks, PII leakage, and harmful outputs. Generates attack prompts and evaluates model robustness (a configuration sketch follows this list).

  • Guardrails secondary

    Test guardrail effectiveness by running adversarial inputs. Verify safety behaviors hold across edge cases.
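
Both the red-team plugin and the guardrail checks are driven from the same configuration. A sketch of the redteam section follows; the plugin and strategy names here are illustrative and should be verified against the current plugin list:

```yaml
# redteam section of promptfooconfig.yaml (names illustrative)
redteam:
  purpose: "Customer support bot for a retail store"
  plugins:
    - pii                # probe for PII leakage
    - harmful            # probe for harmful-output categories
  strategies:
    - jailbreak          # wrap probes in jailbreak framings
    - prompt-injection   # wrap probes in injection payloads
```

In recent versions, `npx promptfoo@latest redteam run` generates the attack prompts and evaluates them against the target.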

Integration Surfaces

  • CLI
  • Node.js SDK
  • Python wrapper
  • YAML configuration
  • CI/CD (GitHub Actions, etc.)
  • Web UI for results
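
For the Node.js surface, a minimal sketch of programmatic use (the `evaluate` entry point is the documented API; the exact option and result shapes may vary by version):

```ts
import promptfoo from 'promptfoo';

// Programmatic equivalent of a small promptfooconfig.yaml:
// one prompt, one provider, one test case with an assertion.
const results = await promptfoo.evaluate({
  prompts: ['Summarize in one sentence: {{text}}'],
  providers: ['openai:gpt-4o-mini'],
  tests: [
    {
      vars: { text: 'Promptfoo is a test framework for LLM apps.' },
      assert: [{ type: 'contains', value: 'test' }],
    },
  ],
});

console.log(JSON.stringify(results, null, 2));
```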

Details

Vendor: Promptfoo (open source)
License: MIT
Runs On: local, cloud
Used By: human, system

Notes

Promptfoo is excellent for prompt-engineering workflows and regression testing. The red-teaming capabilities were added later but are now a major feature. It is lightweight compared to full observability platforms.
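
As a sketch of the CI/CD integration mentioned above, a plain GitHub Actions job can run the eval on every pull request (the workflow below is an assumption, not an official template; the project also publishes a dedicated GitHub Action):

```yaml
# .github/workflows/prompt-regression.yml (illustrative)
name: prompt-regression
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # promptfoo exits non-zero when assertions fail,
      # so a prompt regression fails the job
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```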