Promptfoo

tool · active · open-source

A test framework for evaluating and red-teaming LLM applications. Promptfoo enables systematic testing of prompts across different models, side-by-side comparison of outputs, and regression detection. It supports custom assertions, automated grading, and CI/CD integration for prompt-testing pipelines.
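
As a minimal illustration of the config-driven workflow (the model IDs and expected values below are placeholders; provider and assertion names should be checked against the current docs):

```yaml
# promptfooconfig.yaml -- compare one prompt across two providers
prompts:
  - "Translate the following text to French: {{text}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      text: "Hello, world"
    assert:
      - type: contains        # simple substring check
        value: "Bonjour"
      - type: llm-rubric      # LLM-graded assertion
        value: "Is a fluent, natural French translation"
```

Running `npx promptfoo@latest eval` executes the test matrix, and `promptfoo view` opens the side-by-side results UI.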

Implements

Concepts this tool claims to implement:

  • Benchmark primary

    Define test cases with inputs and expected outputs. Compare prompts across multiple providers (OpenAI, Anthropic, local models). Assertion types include contains, regex, LLM-graded, and custom (illustrated in the configuration sketch above).

  • Red Teaming primary

    Built-in red team plugin for automated adversarial testing. Tests for prompt injection, jailbreaks, PII leakage, and harmful outputs. Generates attack prompts and evaluates model robustness (a configuration sketch follows this list).

  • Guardrails secondary

    Test guardrail effectiveness by running adversarial inputs. Verify safety behaviors hold across edge cases.
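
Both the red-team plugin and the guardrail checks are driven from the same configuration. A sketch of the redteam section follows; the plugin and strategy names here are illustrative and should be verified against the current plugin list:

```yaml
# redteam section of promptfooconfig.yaml (names illustrative)
redteam:
  purpose: "Customer support bot for a retail store"
  plugins:
    - pii                # probe for PII leakage
    - harmful            # probe for harmful-output categories
  strategies:
    - jailbreak          # wrap probes in jailbreak framings
    - prompt-injection   # wrap probes in injection payloads
```

In recent versions, `npx promptfoo@latest redteam run` generates the attack prompts and evaluates them against the target.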

Integration Surfaces

  • CLI
  • Node.js SDK
  • Python wrapper
  • YAML configuration
  • CI/CD (GitHub Actions, etc.)
  • Web UI for results
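
For the Node.js surface, a minimal sketch of programmatic use (the `evaluate` entry point is the documented API; the exact option and result shapes may vary by version):

```ts
import promptfoo from 'promptfoo';

// Programmatic equivalent of a small promptfooconfig.yaml:
// one prompt, one provider, one test case with an assertion.
const results = await promptfoo.evaluate({
  prompts: ['Summarize in one sentence: {{text}}'],
  providers: ['openai:gpt-4o-mini'],
  tests: [
    {
      vars: { text: 'Promptfoo is a test framework for LLM apps.' },
      assert: [{ type: 'contains', value: 'test' }],
    },
  ],
});

console.log(JSON.stringify(results, null, 2));
```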

Details

Vendor: Promptfoo (open source)
License: MIT
Runs On: local, cloud
Used By: human, system

Notes

Promptfoo is excellent for prompt-engineering workflows and regression testing. The red-teaming capabilities were added later but are now a major feature. It is lightweight compared to full observability platforms.
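
As a sketch of the CI/CD integration mentioned above, a plain GitHub Actions job can run the eval on every pull request (the workflow below is an assumption, not an official template; the project also publishes a dedicated GitHub Action):

```yaml
# .github/workflows/prompt-regression.yml (illustrative)
name: prompt-regression
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # promptfoo exits non-zero when assertions fail,
      # so a prompt regression fails the job
      - run: npx promptfoo@latest eval -c promptfooconfig.yaml
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```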