Test Policy Plugin Code Generation
Test suite that evaluates the PolicyPlugin test generator to ensure it produces valid policy violation tests for AI red-teaming workflows.
Why it matters
This suite checks that the Policy Plugin's test generator produces valid, accurate policy violation tests, so generation problems surface before they reach red-teaming runs.
Outcomes
What it gets done
Evaluate the Policy Plugin's test generation.
Verify the accuracy of generated test code.
Debug potential issues in the test generation process.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-evals | bash
Capabilities
What this chain does
Creates unit, integration, or end-to-end test cases.
Analyzes code for bugs, style issues, and improvements.
Traces errors to their root cause and suggests fixes.
Overview
Evals
What it does
This is an evaluation suite specifically designed to test the PolicyPlugin test generator itself. It validates that the PolicyPlugin produces correct and valid policy violation tests, ensuring the test generation infrastructure works as expected.
How it connects
Use this suite when you need to validate the PolicyPlugin test generator before deploying it in production red-teaming workflows, or when troubleshooting issues with policy test generation quality.
Source README
policy evals
This suite evaluates the PolicyPlugin test generator itself.
It compares five native promptfoo redteam generate cases:
- normal single-input generation
- policy text with explicit test-generation instructions
- modifier-driven generation (Spanish output)
- multi-input generation with coordinated document/query attacks
- log-analysis generation with PromptBlock: output
The eval flow is:
- run promptfoo redteam generate against each case config under cases/
- normalize the generated YAML into a stable JSON payload
- feed that payload into Promptfoo assertions and llm-rubric checks through an executable prompt
That keeps the suite on Promptfoo's real CLI generation path instead of using a custom harness provider.
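The normalization step in the flow above can be illustrated with a minimal sketch. This is not the suite's actual normalizer: sortKeys and normalizeGenerated are hypothetical helpers, and the input shape (an already-parsed object with a tests array whose entries carry vars and metadata) is an assumption about the generated YAML. It only shows why sorting keys makes the JSON payload stable across runs.

```javascript
// Hypothetical helper, not taken from generatePolicyEvalPrompt.cjs.
// Recursively sorts object keys so equal data always serializes identically.
function sortKeys(value) {
  if (Array.isArray(value)) {
    return value.map(sortKeys);
  }
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value;
}

// Assumed input: the generated YAML already parsed into an object
// with a `tests` array. Missing or malformed input yields zero tests.
function normalizeGenerated(parsed) {
  const tests = Array.isArray(parsed && parsed.tests) ? parsed.tests : [];
  const payload = {
    testCount: tests.length,
    tests: tests.map((t) => ({
      vars: t.vars || {},
      metadata: t.metadata || {},
    })),
  };
  return JSON.stringify(sortKeys(payload), null, 2);
}

// Two results that differ only in key order produce the same payload.
const first = normalizeGenerated({ tests: [{ vars: { query: 'q', document: 'd' } }] });
const second = normalizeGenerated({ tests: [{ vars: { document: 'd', query: 'q' } }] });
console.log(first === second); // true
```

A stable payload of this kind lets string-based assertions and rubric checks compare runs without tripping on incidental key ordering.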
Prerequisites
OPENAI_API_KEY available in your environment or in .env
Run
From the repository root:
npm run local -- validate -c src/redteam/plugins/policy/evals/promptfooconfig.yaml
npm run local -- eval -c src/redteam/plugins/policy/evals/promptfooconfig.yaml --env-file .env --no-cache
To generate any single comparison case directly:
npm run local -- redteam generate -c src/redteam/plugins/policy/evals/cases/normal-single-input.yaml -o /tmp/policy-normal.yaml --force
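To repeat the command above for several case configs, a small Node helper could build the argument list per case. This is only a sketch: buildGenerateArgs, runCase, and the /tmp/policy-<name>.yaml output naming are hypothetical and not part of the suite. Only the npm run local -- redteam generate flags (-c, -o, --force) come from the command shown above.

```javascript
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Hypothetical helper: builds the npm arguments for one case config.
// The output file name is derived from the case file name (an assumption).
function buildGenerateArgs(casePath, outDir = '/tmp') {
  const name = path.basename(casePath, '.yaml');
  const outPath = `${outDir}/policy-${name}.yaml`;
  return [
    'run', 'local', '--',
    'redteam', 'generate',
    '-c', casePath,
    '-o', outPath,
    '--force',
  ];
}

// Hypothetical runner: invokes npm from the repository root for one case.
// Not called here, because it needs the repository and an API key.
function runCase(casePath) {
  execFileSync('npm', buildGenerateArgs(casePath), { stdio: 'inherit' });
}
```

Calling runCase('src/redteam/plugins/policy/evals/cases/normal-single-input.yaml') from the repository root would run the same generation as the command above, writing to /tmp/policy-normal-single-input.yaml instead of /tmp/policy-normal.yaml.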
Files
- promptfooconfig.yaml - eval suite
- generatePolicyEvalPrompt.cjs - executable prompt that runs redteam generate for one case and emits normalized JSON
- cases/*.yaml - native redteam generation configs being compared
- tests/policy-generation.yaml - case metadata and Promptfoo assertions