Prompt Chain

Test Policy Plugin Code Generation

Test suite that evaluates the PolicyPlugin test generator to ensure it produces valid policy violation tests for AI red-teaming workflows.

Works with GitHub

54
Spark score
out of 100
Updated 2 days ago
Version code-scan-action-0.1

Why it matters

This asset evaluates the PolicyPlugin's test generation. It checks that the tests the plugin generates are valid policy violation tests, so generation problems surface before they reach red-teaming workflows.

Outcomes

What it gets done

01

Evaluate the PolicyPlugin's test generation.

02

Verify the accuracy of generated test code.

03

Debug potential issues in the test generation process.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-evals | bash

Capabilities

What this chain does

Write tests

Creates unit, integration, or end-to-end test cases.

Review code

Analyzes code for bugs, style issues, and improvements.

Debug

Traces errors to their root cause and suggests fixes.

Overview

Evals

What it does

This evaluation suite tests the PolicyPlugin test generator itself. It checks that the PolicyPlugin produces correct, valid policy violation tests, confirming that the test generation infrastructure works as expected.

How it connects

Use this suite when you need to validate the PolicyPlugin test generator before deploying it in production red-teaming workflows, or when troubleshooting issues with policy test generation quality.

Source README

policy evals

This suite evaluates the PolicyPlugin test generator itself.

It compares five native promptfoo redteam generate cases:

  • normal single-input generation
  • policy text with explicit test-generation instructions
  • modifier-driven generation (Spanish output)
  • multi-input generation with coordinated document / query attacks
  • log-analysis generation with PromptBlock: output

The eval flow is:

  • run promptfoo redteam generate against each case config under cases/
  • normalize the generated YAML into a stable JSON payload
  • feed that payload into Promptfoo assertions and llm-rubric checks through an executable prompt

That keeps the suite on Promptfoo's real CLI generation path instead of using a custom harness provider.
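The normalization step in the flow above can be sketched roughly as follows. This is a minimal illustration, not the suite's actual code: the helper names (`sortKeys`, `normalizeGenerated`) are hypothetical, and the field names (`tests`, `vars`, `metadata.pluginId`) are assumptions about the shape of the generated output once the YAML has been parsed into an object. The idea is only that keys are sorted and the payload is reduced to a fixed shape, so two runs with the same content serialize identically.

```javascript
// Hypothetical sketch: field names below are assumptions, not taken from the README.

// Recursively sort object keys so serialization does not depend on key order.
function sortKeys(value) {
  if (Array.isArray(value)) {
    return value.map(sortKeys);
  }
  if (value && typeof value === 'object') {
    return Object.keys(value)
      .sort()
      .reduce((acc, key) => {
        acc[key] = sortKeys(value[key]);
        return acc;
      }, {});
  }
  return value;
}

// Reduce an already-parsed generation result to a small, stable JSON string.
function normalizeGenerated(caseId, generated) {
  const tests = Array.isArray(generated && generated.tests) ? generated.tests : [];
  const payload = {
    caseId,
    testCount: tests.length,
    tests: tests.map((t) => ({
      vars: t.vars || {},
      pluginId: (t.metadata && t.metadata.pluginId) || null,
    })),
  };
  return JSON.stringify(sortKeys(payload), null, 2);
}
```

A stable payload like this makes downstream assertions and rubric checks less sensitive to incidental differences in key order between runs.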

Prerequisites

  • OPENAI_API_KEY available in your environment or in .env

Run

From the repository root:

npm run local -- validate -c src/redteam/plugins/policy/evals/promptfooconfig.yaml
npm run local -- eval -c src/redteam/plugins/policy/evals/promptfooconfig.yaml --env-file .env --no-cache

To generate any single comparison case directly:

npm run local -- redteam generate -c src/redteam/plugins/policy/evals/cases/normal-single-input.yaml -o /tmp/policy-normal.yaml --force
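If you want to script the single-case command from Node, one possible approach is to build the argument list in a small helper. The sketch below is illustrative only: `buildGenerateArgs` and `EVALS_DIR` are hypothetical names, and it simply reproduces the arguments of the command shown above rather than describing how `generatePolicyEvalPrompt.cjs` is implemented.

```javascript
const path = require('path');

// Directory taken from the run commands in this README.
const EVALS_DIR = 'src/redteam/plugins/policy/evals';

// Hypothetical helper: returns the npm arguments for generating one case.
function buildGenerateArgs(caseFile, outputPath) {
  const configPath = path.posix.join(EVALS_DIR, 'cases', caseFile);
  return [
    'run', 'local', '--',
    'redteam', 'generate',
    '-c', configPath,
    '-o', outputPath,
    '--force',
  ];
}

// Usage (assumes you are in the repository root with OPENAI_API_KEY set):
// const { execFileSync } = require('child_process');
// execFileSync(
//   'npm',
//   buildGenerateArgs('normal-single-input.yaml', '/tmp/policy-normal.yaml'),
//   { stdio: 'inherit' },
// );
```

Passing the arguments as an array to `execFileSync` avoids shell quoting issues if a case file name or output path contains spaces.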

Files

  • promptfooconfig.yaml - eval suite
  • generatePolicyEvalPrompt.cjs - executable prompt that runs redteam generate for one case and emits normalized JSON
  • cases/*.yaml - native redteam generation configs being compared
  • tests/policy-generation.yaml - case metadata and Promptfoo assertions
