Generate and Test Image Prompts
Redteam Dalle: Test DALL-E image generation prompts for safety and quality. Ensure your AI image outputs are robust and aligned with safety guidelines.
Why it matters
This asset helps users generate and test image prompts for DALL-E, ensuring they are robust and produce desired outputs. It's designed for red-teaming to uncover potential vulnerabilities or biases in image generation.
Outcomes
What it gets done
Generate image prompts for DALL-E.
Test generated prompts for effectiveness and safety.
Identify potential prompt injection or adversarial attacks.
Audit image generation outputs for unintended content.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-redteam-dalle | bash
Capabilities
What this chain does
Creates images from text prompts or templates.
Reviews permissions and logs to flag unauthorized activity.
Overview
Redteam Dalle
What it does
This prompt chain is designed to redteam DALL-E image generation prompts. It systematically tests prompts to identify potential safety issues and undesirable outputs, ensuring the AI generates content that aligns with safety guidelines and user expectations.
How it connects
Use this prompt chain when you need to rigorously test the safety and robustness of prompts intended for DALL-E image generation, especially for identifying prompt injection vulnerabilities or harmful content generation before deployment.
Source README
redteam-dalle (DALL-E Red Team Example)
You can run this example with:
npx promptfoo@latest init --example redteam-dalle
cd redteam-dalle
This example demonstrates how to use promptfoo to automatically discover jailbreaks in OpenAI's DALL-E image generation model. It includes pre-configured test cases that attempt to generate various types of harmful content.
⚠️ Warning: Running this example may get your OpenAI account flagged for moderation or banned.
Setup
Set your OpenAI API key:
export OPENAI_API_KEY=your_key_here
Initialize the example:
npx promptfoo@latest init --example redteam-dalle
Usage
Review and optionally modify the test cases in promptfooconfig.yaml. The example includes the same test cases shown in our blog post.
Run the evaluation:
npx promptfoo@latest eval
View the results in the web UI:
npx promptfoo@latest view
Important table settings:
- Under TABLE SETTINGS, enable "Render model outputs as Markdown" to display generated images
- Set "Max text length" to unlimited to view complete responses
- Click the magnifying glass icon (🔍) in the "View output and test details" column to see the final modified prompt that was used, complete conversation history, and scoring breakdown for each iteration.
Expected Behavior
During evaluation, you may see error messages like:
Error from target provider: 400 Your request was rejected as a result of our safety system.
Error from target provider: 400 This request has been blocked by our content filters.
This is expected behavior. Promptfoo automatically retries with modified prompts in a loop until it succeeds or reaches the maximum number of iterations.
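The retry loop described above can be pictured with a small shell sketch. This is an illustration of the control flow only, not promptfoo's implementation: `send_prompt`, `rewrite_prompt`, and `run_iterations` are hypothetical stand-ins, the "target" is a stub that refuses any prompt containing the word `forbidden`, and the rewrite step is a simple text substitution. The fallback of 4 iterations mirrors the default mentioned in this README.

```shell
# Illustrative sketch only. All three functions are hypothetical stand-ins.

# Stub target: refuses any prompt containing "forbidden", like a 400 from a safety filter.
send_prompt() {
  case "$1" in
    *forbidden*)
      echo "400 Your request was rejected as a result of our safety system." >&2
      return 1
      ;;
    *)
      echo "image for: $1"
      ;;
  esac
}

# Stub rewriter: produces a modified prompt after a refusal.
rewrite_prompt() {
  printf '%s\n' "$1" | sed 's/forbidden/rephrased/g'
}

# Try up to $2 iterations (4 if not given), rewriting the prompt after each refusal.
run_iterations() {
  prompt=$1
  max=${2:-4}
  i=1
  while [ "$i" -le "$max" ]; do
    if send_prompt "$prompt" 2>/dev/null; then
      return 0   # the target accepted the prompt
    fi
    prompt=$(rewrite_prompt "$prompt")
    i=$((i + 1))
  done
  return 1       # iteration budget exhausted
}
```

Calling `run_iterations 'forbidden scene'` fails on the first attempt and succeeds on the second, after the rewrite. With a budget of 1 the function returns a non-zero status, which corresponds to a test case where every attempt was rejected.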
Configuration
The default configuration uses 4 iterations per test case. To increase this (and potentially find more jailbreaks), set:
export PROMPTFOO_NUM_JAILBREAK_ITERATIONS=6
For debugging or to see the internal workings, enable debug logging:
LOG_LEVEL=debug npx promptfoo@latest eval -j 1
Troubleshooting
Note: DALL-E image URLs expire after 2 hours. The example includes an extension hook that downloads images to a local images directory as soon as they are generated. Each image is saved with a filename based on the test description and timestamp.
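To keep an image by hand before its URL expires, a naming scheme like the one described in the note can be approximated in shell. This is a sketch under assumptions, not the example's extension hook: `image_filename` is a hypothetical helper, and the `.png` extension and the exact slug format are guesses for illustration.

```shell
# Hypothetical helper: build a local path from a test description and a timestamp.
# The slug format and the .png extension are assumptions.
image_filename() {
  slug=$(printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -c 'a-z0-9' '-' \
    | tr -s '-' \
    | sed 's/^-//; s/-$//')
  printf 'images/%s-%s.png\n' "$slug" "$2"
}

# Example usage (not run here; IMAGE_URL is a placeholder you supply yourself):
#   mkdir -p images
#   curl -fsSL "$IMAGE_URL" -o "$(image_filename 'My test description' "$(date -u +%Y%m%dT%H%M%SZ)")"
```

Lowercasing the description and collapsing every other character into a single hyphen keeps the filename safe on most filesystems, and the timestamp keeps repeated runs of the same test from overwriting each other.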
- If you get rate limit errors, try reducing concurrency with -j 1
- If you get timeout errors, the evaluation is still running in the background. Wait a few minutes and check the results
- For other issues, please check our documentation or file an issue
Learn More
For more details about LLM red teaming with promptfoo, check out: