Multi-step prompt sequences for complex AI workflows.
Multi-step prompt workflow that uses GPT-5.5 to generate structured office layouts from empty floorplans, furniture catalogs, and spatial constraints.
A prompt workflow that benchmarks different GPT model tiers side by side to evaluate performance, cost, and output quality across OpenAI's model
Prompt chain workflow that benchmarks OpenAI GPT model performance across different reasoning effort levels to identify optimal cost-performance trade-offs.
Runs OSWorld multimodal computer-use benchmark tasks through promptfoo, testing agents that observe Ubuntu desktop screenshots and act with mouse and keyboard
Test suite that evaluates the PolicyPlugin test generator to ensure it produces valid policy violation tests for AI red-teaming workflows.
A prompt workflow example demonstrating how to run and test multiple Amazon Bedrock AI models using promptfoo's evaluation framework.
A prompt workflow example that runs side-by-side performance comparisons between OpenAI's GPT-4o and GPT-4o-mini models to evaluate output quality and cost
A prompt workflow example that compares GPT model outputs across different temperature settings to evaluate response variability and consistency.
Prompt workflow comparing OpenAI GPT-5.4, Anthropic Claude Sonnet 4.6, and Google Gemini 3.1 Pro Preview on riddle-solving with cost, latency, and quality
A prompt workflow example that benchmarks Mistral and Llama language models side by side using promptfoo's evaluation framework.
Prompt workflow that benchmarks DeepSeek, Mistral, Llama, and Qwen models on factual assertion tasks using OpenRouter to compare open-source LLM performance.
JavaScript-based test case configuration example for promptfoo, demonstrating how to define and run prompt evaluation tests programmatically using code instead