Multi-step prompt sequences for complex AI workflows.
OpenAI Agents SDK cookbook showing how to build evidence-review agents with compaction for long conversations and memory for reusable workflow lessons.
A fast-executing test fixture demonstrating function calling for knowledge retrieval using two local paper records and legacy tool-calling patterns from the
Prompt chain workflow that benchmarks OpenAI GPT model performance across different reasoning effort levels to identify optimal cost-performance trade-offs.
Master GPT-5.4's multimodal features for document understanding by optimizing image detail, verbosity, reasoning, and tool usage for state-of-the-art results.
A prompt workflow example demonstrating how to run and test multiple Amazon Bedrock AI models using promptfoo's evaluation framework.
A prompt workflow example that benchmarks Claude and GPT models side-by-side using promptfoo's evaluation framework to compare outputs on identical prompts.
A prompt workflow example that runs side-by-side performance comparisons between OpenAI's GPT-4o and GPT-4o-mini models to evaluate output quality and cost
Prompt workflow that benchmarks GPT-5 against GPT-5 Mini using MMLU (Massive Multitask Language Understanding) test suite to compare model performance across
Prompt workflow comparing OpenAI GPT-5.4, Anthropic Claude Sonnet 4.6, and Google Gemini 3.1 Pro Preview on riddle-solving with cost, latency, and quality
A prompt workflow example that compares responses from Llama and GPT language models side-by-side using promptfoo's evaluation framework.
A prompt workflow example that benchmarks Mistral and Llama language models side-by-side using promptfoo's evaluation framework.
Prompt workflow that benchmarks DeepSeek, Mistral, Llama, and Qwen models on factual assertion tasks using OpenRouter to compare open-source LLM performance.