Multi-step prompt sequences for complex AI workflows.
Execute hard coding tasks with Claude Fable 5's adaptive thinking at maximum effort level for complex problem-solving.
Multi-step prompt workflow that builds a continuous improvement loop for OpenAI Agents SDK agents using traces, human and LLM feedback, Promptfoo evals, and
Multi-step prompt workflow that migrates legacy code by running an OpenAI agent inside sandboxed execution environments, validating each repo shard with tests
Multi-step workflow for discovering recurring behavior patterns across thousands of agentic system traces using lower-level eval labels and population-level
A prompt workflow that runs Promptfoo evaluations against the Anthropic Messages API using an existing local Claude Code session instead of creating a separate API
Enables LLMs to interact with external tools via the Model Context Protocol, executing tool calls and integrating results back into conversations.
Prompt workflow example demonstrating MLflow AI Gateway integration as an LLM provider in promptfoo for governed model access and testing.
A prompt workflow example that benchmarks Claude and GPT models side-by-side using promptfoo's evaluation framework to compare outputs on identical prompts.
Prompt workflow comparing OpenAI GPT-5.4, Anthropic Claude Sonnet 4.6, and Google Gemini 3.1 Pro Preview on riddle-solving with cost, latency, and quality
A prompt workflow that selects the highest-scoring output from multiple evaluation runs, enabling automated quality-based selection of LLM responses.
A prompt chain example that demonstrates building evaluation rubrics for search quality assessment, using promptfoo's multi-step workflow capabilities.
Evaluation framework for testing LLM function and tool calling capabilities using promptfoo, enabling systematic assessment of model tool-use performance.