Prompt Chain

Evaluate AI Agent Performance

Name: Evaluate AI Agent Performance
Availability: OnlineOnly
Author: Promptfoo

Example workflow demonstrating how to evaluate CrewAI agent performance using promptfoo's testing framework for AI agents.

Works with crewai, promptfoo

Updated yesterday

Version code-scan-action-0.1

Why it matters

Use promptfoo to evaluate the performance and reliability of your CrewAI agents, so you can confirm they meet your quality standards before deployment.

Outcomes

What it gets done

Set up and run evaluations for CrewAI agent performance.

Integrate promptfoo for systematic AI agent testing.

Analyze agent outputs to identify areas for improvement.

Automate the process of AI agent quality assurance.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-integration-crewai | bash

Capabilities

What this chain does

Audit access

Reviews permissions and logs to flag unauthorized activity.

Search the web

Searches the web and retrieves relevant sources.

Summarize

Condenses long documents or threads into key takeaways.

Overview

CrewAI Integration

What it does

This is a working integration example that connects CrewAI agents to the promptfoo evaluation framework. It demonstrates the configuration and setup needed to test AI agent performance systematically, providing a template for evaluating multi-agent CrewAI systems.

How it connects

Use this example when you need to set up performance testing for CrewAI agents or want to understand how to integrate agent frameworks with evaluation tools. It's ideal for teams building multi-agent systems who need reproducible testing workflows.

Source README

integration-crewai (CrewAI Integration)

This example shows how to use CrewAI agents with promptfoo to evaluate AI agent performance.

What is CrewAI?

CrewAI is a framework for orchestrating autonomous AI agents that each play a defined role and collaborate on complex tasks.

Quick Start

You can run this example with:

npx promptfoo@latest init --example integration-crewai
cd integration-crewai

Prerequisites

This example requires the following:

Python 3.10+
Node.js 20+
OpenAI API Key - You MUST have a valid OpenAI API key to run this example

Environment Setup

You need to set the OpenAI API key. Choose one of these methods:

Option 1: Environment Variable (Recommended)

export OPENAI_API_KEY=your-api-key-here

Option 2: .env File

Create a .env file in this directory:

OPENAI_API_KEY=your-api-key-here

If using a .env file, uncomment python-dotenv in requirements.txt and reinstall dependencies.

Installation

Install Python packages:

pip install -r requirements.txt

Note: pip installs the openai package and other packages (langchain, pydantic, etc.) automatically as dependencies of crewai.

Install promptfoo CLI:

npm install -g promptfoo

Files

agent.py: Contains the CrewAI agent setup and promptfoo provider interface
promptfooconfig.yaml: Configures prompts, providers, and tests for evaluation
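
To make the role of agent.py more concrete, here is a minimal sketch of what a promptfoo Python provider can look like. promptfoo calls a function named call_api(prompt, options, context) and expects a dict with an output key (or an error key). The run_crew helper and its JSON shape are hypothetical stand-ins, not the contents of the example's actual agent.py; a real version would build CrewAI Agent, Task, and Crew objects and call kickoff() where the stub is.

```python
import json


def run_crew(topic: str) -> str:
    """Hypothetical stand-in for a CrewAI run.

    A real agent.py would construct CrewAI Agent/Task/Crew objects here
    and return the text produced by crew.kickoff().
    """
    return json.dumps({"topic": topic, "summary": f"Stub answer about {topic}"})


def call_api(prompt, options, context):
    """Entry point that promptfoo invokes for a Python provider.

    Returns {"output": ...} on success or {"error": ...} on failure,
    so a crashed agent shows up as a failed test instead of aborting the run.
    """
    try:
        return {"output": run_crew(prompt)}
    except Exception as exc:  # report any agent failure to promptfoo
        return {"error": f"{type(exc).__name__}: {exc}"}


if __name__ == "__main__":
    # Quick manual check outside promptfoo.
    print(call_api("solar power", {}, {}))
```

Assuming this layout, promptfooconfig.yaml would point its provider entry at this file, and each test prompt would be passed in as the prompt argument.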

Note on Reliability

When using a real LLM, you may notice that the agent's output is not always reliable, especially for more complex queries. For example, the agent may fail to return valid JSON or may not return a response at all. This is a common challenge when working with LLMs.
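
One way to cope with this is to parse the agent's reply defensively before handing it to promptfoo. The sketch below is an illustration only: the parse_agent_json name and the ok/error/data result shape are assumptions, not part of the example's code. It treats an empty reply and malformed JSON as explicit failures, and tolerates extra prose around a JSON object.

```python
import json
from typing import Optional


def parse_agent_json(raw: Optional[str]) -> dict:
    """Extract a JSON object from an agent reply without raising.

    Returns {"ok": bool, "error": str | None, "data": object | None}.
    """
    if raw is None or not raw.strip():
        return {"ok": False, "error": "empty response", "data": None}

    text = raw.strip()
    # Take the widest {...} span so surrounding prose is ignored.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return {"ok": False, "error": "no JSON object found", "data": None}

    try:
        data = json.loads(text[start : end + 1])
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}", "data": None}
    return {"ok": True, "error": None, "data": data}


if __name__ == "__main__":
    print(parse_agent_json('Here is the result: {"summary": "done"}'))
    print(parse_agent_json(""))
```

A provider could return the error string to promptfoo when ok is False, which makes unreliable runs visible in the results instead of hiding them.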

Running the Evaluation

Run the evaluation:

promptfoo eval

Explore results in browser:

promptfoo view

Troubleshooting

If you see authentication errors:

Ensure your OpenAI API key is set correctly
Verify the key is valid and has sufficient quota
Check that the environment variable is accessible to the Python process
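
To check the last point, a short script run with the same Python interpreter can confirm whether the variable is visible. This is a sketch under assumptions: check_openai_key is a hypothetical helper, and it only checks that the variable is present and not the placeholder from the setup instructions. It does not verify that the key is valid or has quota.

```python
import os
from typing import Mapping, Optional


def check_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Report whether OPENAI_API_KEY is visible in the given environment.

    Defaults to the current process environment (os.environ).
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set in this process"
    if key == "your-api-key-here":
        return "OPENAI_API_KEY still holds the placeholder value"
    return "OPENAI_API_KEY is set"


if __name__ == "__main__":
    # Never print the key itself; only report its status.
    print(check_openai_key())
```

If the script reports that the variable is not set even though you exported it, the shell that runs promptfoo is probably not the one where the export happened.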
