Evaluate AI Agent Performance
Example workflow showing how to evaluate CrewAI agent performance with promptfoo's testing framework.
Why it matters
Use promptfoo to rigorously evaluate the performance and reliability of your CrewAI agents, so you can confirm they meet your quality standards before deployment.
Outcomes
What it gets done
Set up and run evaluations for CrewAI agent performance.
Integrate promptfoo for systematic AI agent testing.
Analyze agent outputs to identify areas for improvement.
Automate the process of AI agent quality assurance.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-integration-crewai | bash
Capabilities
What this chain does
Reviews permissions and logs to flag unauthorized activity.
Searches the web and retrieves relevant sources.
Condenses long documents or threads into key takeaways.
Overview
CrewAI Integration
What it does
This is a working integration example that connects CrewAI agents to the promptfoo evaluation framework. It demonstrates the configuration and setup needed to test AI agent performance systematically, providing a template for evaluating multi-agent CrewAI systems.
How it connects
Use this example when you need to set up performance testing for CrewAI agents or want to understand how to integrate agent frameworks with evaluation tools. It's ideal for teams building multi-agent systems who need reproducible testing workflows.
Source README
integration-crewai (CrewAI Integration)
This example shows how to use CrewAI agents with promptfoo to evaluate AI agent performance.
What is CrewAI?
CrewAI is a framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
Quick Start
You can run this example with:
npx promptfoo@latest init --example integration-crewai
cd integration-crewai
Prerequisites
This example requires the following:
- Python 3.10+
- Node.js 20+
- OpenAI API Key - You MUST have a valid OpenAI API key to run this example
Environment Setup
You need to set the OpenAI API key. Choose one of these methods:
Option 1: Environment Variable (Recommended)
export OPENAI_API_KEY=your-api-key-here
Option 2: .env File
Create a .env file in this directory:
OPENAI_API_KEY=your-api-key-here
If using a .env file, uncomment python-dotenv in requirements.txt and reinstall dependencies.
Installation
Install Python packages:
pip install -r requirements.txt
Note: The openai package and other dependencies (langchain, pydantic, etc.) will be automatically installed as dependencies of crewai.
Install promptfoo CLI:
npm install -g promptfoo
Files
- agent.py: Contains the CrewAI agent setup and promptfoo provider interface
- promptfooconfig.yaml: Configures prompts, providers, and tests for evaluation
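To make the "promptfoo provider interface" concrete, here is a minimal sketch of the shape a Python provider usually takes. promptfoo calls a function named call_api(prompt, options, context) and reads the "output" (or "error") key of the returned dictionary. This is not the contents of the example's agent.py: run_agent is a hypothetical stub that stands in for the CrewAI crew so the sketch runs without an API key.

```python
import json
from typing import Any, Dict


def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for the CrewAI crew; returns a JSON string."""
    # Assumption: the real example would start a CrewAI crew here.
    return json.dumps({"query": prompt, "summary": "stubbed agent response"})


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point that promptfoo invokes for a Python provider."""
    try:
        raw = run_agent(prompt)
        # Return parsed JSON so assertions can inspect individual fields.
        return {"output": json.loads(raw)}
    except Exception as exc:
        # Report the failure to promptfoo instead of crashing the eval run.
        return {"error": f"agent failed: {exc}"}
```

Returning an "error" key rather than raising lets a single failed agent call show up as one failed test case instead of aborting the whole evaluation.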
Note on Reliability
When using a real LLM, you may notice that the agent's output is not always reliable, especially for more complex queries. For example, the agent may fail to return valid JSON or may not return a response at all. This is a common challenge when working with LLMs.
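One way to cope with the failure modes described above is to parse the agent's reply defensively before handing it to the evaluation. The sketch below is an illustration, not part of the example: extract_json is a hypothetical helper that returns None for empty or unparseable replies, and otherwise tries the whole reply first and then the outermost {...} span.

```python
import json
import re
from typing import Any, Optional


def extract_json(text: Optional[str]) -> Optional[Any]:
    """Return parsed JSON from an agent reply, or None if nothing usable is found."""
    # Treat a missing or blank reply as "no response at all".
    if not text or not text.strip():
        return None
    # Best case: the whole reply is valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fallback: the model wrapped the JSON in extra prose, so try the
    # span from the first "{" to the last "}".
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```

A provider could call this helper and return an error entry when it yields None, so unreliable replies are counted as failures rather than passing silently.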
Running the Evaluation
Run the evaluation:
promptfoo eval
Explore results in browser:
promptfoo view
Troubleshooting
If you see authentication errors:
- Ensure your OpenAI API key is set correctly
- Verify the key is valid and has sufficient quota
- Check that the environment variable is accessible to the Python process
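The last check can be done from Python itself. The following is a small sketch with a hypothetical helper name, check_openai_key, which only reports whether OPENAI_API_KEY is visible to the current process. It does not contact OpenAI, so it cannot confirm that the key is valid or has quota.

```python
import os
from typing import Mapping, Optional


def check_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Describe whether OPENAI_API_KEY is visible in the given environment."""
    # Default to the environment of the running Python process.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "missing: OPENAI_API_KEY is not set for this process"
    if key == "your-api-key-here":
        return "placeholder: replace the sample value with a real key"
    # Show only the last four characters so the key is not leaked in logs.
    return f"present: key ends with ...{key[-4:]}"


if __name__ == "__main__":
    print(check_openai_key())
```

Running this with the same interpreter and shell used for the evaluation shows whether the variable exported in Option 1 actually reaches the Python process.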