Evaluate AI Agent Performance with CrewAI
Evaluate CrewAI agent performance with promptfoo. Test and compare AI agents for better results.
Why it matters
Run CrewAI agents as a promptfoo provider to evaluate and benchmark their output against explicit assertions, so you can confirm that agents meet your quality standards before deployment.
Outcomes
What it gets done
Set up CrewAI agents for task execution.
Integrate agents with promptfoo for evaluation.
Analyze agent outputs for performance metrics.
Iterate on agent design based on evaluation results.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-crewai | bash
Capabilities
What this chain does
Searches the web and retrieves relevant sources.
Condenses long documents or threads into key takeaways.
Writes source code or scripts from a description.
Overview
CrewAI
What it does
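This example shows how to use CrewAI agents with promptfoo to evaluate AI agent performance. The sample configuration runs a recruitment agent against three role descriptions and checks that each response is valid JSON, contains a "candidates" key, lists at least two candidates, and includes relevant skills for every candidate.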
How it connects
Use this when you need to rigorously assess and compare the performance of your CrewAI agents. It's ideal for fine-tuning agent behavior and ensuring consistent, high-quality outputs.
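The configuration in the README below points promptfoo at a Python provider file (file://./agent.py). As a minimal sketch of what such a file could look like: promptfoo calls a function named call_api(prompt, options, context) in a Python provider and reads the "output" key of the returned dict. The helper run_recruitment_crew and its canned data are hypothetical placeholders for a real CrewAI crew run, and returning a dict rather than a JSON string is an assumption made so that the assertions can read output.candidates directly.

```python
from typing import Any, Dict


def run_recruitment_crew(prompt: str, model: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a real CrewAI crew run.

    A real implementation would build CrewAI agents and tasks, run the
    crew, and parse its final answer. Canned placeholder data is returned
    here so the sketch runs without any API keys.
    """
    return {
        "candidates": [
            {"name": "Candidate A", "skills": ["Python", "Django"]},
            {"name": "Candidate B", "skills": ["React", "TypeScript"]},
        ]
    }


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point that promptfoo invokes for a Python provider."""
    # The provider's `config` block from the YAML arrives under options["config"].
    config = options.get("config") or {}
    model = config.get("model", "openai:gpt-4.1")
    try:
        result = run_recruitment_crew(prompt, model)
    except Exception as exc:
        # Report failures through the "error" key instead of raising.
        return {"error": f"Agent run failed: {exc}"}
    return {"output": result}
```

If your setup expects string output, wrapping the result in json.dumps before returning it is a reasonable alternative.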
Source README
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: 'CrewAI Recruitment Agent Evaluation'

prompts:
  - 'Find top candidates for the following role: {{role}}'

providers:
  - id: 'file://./agent.py'
    label: 'CrewAI Recruitment Agent'
    config:
      model: 'openai:gpt-4.1'

# It's a good practice to have a defaultTest that applies to all test cases.
defaultTest:
  assert:
    # We expect the agent to always return a valid JSON object
    - type: is-json
    # We expect the output to contain a "candidates" key
    - type: javascript
      value: 'output.hasOwnProperty("candidates")'

tests:
  - description: 'Senior Full-Stack Engineer'
    vars:
      role: 'A Senior Full-Stack Engineer with 8+ years of experience in Python, Django, and React.'
    assert:
      - type: javascript
        value: |
          // Check that there are at least 2 candidates
          return output.candidates.length >= 2;
      - type: python
        value: |
          # Check that all candidates have relevant skills
          required_skills = ['python', 'django', 'react']
          all_have_skills = all(
              any(req_skill in skill.lower() for skill in candidate.get('skills', []) for req_skill in required_skills)
              for candidate in output.get('candidates', [])
          )
          return all_have_skills

  - description: 'Data Scientist with Machine Learning and Cloud'
    vars:
      role: 'A Data Scientist with machine learning, Python, and cloud (AWS or GCP) experience.'
    assert:
      - type: javascript
        value: |
          // Check that there are at least 2 candidates
          return output.candidates.length >= 2;
      - type: python
        value: |
          # Check for relevant data science and cloud skills
          required_skills = ['machine learning', 'python', 'aws', 'gcp', 'tensorflow', 'pytorch']
          all_have_skills = all(
              any(req_skill in skill.lower() for skill in candidate.get('skills', []) for req_skill in required_skills)
              for candidate in output.get('candidates', [])
          )
          return all_have_skills

  - description: 'Junior UX/UI Designer'
    vars:
      role: 'A junior UX/UI designer with Figma and Adobe Creative Suite experience.'
    assert:
      - type: javascript
        value: |
          // Check that there are at least 2 candidates
          return output.candidates.length >= 2;
      - type: python
        value: |
          # Check for relevant design tool skills
          required_skills = ['figma', 'adobe', 'ux', 'ui', 'user experience', 'user interface']
          all_have_skills = all(
              any(req_skill in skill.lower() for skill in candidate.get('skills', []) for req_skill in required_skills)
              for candidate in output.get('candidates', [])
          )
          return all_have_skills
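The Python assertions above all apply the same rule: every candidate must list at least one skill that contains one of the required keywords as a substring. A standalone sketch of that check can make it easier to test outside promptfoo. The function name all_candidates_have_skill and the sample data are hypothetical, and accepting either a dict or a JSON string is an assumption, since the shape of the output depends on what the provider returns.

```python
import json
from typing import Any, Dict, List, Union


def all_candidates_have_skill(
    output: Union[str, Dict[str, Any]], required_skills: List[str]
) -> bool:
    """Return True if every candidate has at least one matching skill."""
    # Accept either a parsed dict or a JSON string (assumption, see lead-in).
    data = json.loads(output) if isinstance(output, str) else output
    candidates = data.get("candidates", [])
    return all(
        # Case-insensitive substring match of any keyword in any listed skill.
        any(
            req in skill.lower()
            for skill in candidate.get("skills", [])
            for req in required_skills
        )
        for candidate in candidates
    )


if __name__ == "__main__":
    sample = {
        "candidates": [
            {"name": "Candidate A", "skills": ["Python 3", "Flask"]},
            {"name": "Candidate B", "skills": ["React Native"]},
        ]
    }
    print(all_candidates_have_skill(sample, ["python", "django", "react"]))
```

Note that all() over an empty list is True, so this check alone passes when no candidates are returned. The separate JavaScript assertion on candidates.length is what catches that case.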