Prompt Chain

Evaluate AI Agent Performance with CrewAI

Name: Evaluate AI Agent Performance with CrewAI
Availability: OnlineOnly
Author: Promptfoo

Evaluate CrewAI agent performance with promptfoo. Test and compare AI agents for better results.

Works with: crewai, promptfoo

Updated yesterday

Version code-scan-action-0.1

Why it matters

Run CrewAI agents through promptfoo to evaluate and benchmark their outputs against explicit assertions, so you can confirm that your agents meet quality standards before deployment.

Outcomes

What it gets done

Set up CrewAI agents for task execution.

Integrate agents with promptfoo for evaluation.

Analyze agent outputs for performance metrics.

Iterate on agent design based on evaluation results.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-crewai | bash

Capabilities

What this chain does

Search the web

Searches the web and retrieves relevant sources.

Summarize

Condenses long documents or threads into key takeaways.

Generate code

Writes source code or scripts from a description.

Overview

CrewAI

What it does

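This example shows how to use CrewAI agents with promptfoo to evaluate AI agent performance. It runs a CrewAI recruitment agent against three role descriptions and checks each response with JSON, JavaScript, and Python assertions, which gives you a repeatable way to test and compare your agents.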
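The config in this listing points promptfoo at `file://./agent.py`, but the file itself is not reproduced here. The sketch below shows one shape such a file could take, assuming promptfoo's Python provider convention of a `call_api(prompt, options, context)` function that returns a dict with an `output` key. The `run_recruitment_crew` helper and its canned candidates are hypothetical stand-ins for the real CrewAI crew, and returning a parsed object rather than a JSON string is an assumption chosen to match assertions that read `output.candidates`.

```python
import json
from typing import Any, Dict, Optional


def run_recruitment_crew(prompt: str, model: str) -> str:
    """Hypothetical stand-in for the CrewAI crew.

    A real agent.py would build its agents and tasks here and run them;
    this stub returns a canned JSON string so the sketch runs offline.
    """
    return json.dumps({
        "candidates": [
            {"name": "Candidate A", "skills": ["Python", "Django"]},
            {"name": "Candidate B", "skills": ["React", "TypeScript"]},
        ]
    })


def call_api(prompt: str,
             options: Optional[Dict[str, Any]] = None,
             context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Entry point that promptfoo calls for a Python provider."""
    # The provider's `config` block from the YAML arrives under options["config"].
    config = (options or {}).get("config", {})
    model = config.get("model", "openai:gpt-4.1")
    try:
        raw = run_recruitment_crew(prompt, model)
        # Return a parsed object so assertions can read output["candidates"].
        return {"output": json.loads(raw)}
    except Exception as exc:
        # Report the failure to promptfoo instead of crashing the eval run.
        return {"error": str(exc)}
```

Keeping the crew behind a single helper makes it easy to swap agent designs while the provider entry point and the test suite stay unchanged.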

How it connects

Use this when you need to rigorously assess and compare the performance of your CrewAI agents. It's ideal for fine-tuning agent behavior and ensuring consistent, high-quality outputs.

Source README

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json

description: 'CrewAI Recruitment Agent Evaluation'

prompts:
  - 'Find top candidates for the following role: {{role}}'

providers:
  - id: 'file://./agent.py'
    label: 'CrewAI Recruitment Agent'
    config:
      model: 'openai:gpt-4.1'

# It's a good practice to have a defaultTest that applies to all test cases.
defaultTest:
  assert:
    # We expect the agent to always return a valid JSON object
    - type: is-json
    # We expect the output to contain a "candidates" key
    - type: javascript
      value: 'output.hasOwnProperty("candidates")'

tests:
  - description: 'Senior Full-Stack Engineer'
    vars:
      role: 'A Senior Full-Stack Engineer with 8+ years of experience in Python, Django, and React.'
    assert:
      - type: javascript
        value: |
          // Check that there are at least 2 candidates
          return output.candidates.length >= 2;
      - type: python
        value: |
          # Check that all candidates have relevant skills
          required_skills = ['python', 'django', 'react']
          all_have_skills = all(
              any(req_skill in skill.lower() for skill in candidate.get('skills', []) for req_skill in required_skills)
              for candidate in output.get('candidates', [])
          )
          return all_have_skills

  - description: 'Data Scientist with Machine Learning and Cloud'
    vars:
      role: 'A Data Scientist with machine learning, Python, and cloud (AWS or GCP) experience.'
    assert:
      - type: javascript
        value: |
          // Check that there are at least 2 candidates
          return output.candidates.length >= 2;
      - type: python
        value: |
          # Check for relevant data science and cloud skills
          required_skills = ['machine learning', 'python', 'aws', 'gcp', 'tensorflow', 'pytorch']
          all_have_skills = all(
              any(req_skill in skill.lower() for skill in candidate.get('skills', []) for req_skill in required_skills)
              for candidate in output.get('candidates', [])
          )
          return all_have_skills

  - description: 'Junior UX/UI Designer'
    vars:
      role: 'A junior UX/UI designer with Figma and Adobe Creative Suite experience.'
    assert:
      - type: javascript
        value: |
          // Check that there are at least 2 candidates
          return output.candidates.length >= 2;
      - type: python
        value: |
          # Check for relevant design tool skills
          required_skills = ['figma', 'adobe', 'ux', 'ui', 'user experience', 'user interface']
          all_have_skills = all(
              any(req_skill in skill.lower() for skill in candidate.get('skills', []) for req_skill in required_skills)
              for candidate in output.get('candidates', [])
          )
          return all_have_skills
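The three Python assertions in the config share the same skill-matching logic, so it may help to see that logic in isolation. The sketch below lifts it into a standalone function and runs it on sample data; the function name `candidates_have_skills` and the sample candidates are hypothetical and serve only as illustration.

```python
from typing import Any, Dict, List


def candidates_have_skills(output: Dict[str, Any], required_skills: List[str]) -> bool:
    """True when every candidate lists at least one skill containing a required keyword."""
    return all(
        any(
            req_skill in skill.lower()
            for skill in candidate.get("skills", [])
            for req_skill in required_skills
        )
        for candidate in output.get("candidates", [])
    )


required = ["python", "django", "react"]

# Every candidate has at least one matching skill.
good = {"candidates": [
    {"name": "A", "skills": ["Python 3", "PostgreSQL"]},
    {"name": "B", "skills": ["React Native"]},
]}
# Candidate C has no matching skill, so the whole check fails.
one_miss = {"candidates": [
    {"name": "A", "skills": ["Python 3"]},
    {"name": "C", "skills": ["Figma"]},
]}
# With no candidates, all() over an empty sequence is True.
empty = {"candidates": []}

print(candidates_have_skills(good, required))      # True
print(candidates_have_skills(one_miss, required))  # False
print(candidates_have_skills(empty, required))     # True
```

Two properties are worth noting. An empty candidate list passes this check vacuously, which is why the config pairs it with the JavaScript assertion requiring at least two candidates. The match is also a plain substring test, so a short keyword such as 'ui' matches any skill that happens to contain those letters (for example "Build tooling"); stricter matching would need whole-word comparison.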
