
Evaluate AI Agent Performance with CrewAI

Evaluate CrewAI agent performance with promptfoo. Test and compare AI agents for better results.

Works with: CrewAI, promptfoo

Updated yesterday
Version code-scan-action-0.1


Why it matters

Run CrewAI agents as a promptfoo provider to evaluate and benchmark their performance. Check that your agents meet quality standards before deployment.

Outcomes

What it gets done

01. Set up CrewAI agents for task execution.
02. Integrate agents with promptfoo for evaluation.
03. Analyze agent outputs for performance metrics.
04. Iterate on agent design based on evaluation results.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-crewai | bash

Capabilities

What this chain does

Search the web

Searches the web and retrieves relevant sources.

Summarize

Condenses long documents or threads into key takeaways.

Generate code

Writes source code or scripts from a description.

Overview

CrewAI

What it does

This example shows how to use CrewAI agents with promptfoo to evaluate AI agent performance. It provides a framework for systematically testing and comparing the effectiveness of your AI agents.

How it connects

Use this when you need to assess and compare the performance of your CrewAI agents against explicit assertions. It is well suited to tuning agent behavior and checking that outputs stay consistent and high quality.
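The configuration in the Source README points promptfoo at file://./agent.py. The source does not show that file, so the following is only a minimal sketch of what such a provider could look like. It assumes promptfoo's Python provider convention of a call_api(prompt, options, context) function that returns a dict with an "output" key. The helper run_recruitment_crew and the candidate data are hypothetical placeholders standing in for a real CrewAI crew.

```python
from typing import Any, Dict


def run_recruitment_crew(prompt: str, model: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a real CrewAI crew.

    A real implementation would build agents and tasks with CrewAI and
    parse the crew's final answer. Canned data is returned here so the
    sketch runs without API keys or extra dependencies.
    """
    return {
        "candidates": [
            {"name": "Candidate A", "skills": ["Python", "Django", "React"]},
            {"name": "Candidate B", "skills": ["Python", "FastAPI", "React Native"]},
        ]
    }


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point that promptfoo calls for a Python file provider."""
    # The provider's `config` block from the YAML is expected under options["config"].
    config = (options or {}).get("config", {})
    model = config.get("model", "openai:gpt-4.1")
    try:
        result = run_recruitment_crew(prompt, model)
    except Exception as exc:
        # Report the failure to promptfoo instead of crashing the eval run.
        return {"error": str(exc)}
    return {"output": result}
```

Returning a structure with a top-level "candidates" list is what lets the configuration's assertions inspect individual candidates and their skills.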

Source README

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json

description: 'CrewAI Recruitment Agent Evaluation'

prompts:
  - 'Find top candidates for the following role: {{role}}'

providers:
  - id: 'file://./agent.py'
    label: 'CrewAI Recruitment Agent'
    config:
      model: 'openai:gpt-4.1'

# It's a good practice to have a defaultTest that applies to all test cases.
defaultTest:
  assert:
    # We expect the agent to always return a valid JSON object
    - type: is-json
    # We expect the output to contain a "candidates" key
    - type: javascript
      value: 'output.hasOwnProperty("candidates")'

tests:
  - description: 'Senior Full-Stack Engineer'
    vars:
      role: 'A Senior Full-Stack Engineer with 8+ years of experience in Python, Django, and React.'
    assert:
      - type: javascript
        value: |
          // Check that there are at least 2 candidates
          return output.candidates.length >= 2;
      - type: python
        value: |
          # Check that all candidates have relevant skills
          required_skills = ['python', 'django', 'react']
          all_have_skills = all(
              any(req_skill in skill.lower() for skill in candidate.get('skills', []) for req_skill in required_skills)
              for candidate in output.get('candidates', [])
          )
          return all_have_skills

  - description: 'Data Scientist with Machine Learning and Cloud'
    vars:
      role: 'A Data Scientist with machine learning, Python, and cloud (AWS or GCP) experience.'
    assert:
      - type: javascript
        value: |
          // Check that there are at least 2 candidates
          return output.candidates.length >= 2;
      - type: python
        value: |
          # Check for relevant data science and cloud skills
          required_skills = ['machine learning', 'python', 'aws', 'gcp', 'tensorflow', 'pytorch']
          all_have_skills = all(
              any(req_skill in skill.lower() for skill in candidate.get('skills', []) for req_skill in required_skills)
              for candidate in output.get('candidates', [])
          )
          return all_have_skills

  - description: 'Junior UX/UI Designer'
    vars:
      role: 'A junior UX/UI designer with Figma and Adobe Creative Suite experience.'
    assert:
      - type: javascript
        value: |
          // Check that there are at least 2 candidates
          return output.candidates.length >= 2;
      - type: python
        value: |
          # Check for relevant design tool skills
          required_skills = ['figma', 'adobe', 'ux', 'ui', 'user experience', 'user interface']
          all_have_skills = all(
              any(req_skill in skill.lower() for skill in candidate.get('skills', []) for req_skill in required_skills)
              for candidate in output.get('candidates', [])
          )
          return all_have_skills
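The Python assertions in the configuration all share one pattern: every candidate must list at least one skill that contains one of the required keywords as a substring. As a rough illustration, the same logic can be pulled into a standalone function and tried on sample data. The function name all_candidates_have_skill and the sample inputs are made up for this sketch, and it assumes the agent output has already been parsed into a dict.

```python
from typing import Any, Dict, List


def all_candidates_have_skill(output: Dict[str, Any], required_skills: List[str]) -> bool:
    """Mirror of the skill check used in the configuration's Python assertions.

    A candidate passes if any of its skills contains any required keyword
    (case-insensitive substring match). The check passes only if every
    candidate passes.
    """
    return all(
        any(
            req_skill in skill.lower()
            for skill in candidate.get("skills", [])
            for req_skill in required_skills
        )
        for candidate in output.get("candidates", [])
    )


# Illustrative inputs, not real agent output.
matching = {"candidates": [{"skills": ["Python 3"]}, {"skills": ["ReactJS"]}]}
non_matching = {"candidates": [{"skills": ["Java"]}]}
required = ["python", "django", "react"]

print(all_candidates_have_skill(matching, required))      # True
print(all_candidates_have_skill(non_matching, required))  # False
print(all_candidates_have_skill({"candidates": []}, required))  # True (vacuously)
```

Because all() over an empty list is True, this check alone would pass for an agent that returns no candidates, which is why each test pairs it with the separate check for at least 2 candidates. Substring matching is also loose: a short keyword such as 'ui' matches any skill containing those letters.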
