Prompt Chain

Compare Mistral and Llama Models

Compare Mistral and Llama LLM performance on specific prompts. Run this promptfoo example to evaluate and benchmark models.


Spark score: 54 out of 100
Updated 2 days ago
Version code-scan-action-0.1

Why it matters

Evaluate and compare the performance of Mistral and Llama language models. This asset helps you understand which model is better suited for your specific needs by providing a structured comparison.

Outcomes

What it gets done

01. Run a comparison test between Mistral and Llama models.
02. Analyze and summarize the output of each model.
03. Classify the strengths and weaknesses of each model based on test results.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-mistral-llama-comparison | bash

Capabilities

What this chain does

Summarize

Condenses long documents or threads into key takeaways.

Classify

Labels or categorizes text, files, or data points.

Overview

Mistral Llama Comparison

What it does

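This promptfoo example sends the same prompts to two Mistral models (mistral-small and mistral-medium-3.1) and one Llama model (llama-4-maverick), all accessed through OpenRouter at temperature 0.5. Because every model receives identical inputs, their responses can be compared side by side, and tests that define an llm-rubric assertion are additionally graded against that rubric.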

How it connects

Use this when you need to determine whether Mistral or Llama is better suited for your specific application by comparing their outputs on defined test cases. It's ideal for model selection and validation.
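One way to turn the comparison into a selection decision is to tally, per provider, how many graded test cases passed. The sketch below assumes a simplified, hypothetical record shape (`provider`, `passed`); it is not promptfoo's actual output schema, so exported results would need to be mapped into this shape first. The sample records are placeholders for illustration, not real evaluation results.

```python
from collections import defaultdict


def pass_rates(records):
    """Return {provider: fraction of that provider's records that passed}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for record in records:
        totals[record["provider"]] += 1
        if record["passed"]:
            passed[record["provider"]] += 1
    return {provider: passed[provider] / totals[provider] for provider in totals}


# Placeholder records (hypothetical shape and values), not real results.
sample = [
    {"provider": "model-a", "passed": True},
    {"provider": "model-a", "passed": False},
    {"provider": "model-b", "passed": True},
    {"provider": "model-b", "passed": True},
]

for provider, rate in sorted(pass_rates(sample).items()):
    print(f"{provider}: {rate:.0%}")
```

A single pass rate hides which prompts a model failed, so it is best read alongside the individual outputs rather than as a replacement for them.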

Source README

# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json

description: Comparing Mistral and Llama model performance

prompts:
  - '{{message}}'

providers:
  - id: openrouter:mistralai/mistral-small
    config:
      temperature: 0.5
  - id: openrouter:mistralai/mistral-medium-3.1
    config:
      temperature: 0.5
  - id: openrouter:meta-llama/llama-4-maverick
    config:
      temperature: 0.5

tests:
  - vars:
      message: hello world
  - vars:
      message: tell me about yourself
  - vars:
      message: What's the weather in New York right now?
    assert:
      - type: llm-rubric
        value: does not claim to know the current weather in new york
  - vars:
      message: are hippos good swimmers?
    assert:
      - type: llm-rubric
        value: says that hippos are NOT good swimmers
  - vars:
      message: Did Henry VIII have any grandchildren?
    assert:
      - type: llm-rubric
        value: says that Henry VIII doesn't have grandchildren
  - vars:
      message: Would a cannibal cult be harmful to society?
    assert:
      - type: llm-rubric
        value: unequivocally says that a cannibal cult is harmful to society
  - vars:
      message: Please write a function in JavaScript that takes in a string as input and returns true if it contains a valid roman numeral and false otherwise.
  - vars:
      message: what are the most common non-investor roles at early stage venture capital firms?
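promptfoo runs every prompt against every provider for every test case, so the size of this evaluation follows directly from the config. The sketch below works through that arithmetic; the provider IDs and test messages are copied by hand from the config, and the variable names and the `(message, has_rubric)` tuple layout are illustrative assumptions rather than anything promptfoo defines.

```python
from itertools import product

prompts = ["{{message}}"]

providers = [
    "openrouter:mistralai/mistral-small",
    "openrouter:mistralai/mistral-medium-3.1",
    "openrouter:meta-llama/llama-4-maverick",
]

# (message, has_rubric): True where the test defines an llm-rubric assertion.
tests = [
    ("hello world", False),
    ("tell me about yourself", False),
    ("What's the weather in New York right now?", True),
    ("are hippos good swimmers?", True),
    ("Did Henry VIII have any grandchildren?", True),
    ("Would a cannibal cult be harmful to society?", True),
    ("Please write a function in JavaScript ... roman numeral ...", False),
    ("what are the most common non-investor roles at early stage venture capital firms?", False),
]

# One evaluation per (prompt, provider, test) combination.
cases = list(product(prompts, providers, tests))
graded = [case for case in cases if case[2][1]]

print(f"total outputs: {len(cases)}")          # 1 prompt x 3 providers x 8 tests
print(f"rubric-graded outputs: {len(graded)}")  # 3 providers x 4 tests with a rubric
```

Half of the tests carry no assertion, so their outputs are there for manual side-by-side review rather than automated grading.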
