Evaluate and Improve AI Assistant Performance

Name: Evaluate and Improve AI Assistant Performance
Availability: OnlineOnly
Author: Mandoline

Mandoline MCP Server integrates AI assistants with Mandoline's evaluation framework for performance improvement.

Connect

Works with Claude and Cursor

⚠️ This tool looks unmaintained — no upstream commits in 6+ months.

Updated 8 months ago

Version 0.2.0

Models

claude

Why it matters

Enable AI assistants like Claude and Cursor to critically evaluate and continuously improve their own performance using the Mandoline evaluation framework via the Model Context Protocol.

Outcomes

What it gets done

Define custom evaluation metrics for specific tasks.

Evaluate prompt/response pairs against defined metrics.

Monitor AI assistant performance and identify areas for improvement.

Integrate with AI assistants to facilitate self-evaluation.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/vb-mandoline | bash

Capabilities

Tools your agent gets

get_server_health

Confirms the MCP server is reachable and returning a healthy status

create_metric

Defines custom evaluation criteria for your specific tasks

batch_create_metrics

Creates multiple evaluation metrics in a single operation

get_metric

Retrieves details about a specific metric

get_metrics

Views your metrics with filtering and pagination

update_metric

Modifies existing metric definitions

create_evaluation

Evaluates prompt/response pairs against your metrics

batch_create_evaluations

Evaluates the same content across multiple metrics

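get_evaluation

Retrieves evaluation results and scores

get_evaluations

Browses evaluation history with filtering and pagination

update_evaluation

Adds metadata or context to evaluations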

Overview

Mandoline MCP Server

What it does

The Mandoline MCP Server enables AI assistants to leverage Mandoline's evaluation framework via the Model Context Protocol (MCP), allowing them to reflect on, critique, and continuously improve their own performance.

How it connects

Use Mandoline MCP Server to integrate evaluation tools into your AI assistant workflows for objective performance analysis and iterative refinement of AI responses or generated code. It requires a client that supports the Model Context Protocol and is not useful without one.

Source README

Mandoline MCP Server

Enable AI assistants like Claude Code, Claude Desktop, and Cursor to reflect on, critique, and continuously improve their own performance using Mandoline's evaluation framework via the Model Context Protocol.

Client Setup

Most users should start here. Use Mandoline's hosted MCP server to integrate evaluation tools into your AI assistant.

For each integration below, replace sk_**** with your actual API key from mandoline.ai/account.

Claude Code

Use the CLI to add the Mandoline MCP server to Claude Code:

claude mcp add --scope user --transport http mandoline https://mandoline.ai/mcp --header "x-api-key: sk_****"

You can use --scope user (across projects) or --scope project (current project only).
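If you script this setup, the command above can be assembled as an argument list so the API key never has to be pasted into a shell history. The following is a minimal sketch, not part of Mandoline's tooling: the helper name `build_claude_mcp_add` and the `MANDOLINE_API_KEY` environment variable are assumptions made for illustration, and only the two scopes mentioned in this README are accepted.

```python
import os
import shlex

HOSTED_URL = "https://mandoline.ai/mcp"  # hosted endpoint from the command above


def build_claude_mcp_add(api_key: str, scope: str = "user") -> list[str]:
    """Return the argv for the `claude mcp add` command shown above.

    Hypothetical helper: it only mirrors the flags this README documents.
    """
    # Only the two scopes this README mentions are accepted here.
    if scope not in ("user", "project"):
        raise ValueError(f"unsupported scope: {scope!r}")
    return [
        "claude", "mcp", "add",
        "--scope", scope,
        "--transport", "http",
        "mandoline", HOSTED_URL,
        "--header", f"x-api-key: {api_key}",
    ]


if __name__ == "__main__":
    # MANDOLINE_API_KEY is an assumed variable name; export it before running.
    key = os.environ.get("MANDOLINE_API_KEY", "sk_****")
    # Print a shell-quoted command line for review instead of executing it.
    print(shlex.join(build_claude_mcp_add(key, scope="project")))
```

The sketch prints the command rather than running it, so you can inspect it first and pass the list to `subprocess.run` yourself if you prefer.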

Note: Restart any active Claude Code sessions after configuration changes.

Verify: Run /mcp in Claude Code to see Mandoline listed as a connected server.

Tutorial: Watch Claude evaluate multiple code solutions and pick the best one.

Official Documentation: Claude Code MCP Guide

Codex

Use the CLI to add the Mandoline MCP server to Codex:

codex mcp add mandoline --env MANDOLINE_API_KEY=sk_**** -- npx -y mcp-remote https://mandoline.ai/mcp --header 'x-api-key: ${MANDOLINE_API_KEY}'

Note: Restart any active Codex sessions after configuration changes.

Verify: Run /mcp in Codex to see Mandoline listed as a connected server.

Official Documentation: Codex MCP Configuration

Claude Desktop

Edit your configuration file (Settings > Developer > Edit Config):

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "Mandoline": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mandoline.ai/mcp",
        "--header",
        "x-api-key: ${MANDOLINE_API_KEY}"
      ],
      "env": {
        "MANDOLINE_API_KEY": "sk_****"
      }
    }
  }
}

This configuration applies globally to all conversations.

Note: Restart Claude Desktop after configuration changes.
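If your configuration file already lists other MCP servers, pasting the JSON above over it would drop them. The sketch below shows one way to merge the Mandoline entry into an existing file instead. The function name `add_mandoline` is hypothetical, and the demo writes to a temporary directory rather than your real config; pass the macOS or Windows path listed above when you use it for real.

```python
import copy
import json
import tempfile
from pathlib import Path

# Same entry as the JSON snippet above; the key is filled in per call.
MANDOLINE_ENTRY = {
    "command": "npx",
    "args": [
        "-y",
        "mcp-remote",
        "https://mandoline.ai/mcp",
        "--header",
        "x-api-key: ${MANDOLINE_API_KEY}",
    ],
    "env": {"MANDOLINE_API_KEY": "sk_****"},
}


def add_mandoline(config_path: Path, api_key: str) -> dict:
    """Merge the Mandoline entry into config_path, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    entry = copy.deepcopy(MANDOLINE_ENTRY)  # never mutate the template
    entry["env"]["MANDOLINE_API_KEY"] = api_key
    config.setdefault("mcpServers", {})["Mandoline"] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demo against a throwaway file so the real config is untouched.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "claude_desktop_config.json"
        path.write_text('{"mcpServers": {"other": {"command": "example"}}}')
        print(json.dumps(add_mandoline(path, "sk_****"), indent=2))
```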

Verify: Look for Mandoline tools when you click the "Search and tools" button.

Official Documentation: MCP Quickstart Guide

Cursor

Create or edit your MCP configuration file:

{
  "mcpServers": {
    "Mandoline": {
      "url": "https://mandoline.ai/mcp",
      "headers": {
        "x-api-key": "sk_****"
      }
    }
  }
}

You can use either the global configuration at ~/.cursor/mcp.json (affects all projects) or a project-local configuration at .cursor/mcp.json in the project root (current project only).

Note: Restart Cursor after configuration changes.
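The choice between the global and project-local file can be made explicit in a setup script. This is a rough sketch under the paths named above; `cursor_config_path` and `cursor_entry` are hypothetical helper names, not part of Cursor or Mandoline.

```python
import json
from pathlib import Path
from typing import Optional


def cursor_config_path(project_root: Optional[str] = None) -> Path:
    """Return the global config path, or the project-local one if a root is given."""
    if project_root is None:
        return Path.home() / ".cursor" / "mcp.json"  # affects all projects
    return Path(project_root) / ".cursor" / "mcp.json"  # current project only


def cursor_entry(api_key: str, url: str = "https://mandoline.ai/mcp") -> dict:
    """Build the same JSON structure as the Cursor snippet above."""
    return {
        "mcpServers": {
            "Mandoline": {"url": url, "headers": {"x-api-key": api_key}}
        }
    }


if __name__ == "__main__":
    # Show where the file would live and what it would contain; nothing is written.
    print(cursor_config_path("."))
    print(json.dumps(cursor_entry("sk_****"), indent=2))
```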

Verify: Check the Output panel (Ctrl+Shift+U) → "MCP Logs" for successful connection, or look for Mandoline tools in the Composer Agent.

Official Documentation: Cursor MCP Guide

Server Setup

Only needed if you want to run the server locally or contribute to development. Most users should use the hosted server above.

Prerequisites: Node.js 18+ and npm

Installation

Clone and build

git clone https://github.com/mandoline-ai/mandoline-mcp-server.git
cd mandoline-mcp-server
npm install
npm run build

Configure environment (optional)

cp .env.example .env.local
# Edit .env.local to customize PORT, LOG_LEVEL, etc.

Start the server
```
npm start
```

The server runs on http://localhost:8080 by default.

Using Local Server

To use your local server instead of the hosted one, replace https://mandoline.ai/mcp with http://localhost:8080/mcp in the client configurations above.
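The URL swap applies to both config shapes shown earlier: the `url` field used by Cursor and the `args` list used by Claude Desktop. As an illustration only, the hypothetical helper below walks a loaded config and replaces the hosted URL wherever it appears as a whole string. If you changed PORT in .env.local, pass a matching `local_url`.

```python
HOSTED_URL = "https://mandoline.ai/mcp"
LOCAL_URL = "http://localhost:8080/mcp"  # default port stated above


def use_local_server(config, local_url: str = LOCAL_URL):
    """Return a copy of config with the hosted URL replaced by local_url."""
    if isinstance(config, dict):
        return {key: use_local_server(value, local_url) for key, value in config.items()}
    if isinstance(config, list):
        return [use_local_server(item, local_url) for item in config]
    if config == HOSTED_URL:
        return local_url
    return config  # any other value is passed through unchanged


if __name__ == "__main__":
    cursor_style = {"mcpServers": {"Mandoline": {"url": HOSTED_URL}}}
    print(use_local_server(cursor_style))
```

The function builds new containers instead of editing in place, so the original config object stays available if you want to switch back.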

Usage

Once integrated, you can use Mandoline evaluation tools directly in your AI assistant conversations.

Tools

Health

| Tool | Purpose |
| --- | --- |
| `get_server_health` | Confirm the MCP server is reachable and returning a healthy status payload. |
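Your assistant normally invokes these tools for you, but it can help to see what a tool invocation looks like on the wire. Under the Model Context Protocol, a client calls a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that request body; it assumes `get_server_health` takes no arguments, which this README does not state, and it omits the initialization handshake and transport details a real client performs first.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests need a unique id per session


def tool_call_request(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP `tools/call` JSON-RPC request body for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


if __name__ == "__main__":
    # Assumption: the health tool needs no arguments.
    print(json.dumps(tool_call_request("get_server_health"), indent=2))
```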

Metrics

| Tool | Purpose |
| --- | --- |
| `create_metric` | Define custom evaluation criteria for your specific tasks |
| `batch_create_metrics` | Create multiple evaluation metrics in one operation |
| `get_metric` | Retrieve details about a specific metric |
| `get_metrics` | Browse your metrics with filtering and pagination |
| `update_metric` | Modify existing metric definitions |

Evaluations

| Tool | Purpose |
| --- | --- |
| `create_evaluation` | Score prompt/response pairs against your metrics |
| `batch_create_evaluations` | Evaluate the same content against multiple metrics |
| `get_evaluation` | Retrieve evaluation results and scores |
| `get_evaluations` | Browse evaluation history with filtering and pagination |
| `update_evaluation` | Add metadata or context to evaluations |
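Once several candidate responses have been scored against the same metrics, something has to turn those scores into a decision. The sketch below is one simple way to do that, not Mandoline's method: it assumes you have already collected scores into a plain dictionary (a hypothetical shape, not the actual evaluation payload), that higher scores are better, and that all metrics carry equal weight.

```python
from statistics import mean


def pick_best(scores_by_candidate: dict) -> str:
    """Return the candidate whose mean score across metrics is highest.

    scores_by_candidate maps a candidate name to {metric_name: score}.
    This shape is an assumption made for illustration.
    """
    if not scores_by_candidate:
        raise ValueError("no candidates to compare")
    return max(
        scores_by_candidate,
        key=lambda name: mean(list(scores_by_candidate[name].values())),
    )


if __name__ == "__main__":
    # Made-up scores purely to exercise the function.
    example = {
        "solution_a": {"readability": 0.4, "correctness": 0.9},
        "solution_b": {"readability": 0.7, "correctness": 0.8},
    }
    print(pick_best(example))
```

An unweighted mean is the simplest choice; if some metrics matter more for your task, a weighted average or a minimum-score threshold may fit better.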

Resources

| Resource | Description |
| --- | --- |
| `llms.txt` | Mandoline docs index (tools, tutorials, blogs, leaderboards, SDKs); mirrored from `https://mandoline.ai/llms.txt`. |
| `mcp` | MCP setup guide for assistants; mirrored from `https://mandoline.ai/mcp`. |
