Evals help you verify that your MCP tools are called correctly by an LLM. This guide shows how to run tool-call evaluations with Evalite using the AI SDK MCP client.
The approach stays library-agnostic: Evalite is just the example runner, and you can adapt the patterns to other evaluation frameworks.
You'll need an MCP server to evaluate against (for example, the one started by pnpm dev with the module enabled). Install Evalite, Vitest, and the AI SDK packages:
pnpm add -D evalite vitest @ai-sdk/mcp ai
npm install -D evalite vitest @ai-sdk/mcp ai
yarn add -D evalite vitest @ai-sdk/mcp ai
bun add -D evalite vitest @ai-sdk/mcp ai
Add the following scripts to your package.json:
{
  "scripts": {
    "eval": "evalite",
    "eval:ui": "evalite watch"
  }
}
Create a .env file with your AI provider key and MCP endpoint:
# AI provider (AI Gateway example)
AI_GATEWAY_API_KEY=your_key
# MCP endpoint exposed by your dev server
MCP_URL=http://localhost:3000/mcp
Create an eval file in your test/ directory:
import { experimental_createMCPClient as createMCPClient } from '@ai-sdk/mcp'
import { generateText } from 'ai'
import { evalite } from 'evalite'
import { toolCallAccuracy } from 'evalite/scorers'

// AI Gateway model format: provider/model-name
const model = 'openai/gpt-4o-mini'
const MCP_URL = process.env.MCP_URL ?? 'http://localhost:3000/mcp'

evalite('BMI Calculator', {
  data: async () => [
    {
      input: 'Calculate BMI for someone who weighs 70kg and is 1.75m tall',
      expected: [{ toolName: 'calculate-bmi', input: { weightKg: 70, heightM: 1.75 } }],
    },
  ],
  task: async (input) => {
    const mcp = await createMCPClient({ transport: { type: 'http', url: MCP_URL } })
    try {
      const result = await generateText({
        model,
        prompt: input,
        tools: await mcp.tools(),
      })
      return result.toolCalls ?? []
    }
    finally {
      await mcp.close()
    }
  },
  scorers: [({ output, expected }) => toolCallAccuracy({ actualCalls: output, expectedCalls: expected })],
})
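toolCallAccuracy compares the tool calls your task returned against the expected calls. If exact argument matching is too strict for a case, you can write your own scorer with Evalite's createScorer. The sketch below is our own, not a built-in: it awards partial credit based on tool names alone, and it assumes the task returns the AI SDK toolCalls array as above.

import { createScorer } from 'evalite'

// A looser, hypothetical scorer: partial credit for each expected tool name
// that appears among the actual calls, ignoring call arguments entirely.
const toolNameMatch = createScorer<string, { toolName: string }[]>({
  name: 'Tool Name Match',
  description: 'Checks tool names only, ignoring arguments',
  scorer: ({ output, expected }) => {
    if (!expected?.length)
      return output.length === 0 ? 1 : 0
    const actualNames = new Set(output.map(call => call.toolName))
    const matched = expected.filter(call => actualNames.has(call.toolName))
    return matched.length / expected.length
  },
})

Add it to the scorers array alongside toolCallAccuracy for cases where arguments are allowed to vary.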
Make sure your MCP server is running first:
pnpm dev
npm run dev
yarn dev
bun dev
Then run your evals in a separate terminal:
pnpm eval
npm run eval
yarn eval
bun eval
Or launch the Evalite UI for a visual interface:
pnpm eval:ui
npm run eval:ui
yarn eval:ui
bun eval:ui
The UI is available at http://localhost:3006 and shows traces, scores, inputs, and outputs for each eval.
We recommend placing eval files in a test/ directory at your project root:
your-project/
├── server/
│   └── mcp/
│       ├── tools/
│       │   └── calculate-bmi.ts
│       ├── resources/
│       └── prompts/
├── test/
│   └── mcp.eval.ts        # Your MCP eval tests
├── nuxt.config.ts
└── package.json
Evalite picks up files with the .eval.ts extension by default.
Verify that the model picks the correct tool:
evalite('Tool Selection', {
  data: async () => [
    {
      input: 'List all available documentation pages',
      expected: [{ toolName: 'list-pages' }],
    },
    {
      input: 'Show me the installation guide',
      expected: [{ toolName: 'get-page', input: { path: '/getting-started/installation' } }],
    },
  ],
  task: async (input) => {
    const mcp = await createMCPClient({ transport: { type: 'http', url: MCP_URL } })
    try {
      const result = await generateText({
        model,
        prompt: input,
        tools: await mcp.tools(),
      })
      return result.toolCalls ?? []
    }
    finally {
      await mcp.close()
    }
  },
  scorers: [({ output, expected }) => toolCallAccuracy({ actualCalls: output, expectedCalls: expected })],
})
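The task body is identical in each eval. If you'd rather not repeat the client lifecycle, you can factor it into a small helper. runToolCalls below is our own sketch, not part of any package; it reuses the model and MCP_URL constants from the top of the file.

import { experimental_createMCPClient as createMCPClient } from '@ai-sdk/mcp'
import { generateText } from 'ai'

// Hypothetical helper: open a client, collect the tool calls for a prompt,
// and always close the client, even if generateText throws.
async function runToolCalls(input: string, options?: { maxSteps?: number }) {
  const mcp = await createMCPClient({ transport: { type: 'http', url: MCP_URL } })
  try {
    const result = await generateText({
      model,
      prompt: input,
      tools: await mcp.tools(),
      ...options,
    })
    return result.toolCalls ?? []
  }
  finally {
    await mcp.close()
  }
}

Each eval's task then collapses to task: input => runToolCalls(input), and the multi-step example below can pass { maxSteps: 5 }.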
For workflows that require multiple tool calls, increase maxSteps:
evalite('Multi-Step Workflows', {
  data: async () => [
    {
      input: 'Find the installation page and show me its content',
      expected: [
        { toolName: 'list-pages' },
        { toolName: 'get-page', input: { path: '/getting-started/installation' } },
      ],
    },
  ],
  task: async (input) => {
    const mcp = await createMCPClient({ transport: { type: 'http', url: MCP_URL } })
    try {
      const result = await generateText({
        model,
        prompt: input,
        tools: await mcp.tools(),
        maxSteps: 5, // Allow multiple tool calls
      })
      return result.toolCalls ?? []
    }
    finally {
      await mcp.close()
    }
  },
  scorers: [({ output, expected }) => toolCallAccuracy({ actualCalls: output, expectedCalls: expected })],
})
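If your workflow also depends on the order of the calls, an explicit order check makes the intent unambiguous, rather than relying on whether toolCallAccuracy happens to be order-sensitive. A sketch in the same createScorer style as before; toolOrderMatch is our own name, not an Evalite built-in.

// Hypothetical order check: full credit only when the actual tool-name
// sequence exactly matches the expected one.
const toolOrderMatch = createScorer<string, { toolName: string }[]>({
  name: 'Tool Order Match',
  description: 'Checks that tools were called in the expected order',
  scorer: ({ output, expected }) => {
    const actual = output.map(call => call.toolName).join(' > ')
    const wanted = (expected ?? []).map(call => call.toolName).join(' > ')
    return actual === wanted ? 1 : 0
  },
})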
Organize evals by feature or tool category:
// Documentation tools
evalite('Documentation Tools', {
  data: async () => [
    { input: 'List all docs', expected: [{ toolName: 'list-pages' }] },
    { input: 'Get the intro page', expected: [{ toolName: 'get-page' }] },
  ],
  // ...
})

// API tools
evalite('API Tools', {
  data: async () => [
    { input: 'Fetch user data', expected: [{ toolName: 'get-user' }] },
    { input: 'Create a new post', expected: [{ toolName: 'create-post' }] },
  ],
  // ...
})
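Since every category shares the same task and scorer wiring, a small factory can keep each suite declarative. A sketch building on the hypothetical runToolCalls helper from above (defineToolEval is our own name):

// Hypothetical factory: one evalite suite per tool category, sharing the
// same task and scorer wiring so each suite only declares its cases.
function defineToolEval(name: string, cases: { input: string, expected: { toolName: string }[] }[]) {
  evalite(name, {
    data: async () => cases,
    task: input => runToolCalls(input),
    scorers: [({ output, expected }) => toolCallAccuracy({ actualCalls: output, expectedCalls: expected })],
  })
}

defineToolEval('Documentation Tools', [
  { input: 'List all docs', expected: [{ toolName: 'list-pages' }] },
  { input: 'Get the intro page', expected: [{ toolName: 'get-page' }] },
])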