
# Quick Start

  1. Get to Your Agent

Navigate to Expert Agent Studio, find and open the agent you want to test, and click on “Benchmark”.

  2. Start Fresh

You’ll see any past benchmarks listed here. To begin a new one, click “Start New”.
Give it a name that reminds you what it’s for, such as “New Prompt A/B Test”.

  3. Set Up Your Tracks

Your agent’s current model and prompt already appear as the first track (column).
To add more tracks, click the “+” button and try out different prompts or models.
For each track, you can customize:

  • Prompt text

  • Model

  • Model settings (e.g., temperature, top-p)
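
If you like to plan tracks before building them, it can help to write each one down as a plain record and check what actually differs from the baseline. The sketch below is a minimal Python illustration that assumes a hypothetical `Track` record with `prompt`, `model`, `temperature`, and `top_p` fields; these names are placeholders, not the Studio’s own schema. Comparing a variant against the first track this way makes it easy to confirm you are changing one thing at a time, which keeps the results easier to interpret.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Track:
    """One benchmark track (column). Field names are hypothetical placeholders."""
    name: str
    prompt: str
    model: str
    temperature: float = 0.7
    top_p: float = 1.0


def changed_settings(a: Track, b: Track) -> list[str]:
    """List the settings (ignoring the display name) that differ between two tracks."""
    return [
        f.name
        for f in fields(Track)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


# Track 1 mirrors the agent's current setup; the variant changes only the prompt.
baseline = Track("Current", "You are a helpful support agent.", "model-a")
variant = Track(
    "Concise",
    "You are a helpful support agent. Answer in two sentences.",
    "model-a",
)

print(changed_settings(baseline, variant))
```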

  4. Add Inputs and Expected Outputs

This is what the agent will be tested against.

Option 1: Add Manually

Just click “+ Field” to add a row.
You can then type in:

  • Input (user question or query)

  • Expected Output (ideal response)

Option 2: Bulk Upload

Click “Import” and upload a JSON file with multiple inputs and expected outputs.
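
The exact JSON layout the Import dialog expects isn’t described here, so the following is only a sketch under an assumption: a JSON array of objects with hypothetical `input` and `expected_output` keys. Rename the keys to match whatever format your Import dialog accepts. The helper checks that every row has both fields before writing the file, so a typo doesn’t silently produce an incomplete test set.

```python
import json
from pathlib import Path

# Assumed key names; the Import dialog's real schema may differ.
REQUIRED_KEYS = ("input", "expected_output")


def write_benchmark_file(rows: list[dict[str, str]], path: Path) -> None:
    """Check that every row has both keys, then write the rows as a JSON array."""
    for i, row in enumerate(rows):
        missing = [key for key in REQUIRED_KEYS if key not in row]
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
    path.write_text(json.dumps(rows, indent=2, ensure_ascii=False), encoding="utf-8")


# Placeholder test cases; replace them with questions your agent should handle.
rows = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "How many days are in a leap year?", "expected_output": "366"},
]
write_benchmark_file(rows, Path("benchmark_inputs.json"))
```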

  5. Pick Your Metrics

Click the bar-graph icon to open Benchmark Settings.

You can choose from:

  • Evaluative Metrics (e.g., Exact Match, Faithfulness, Relevance)

  • Operational Metrics (e.g., token usage, cost)

  • Custom Metrics (you can create your own using LLM-as-a-Judge)

Each metric measures a different aspect of your agent’s performance, from answer quality to cost.
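
To get a feel for what the simpler metrics compute, here is a rough Python sketch of an exact-match score and a per-run cost estimate. Both are assumptions for illustration: the normalization used here (trim, lowercase, collapse whitespace) may differ from what the Studio’s Exact Match does, and the per-million-token prices passed in are placeholders, not real rates for any model.

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse all runs of whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def exact_match(output: str, expected: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return 1.0 if _normalize(output) == _normalize(expected) else 0.0


def estimate_cost(input_tokens: int, output_tokens: int,
                  usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate cost in USD from token counts and per-million-token prices."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


print(exact_match("  Paris ", "paris"))        # whitespace and case are ignored
print(exact_match("Paris, France", "Paris"))   # any extra text fails the match
print(estimate_cost(1200, 300, 2.50, 10.00))   # placeholder prices
```

Exact match is strict by design: a correct but reworded answer scores 0.0, which is one reason to pair it with metrics such as Relevance or a custom LLM-as-a-Judge metric.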

  6. Run the Benchmark

Once your inputs, tracks, and metrics are set up, select the rows you want to test and click “Run Prompt”.
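
Conceptually, a run scores every selected row against every track. The sketch below shows that grid in Python under stated assumptions: `fake_model` is a hypothetical stand-in for a real model call that returns canned answers, tracks and rows are plain dictionaries, and the only metric is a simple case-insensitive match. It is an illustration of the idea, not how the Studio runs benchmarks internally.

```python
from typing import Callable

Row = dict[str, str]
Track = dict[str, str]
Metric = Callable[[str, str], float]


def run_benchmark(rows: list[Row], tracks: list[Track], selected: list[int],
                  call_model: Callable[[Track, str], str],
                  metrics: dict[str, Metric]) -> dict[tuple[int, str], dict[str, float]]:
    """Score every selected row against every track.

    Returns {(row_index, track_name): {metric_name: score}}.
    """
    results: dict[tuple[int, str], dict[str, float]] = {}
    for i in selected:
        row = rows[i]
        for track in tracks:
            output = call_model(track, row["input"])
            results[(i, track["name"])] = {
                name: metric(output, row["expected_output"])
                for name, metric in metrics.items()
            }
    return results


# Hypothetical canned answers standing in for real model calls.
CANNED = {
    "A": {"What is the capital of France?": "Paris",
          "How many days are in a leap year?": "365"},
    "B": {"What is the capital of France?": "paris",
          "How many days are in a leap year?": "366"},
}


def fake_model(track: Track, user_input: str) -> str:
    return CANNED[track["name"]][user_input]


def case_insensitive_match(output: str, expected: str) -> float:
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


rows = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "How many days are in a leap year?", "expected_output": "366"},
]
tracks = [{"name": "A"}, {"name": "B"}]

results = run_benchmark(rows, tracks, selected=[0, 1], call_model=fake_model,
                        metrics={"match": case_insensitive_match})
for (row_index, track_name), scores in results.items():
    print(row_index, track_name, scores)
```

Selecting only some rows (for example `selected=[0]`) shrinks the grid, which is a cheap way to sanity-check a new track before running the full set.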

  7. Check and Export Results

When the run finishes, you’ll see performance scores for each row and track.
To save them, click “Export”, choose the information you need, and download the file.
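
Once you have the export, you may want to compare tracks at a glance. The sketch below averages each metric per track, assuming a hypothetical CSV export with `track`, `metric`, and `score` columns (one line per row, track, and metric); check the columns in your own file and adjust the names. Inline sample data keeps the example self-contained.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical export layout; the real column names may differ.
SAMPLE_EXPORT = """row,track,metric,score
1,Current,exact_match,1.0
2,Current,exact_match,0.0
1,Concise,exact_match,1.0
2,Concise,exact_match,1.0
"""


def mean_scores(csv_text: str) -> dict[tuple[str, str], float]:
    """Average each metric per track across all exported rows."""
    groups = defaultdict(list)
    for record in csv.DictReader(io.StringIO(csv_text)):
        groups[(record["track"], record["metric"])].append(float(record["score"]))
    return {key: mean(values) for key, values in groups.items()}


for (track, metric), score in sorted(mean_scores(SAMPLE_EXPORT).items()):
    print(f"{track:<10} {metric:<12} {score:.2f}")
```

To use a downloaded file instead of the sample, read it with `open(path, newline="", encoding="utf-8")` and pass its contents to `mean_scores`.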