# Quick Start
- Get to Your Agent
Navigate to Expert Agent Studio, find and open the agent you want to test, and click on “Benchmark”.
- Start Fresh
You’ll see past benchmarks (if any). To begin a new one, just click on “Start New”.
Give it a name that reminds you of its purpose, such as “New Prompt A/B Test”.
- Set Up Your Tracks
Your current agent’s model and prompt already appear as the first track (column).
To add more tracks, click the “+” button and try out different prompts or models.
For each track, you can customize:
Prompt text
Model
Model settings (e.g., temperature, Top p)
- Add Inputs and Expected Outputs
This is what the agent will be tested against.
Option 1: Add Manually
Just click “+ Field” to add a row.
You can then type in:
Input (user question or query)
Expected Output (ideal response)
Option 2: Bulk Upload
Click “Import” and upload a JSON file with multiple inputs and expected outputs.
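If you have many test cases, it can be easier to generate the JSON file with a short script. The sketch below is only an illustration: this guide does not specify the import schema, so the field names `input` and `expected_output`, the file name `benchmark_cases.json`, and the sample questions are all assumptions. Check them against a file accepted by “Import” before relying on this.

```python
import json
from pathlib import Path

# Illustrative test cases. The keys "input" and "expected_output" are
# assumed names, not a documented schema; adjust them to match your import format.
test_cases = [
    {
        "input": "What is your refund policy?",
        "expected_output": "Refunds are available within 30 days of purchase.",
    },
    {
        "input": "How do I reset my password?",
        "expected_output": "Use the 'Forgot password' link on the sign-in page.",
    },
]


def write_benchmark_file(cases, path):
    """Check that every case has both fields, then write the list as UTF-8 JSON."""
    for i, case in enumerate(cases):
        for key in ("input", "expected_output"):
            value = case.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"case {i} is missing a non-empty '{key}'")
    path = Path(path)
    path.write_text(json.dumps(cases, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


out_file = write_benchmark_file(test_cases, "benchmark_cases.json")
```

Validating before writing catches empty or missing fields early, so you don't find out about a bad row only after the upload.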
- Pick Your Metrics
Click the little bar graph icon to open Benchmark Settings.
You can choose from:
- Evaluative Metrics (e.g., Exact Match, Faithfulness, Relevance)
- Operational Metrics (e.g., token usage, cost)
- Custom Metrics (you can create your own using LLM-as-a-Judge)
Each metric measures a different aspect of how well your agent performs.
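To get a feel for what an evaluative metric such as Exact Match measures, here is a minimal sketch of one common formulation. It is not the product's implementation: the normalization step (lowercasing and collapsing whitespace) and the function names `normalize`, `exact_match`, and `mean_score` are assumptions made for illustration.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match(output: str, expected: str) -> float:
    """Return 1.0 if the normalized output equals the normalized expected output, else 0.0."""
    return 1.0 if normalize(output) == normalize(expected) else 0.0


def mean_score(pairs) -> float:
    """Average the exact-match score over (output, expected) pairs; 0.0 for an empty list."""
    scores = [exact_match(output, expected) for output, expected in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical agent outputs paired with expected outputs.
results = [
    ("Paris ", "paris"),          # matches after normalization
    ("Paris, France", "Paris"),   # extra words, so no match
]
average = mean_score(results)
```

Exact Match is strict by design: a correct answer phrased differently scores zero, which is why it is usually paired with softer metrics such as Faithfulness or Relevance.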
- Run the Benchmark
Once your inputs, tracks, and metrics are ready, select the rows you want to test and click “Run Prompt”.
- Check and Export Results
After the run finishes, you’ll see performance scores for each row and track.
To save the results, click “Export”, choose the information you need, and download the file.