# Quick Start

  1. Get to Your Agent:
    Navigate to Expert Agent Studio, find and open the agent you want to test, and click Benchmark.

  2. Start Fresh:
    You’ll see past benchmarks (if any). To begin a new one, click Start New. Give it a name that helps you remember what it’s for, such as New Prompt A/B Test.

  3. Set Up Your Tracks:

    • Your current agent’s model and prompt already appear as the first track (column).
    • To add more tracks, click the + button and try out different prompts or models.
      For each track, you can customize:
      * Prompt text
      * Model
      * Model settings (e.g., temperature, top-p)

  4. Add Inputs and Expected Outputs:
    These are the test cases the agent will be evaluated against.

    Option 1: Add Manually
    Click “+ Field” to add a row. You can then enter:
    * Input (user question or query)
    * Expected Output (ideal response)
    * Optionally, files that provide context for the input query

    Option 2: Bulk Upload
    Click “Import” and upload a JSON file containing multiple inputs and expected outputs. You can download the template to see the expected JSON structure.
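
For larger test sets, the import file can be generated with a short script rather than written by hand. The sketch below is illustrative only: the field names `input` and `expected_output`, the sample questions, and the helper `build_import_payload` are assumptions, not the product's documented schema. Download the template and adjust the keys to match it before uploading.

```python
import json

# Hypothetical field names; the downloaded template defines the real schema.
REQUIRED_FIELDS = {"input", "expected_output"}

# Illustrative test cases only.
test_cases = [
    {
        "input": "What is your refund policy?",
        "expected_output": "Refunds are available within 30 days of purchase.",
    },
    {
        "input": "How do I reset my password?",
        "expected_output": "Use the 'Forgot password' link on the sign-in page.",
    },
]


def build_import_payload(cases):
    """Check that every case has the required fields, then serialize to JSON."""
    for index, case in enumerate(cases):
        missing = REQUIRED_FIELDS - case.keys()
        if missing:
            raise ValueError(f"case {index} is missing fields: {sorted(missing)}")
    # ensure_ascii=False keeps non-English text readable in the file.
    return json.dumps(cases, indent=2, ensure_ascii=False)


payload = build_import_payload(test_cases)
```

Saving `payload` to a `.json` file (for example with `pathlib.Path("benchmark_inputs.json").write_text(payload, encoding="utf-8")`) gives a file that can be adapted to the template and uploaded through “Import”.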

  5. Pick Your Metrics:
    Click the bar graph icon to open Benchmark Settings. You can choose from:

    • Evaluative Metrics (e.g., Exact Match, Faithfulness, Relevance)
    • Operational Metrics (e.g., token usage, cost)
    • Custom Metrics (you can create your own using LLM-as-a-Judge)
      Each metric captures a different aspect of how well your agent is performing.
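
To give a feel for what an evaluative metric measures, here is a minimal sketch of an exact-match score. It assumes light normalization (case and whitespace) before comparing; the product's built-in Exact Match metric may be stricter or looser, and the function names here are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace (an assumed normalization)."""
    return re.sub(r"\s+", " ", text.strip().lower())


def exact_match(output: str, expected: str) -> float:
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return 1.0 if normalize(output) == normalize(expected) else 0.0


def exact_match_rate(pairs) -> float:
    """Average exact-match score over (output, expected) pairs; 0.0 if empty."""
    scores = [exact_match(output, expected) for output, expected in pairs]
    return sum(scores) / len(scores) if scores else 0.0
```

Exact match is strict by design: a correct answer phrased differently from the expected output scores 0.0, which is why it is usually paired with softer metrics such as Relevance or a custom LLM-as-a-Judge metric.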

  6. Run the Benchmark:
    Once everything is ready (inputs, tracks, and metrics), select the rows you want to test and click Run Prompt.

  7. Check and Export Results:
    After the run finishes, you’ll see performance scores for each row and track.
    To export the results, click Export, select the information and file format you want, then download the file.