
# Benchmark Automation Agents

## Manual Setup

Manually enter the Input and Expected Output. This is useful when only a small number of queries need to be tested.

  1. Click on + Field to add a row to the benchmarking schema.

  2. Enter the Input.

  3. Provide the Expected Output (the ideal response) manually.

  4. The Input and Expected Output can each have multiple fields, depending on the input and output schema you defined while building the agent.

Note - For each field defined in the output schema, a separate Expected Output field is added.
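
For illustration, here is a minimal sketch of one benchmark row for a hypothetical agent whose input schema has a single `query` field and whose output schema defines two fields, `summary` and `sentiment`. These field names are assumptions made for this example only; your own schema determines the actual fields.

```python
# Hypothetical benchmark row for an agent with one input field ("query")
# and two output-schema fields ("summary", "sentiment").
# Each output-schema field gets its own Expected Output entry.
benchmark_row = {
    "input": {
        "query": "Summarise the attached customer review.",
    },
    "expected_output": {
        "summary": "The customer likes the delivery speed but not the packaging.",
        "sentiment": "mixed",
    },
}
```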

If you have selected ‘File’ as the data type for the Input, you can click on Browse and upload the file.

## Bulk Import

  1. Click on Import.

  2. Download the ‘Document Template’ (it illustrates the JSON structure supported by the system).

  3. Prepare the JSON file according to the template (see the sketch after this list).

  4. Upload the file by dragging and dropping it, or by browsing your system.

  5. Click on Submit.
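
As a rough sketch, the snippet below assembles an import file of the kind described above. It is only an illustration: the actual field names and nesting must follow the ‘Document Template’ downloaded in step 2, and the names used here (`input`, `expected_output`, `query`, `answer`) are placeholders, not the product's schema.

```python
import json

# Hypothetical benchmark rows. Replace the placeholder field names with the
# structure shown in the downloaded 'Document Template'.
rows = [
    {
        "input": {"query": "What is the refund policy?"},
        "expected_output": {"answer": "Refunds are issued within 14 days of purchase."},
    },
    {
        "input": {"query": "How do I reset my password?"},
        "expected_output": {"answer": "Use the 'Forgot password' link on the login page."},
    },
]

# Write the file that will be uploaded in the import dialog.
with open("benchmark_import.json", "w") as f:
    json.dump(rows, f, indent=2)
```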

If ‘File’ is a data type for the Input, you need to add the “doc id” and “doc name” for each input file so that the system can include it as an input. Click on the Documents tab (placed alongside the Import button).

  1. The Documents dialog box will appear, and you can import files directly here.

  2. After the import is complete, an option to download the file list will appear.

  3. Click on the ‘Download file list’ button.

  4. A CSV file will be downloaded. It contains the ‘doc id’ and ‘doc name’ for each uploaded file, which you can reference within the import file (see the sketch after the steps below).

OR

  1. The Documents dialog box will appear, and you can choose ‘Select files from Doc Library’.

  2. The Doc Library will open.

  3. Select the folder, and then select the files for which you want to download the file list.

  4. You can also import additional files into the existing folders to update the library, and then download the document list.
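
For File inputs, the downloaded file list can be used to look up the ‘doc id’ and ‘doc name’ when preparing the import file. The sketch below assumes the CSV headers are literally `doc id` and `doc name`, and it reuses the placeholder field names from the earlier sketch (`input`, `question`, `expected_output`); check the downloaded files for the exact spelling and structure.

```python
import csv
import json

# Map each uploaded file's name to its id, using the downloaded file list.
# The exact CSV header spelling is an assumption; adjust to match the real file.
docs = {}
with open("file_list.csv", newline="") as f:
    for row in csv.DictReader(f):
        docs[row["doc name"]] = row["doc id"]

# Reference an uploaded file from an import row. All field names here are
# placeholders; follow the 'Document Template' for the real structure.
rows = [
    {
        "input": {
            "doc id": docs["invoice_march.pdf"],
            "doc name": "invoice_march.pdf",
            "question": "What is the total amount due?",
        },
        "expected_output": {"answer": "USD 1,250.00"},
    },
]

with open("benchmark_import.json", "w") as f:
    json.dump(rows, f, indent=2)
```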

After adding the Input and Expected Output for all the rows you want to include in the benchmark, you can proceed to add different tracks (Model and Prompt Variations) and configure the benchmarking metrics to start the benchmarking run.

These actions are not dependent on any specific order; you can add tracks and set metrics in whichever sequence suits your workflow.