# Benchmark Automation Agents
## Manual Setup
Enter the Input and Expected Output manually. This is useful when only a small number of queries need to be tested.
Click + Field to add a row to the benchmarking schema.
Enter the Input.
Provide the Expected Output (ideal response) manually.
The Input and Expected Output can have multiple fields, depending on the input and output schemas you defined when building the agent.
Note: For each line item in the output schema, a separate Expected Output field is added.
If you selected ‘File’ as the input data type, click Browse and upload the file.
## Bulk Import
- Click Import.
Download the ‘Document Template’, which illustrates the JSON structure the system supports.
Prepare the JSON file according to the template.
Upload the file by dragging and dropping it or by browsing your system.
Click Submit.
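If you have many rows, it can help to generate the import file with a script rather than by hand. The sketch below is a minimal illustration under stated assumptions: the row shape `{"input": ..., "expected_output": ...}` and the field names in `INPUT_FIELDS` and `OUTPUT_FIELDS` are hypothetical placeholders, not the documented format. Copy the real key names from the ‘Document Template’ you downloaded. The script checks that every row carries all schema fields before writing, mirroring how each input and output field becomes its own column in the benchmark.

```python
import json

# Hypothetical field names: replace them with the fields from your agent's
# input and output schemas, as they appear in the Document Template.
INPUT_FIELDS = ["query"]
OUTPUT_FIELDS = ["answer"]


def build_rows(pairs):
    """Turn (input_dict, expected_output_dict) pairs into benchmark rows.

    The {"input": ..., "expected_output": ...} row shape is an assumption
    for illustration, not the platform's documented template.
    """
    rows = []
    for i, (inp, out) in enumerate(pairs, start=1):
        # Collect any schema fields this row is missing before writing.
        missing = [f for f in INPUT_FIELDS if f not in inp]
        missing += [f for f in OUTPUT_FIELDS if f not in out]
        if missing:
            raise ValueError(f"row {i} is missing fields: {missing}")
        rows.append({"input": inp, "expected_output": out})
    return rows


def write_import_file(rows, path):
    # Write UTF-8 JSON so non-ASCII text is kept as-is in the upload file.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Sample rows for illustration only.
    pairs = [
        ({"query": "What is the refund window?"}, {"answer": "30 days"}),
        ({"query": "Can I change my order?"}, {"answer": "Yes, before it ships"}),
    ]
    print(json.dumps(build_rows(pairs), ensure_ascii=False, indent=2))
```

To produce the file for upload, call `write_import_file(build_rows(pairs), "benchmark_import.json")` with a path of your choice. Failing early on a missing field is cheaper than discovering a malformed row after the upload.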
If ‘File’ is an input data type, you need to add the ‘doc id’ and ‘doc name’ for each input file so the system can attach it as an input. To get these values, click the Documents tab (next to the Import button).
- The Documents dialog box appears, and you can import files directly here.
- After the import completes, an option to download the file list appears.
Click the ‘Download file list’ button.
A CSV file is downloaded containing the ‘doc id’ and ‘doc name’ of each uploaded file, which you can reference in the import file.
OR
- In the Documents dialog box, choose ‘Select files from Doc Library’.
The Doc Library opens.
Select the folder, then select the files whose file list you want to download.
You can also import additional files into existing folders to update the library before downloading the file list.
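Once you have the downloaded CSV, you can look up each file's ‘doc id’ by its ‘doc name’ instead of copying values by hand. The following is a minimal sketch, assuming the CSV header row is literally `doc id,doc name`; the helper names `load_doc_ids` and `attach_docs`, and the `{"doc_id": ..., "doc_name": ...}` entry shape, are hypothetical. Take the actual key names for file inputs from the ‘Document Template’.

```python
import csv
import io


def load_doc_ids(stream):
    """Map each 'doc name' to its 'doc id' from the downloaded file list.

    Assumes the CSV header uses the column names 'doc id' and 'doc name';
    adjust them if your export labels the columns differently.
    """
    reader = csv.DictReader(stream)
    return {row["doc name"].strip(): row["doc id"].strip() for row in reader}


def attach_docs(file_names, doc_ids):
    """Build hypothetical file-input entries for the import JSON.

    The {"doc_id": ..., "doc_name": ...} shape is illustrative only.
    Raises KeyError for files that are not in the downloaded list.
    """
    unknown = [name for name in file_names if name not in doc_ids]
    if unknown:
        raise KeyError(f"not in file list (upload them first): {unknown}")
    return [{"doc_id": doc_ids[name], "doc_name": name} for name in file_names]


if __name__ == "__main__":
    # Sample CSV content for illustration; in practice, open the downloaded file.
    sample = io.StringIO("doc id,doc name\nD-001,invoice_march.pdf\nD-002,invoice_april.pdf\n")
    ids = load_doc_ids(sample)
    print(attach_docs(["invoice_april.pdf"], ids))
```

With a real download, pass `open("file_list.csv", newline="", encoding="utf-8")` (file name assumed) to `load_doc_ids`. If the first column name does not match because the file starts with a byte-order mark, opening it with `encoding="utf-8-sig"` may help. Raising on unknown names catches files that were never uploaded before they end up as broken references in the import file.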
After adding the Input and Expected Output for all the rows you want to include in the benchmark, you can proceed to add different tracks (Model and Prompt Variations) and configure the benchmarking metrics to start the benchmarking run.
These actions are not dependent on any specific order; you can add tracks and set metrics in whichever sequence suits your workflow.