You can define a custom metric and configure it to fit your use case.
Click + Create new LLM-as-a-Judge
Enter a name for your metric. This should clearly describe what you’re evaluating (e.g., Clarity, Empathy, Policy Compliance)
Enter the Metric description
Select the LLM model that will be the LLM-as-a-judge
In the perspective area, write the evaluation prompt that instructs the LLM how to judge the output. Guidance for writing the prompt is provided in the prompt box to help you produce a well-structured output
You can use the supported parameters:
Input – the original user query (optional)
Expected Output – ideal response (optional)
Agent Response – actual model output (required)
Context – documents or context used (optional)
To use these parameters inside the prompt instructions, type "{" and select the appropriate parameter from the drop-down list, as in the example below
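For example, a judge prompt for a Clarity metric might combine these parameters as in the sketch below. The prompt text and the Python wrapper are illustrative assumptions; the {Input}, {Expected Output}, {Agent Response}, and {Context} tokens stand in for whatever the drop-down inserts in your workspace.

```python
# Illustrative judge prompt for a "Clarity" metric, held in a Python
# string for readability. The {...} tokens are placeholders for the
# parameters picked from the "{" drop-down; actual token names may differ.
JUDGE_PROMPT = """\
You are an impartial evaluator. Judge the agent's answer for Clarity.

User query: {Input}
Ideal response: {Expected Output}
Agent answer: {Agent Response}
Supporting documents: {Context}

Rate Clarity from 1 (very unclear) to 5 (very clear), then justify the
score in one sentence.
"""
```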
Configure the metric variables scored for each row of the run
Provide a name for the metric and select the data type to be used for scoring
Number: Use this for rating scales (e.g., 1–5 or 0–100)
Boolean: Use this for Yes/No questions
Text: Use this for qualitative feedback or open-ended comments
Optionally, describe the metric. A description is not required, but it helps both the LLM and human reviewers understand the metric's purpose
You can add multiple metrics as needed for your evaluation by clicking the + icon on the right side (see the sketch below)
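To make the shape of a metric definition concrete, the sketch below models one variable of each data type as plain Python dictionaries. The field names and values are assumptions for illustration, not the product's actual schema.

```python
# Hypothetical metric-variable definitions: a name, a data type
# (Number, Boolean, or Text), and an optional description, mirroring
# the fields described in the steps above.
metric_variables = [
    {
        "name": "clarity_score",
        "type": "Number",       # rating scale, e.g. 1-5
        "description": "How clearly the agent's answer is worded (1-5).",
    },
    {
        "name": "policy_compliant",
        "type": "Boolean",      # Yes/No question
        "description": "Does the answer follow the support policy?",
    },
    {
        "name": "reviewer_notes",
        "type": "Text",         # qualitative, open-ended feedback
        "description": "Free-form comments from the judge.",
    },
]
```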
Configure the Agent Summary, which is the overall metric aggregated across all rows to assess agent performance for the entire run
Note: Only Number and Boolean metric variables are available for score aggregations (Agent Summary). Number variables can be aggregated with Average, and Boolean variables with ‘Count of’
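As a worked example of these two aggregation rules, the sketch below computes an Average over a Number variable and a ‘Count of’ over a Boolean variable. The row data and variable names are invented for illustration.

```python
# Invented per-row scores from three rows of a run.
rows = [
    {"clarity_score": 4, "policy_compliant": True},
    {"clarity_score": 5, "policy_compliant": True},
    {"clarity_score": 3, "policy_compliant": False},
]

# Average: the aggregation available for the Number data type.
avg_clarity = sum(r["clarity_score"] for r in rows) / len(rows)

# "Count of": the aggregation available for the Boolean data type.
compliant_count = sum(1 for r in rows if r["policy_compliant"])

print(f"Average clarity: {avg_clarity:.2f}")            # 4.00
print(f"Count of policy_compliant: {compliant_count}")  # 2
```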
Click Save
All the saved metrics appear as a list
You can select or deselect specific metrics in benchmarking; scores update without triggering additional LLM runs, reducing both cost and latency (see the sketch at the end of this section).
You can select the custom metrics you want to enable for evaluation
Custom metrics have a cost associated with them.
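One plausible reading of why toggling metrics is cheap: per-row judge scores are produced once and cached, so changing which metrics are enabled only re-aggregates cached values and never calls the judge LLM again. The sketch below is an assumption about the mechanism, not the product's actual implementation.

```python
# Hypothetical cache of per-row judge scores from the original run.
cached_scores = {
    "clarity_score": [4, 5, 3],
    "policy_compliant": [True, True, False],
}

def summarize(enabled_metrics):
    """Recompute the summary from the cache; no LLM calls are made."""
    summary = {}
    for name in enabled_metrics:
        values = cached_scores[name]
        if isinstance(values[0], bool):
            summary[name] = sum(values)                # "Count of" True
        else:
            summary[name] = sum(values) / len(values)  # Average
    return summary

print(summarize(["clarity_score"]))                      # {'clarity_score': 4.0}
print(summarize(["clarity_score", "policy_compliant"]))  # both metrics enabled
```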