# Defining Custom Metrics (LLM-as-a-Judge)
You can define custom metrics and configure them to fit your use case.
- Click on + Custom Metric.
- Enter a name for your metric. This should clearly describe what you’re evaluating (e.g., Clarity, Empathy, Policy Compliance).
- Enter the metric description.
- Select the LLM model that will serve as the LLM-as-a-judge.
- In the perspective area, write the evaluation prompt that instructs the LLM how to judge the output. Guidance for writing a well-structured prompt is provided in the prompt box.
- Use the supported parameters (a sample prompt follows this list):
  - Input – the original user query (optional)
  - Expected Output – the ideal response (optional)
  - Agent Response – the actual model output (required)
  - Context – documents or context used (optional)
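As an illustration, here is a minimal sketch of an evaluation prompt for a hypothetical Clarity metric. The `{{...}}` placeholders are illustrative only; your platform's exact parameter syntax may differ.

```
You are an impartial judge. Rate the clarity of the agent's response
on a scale of 1 to 5, where 1 is very unclear and 5 is perfectly clear.

User query: {{Input}}
Agent response: {{Agent Response}}
Reference answer, if provided: {{Expected Output}}

Return only the numeric score.
```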
- Configure the metric variables for each row of the run.
- Provide a name for the metric and select the data type to be used for scoring (a sketch of how each type might be represented follows this list):
  - Number: use this for rating scales (e.g., 1–5 or 0–100).
  - Boolean: use this for Yes/No questions.
  - Text: use this for qualitative feedback or open-ended comments.
- Optionally, describe the metric. While not required, a description helps both the LLM and human reviewers understand its purpose.
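For intuition, here is a minimal Python sketch, using entirely hypothetical names, of how a per-row metric result might be represented for each data type:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class MetricResult:
    """Hypothetical container for one row's metric score."""
    name: str
    value: Union[float, bool, str]  # Number, Boolean, or Text

# One example per data type:
clarity = MetricResult("Clarity", 4.0)                           # Number (1-5 scale)
compliant = MetricResult("Policy Compliance", True)              # Boolean (Yes/No)
notes = MetricResult("Reviewer Notes", "Concise and on-topic.")  # Text
```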
- You can add multiple metrics as needed for your evaluation by clicking the + icon on the right side.
- Configure the Agent Summary, which is the overall metric aggregated across all rows to assess agent performance for the entire run.

Note: Only Number and Boolean metric variables are available for score aggregations (Agent Summary). ‘Average’ is available for the Number data type, and ‘Count of’ for the Boolean data type, as shown in the sketch below.
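To make the two aggregations concrete, here is a minimal Python sketch (hypothetical data, not the product's API) of how ‘Average’ and ‘Count of’ behave across rows:

```python
# Per-row scores collected during a run (hypothetical data).
clarity_scores = [4, 5, 3, 4]           # Number metric (1-5 scale)
compliance_flags = [True, True, False]  # Boolean metric (Yes/No)

# 'Average' aggregation, available for Number metrics:
average_clarity = sum(clarity_scores) / len(clarity_scores)    # 4.0

# 'Count of' aggregation, available for Boolean metrics:
compliant_count = sum(1 for flag in compliance_flags if flag)  # 2
```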
- Click Save.
- All saved metrics will appear in a list.
- Select the custom metrics you want to enable for evaluation.
Note: Custom metrics have a cost associated with them.