# Defining Custom Metrics (LLM-as-a-Judge)

You can define a custom metric and configure it to fit your use case.

  1. Click + Create new LLM-as-a-Judge

  2. Enter a name for your metric. This should clearly describe what you’re evaluating (e.g., Clarity, Empathy, Policy Compliance)

  3. Enter the Metric description

  4. Select the LLM that will act as the judge

  5. In the perspective area, write the evaluation prompt that instructs the LLM how to judge the output. Guidance on writing a well-structured prompt is provided in the prompt box

    You can use the supported parameters:

    - Input – the original user query (optional)
    - Expected Output – the ideal response (optional)
    - Agent Response – the actual model output (required)
    - Context – the documents or context used (optional)

  6. To insert a supported parameter into the prompt instructions, type “{” and select the parameter from the drop-down list, as in the example below
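    For example, a judge prompt for a Clarity metric might look like the following sketch (illustrative only; the placeholders are the supported parameters listed above):

    ```
    You are evaluating the clarity of an AI agent's answer.

    User query: {Input}
    Agent answer: {Agent Response}

    Rate the clarity of the answer on a scale of 1–5, where 1 is very
    unclear and 5 is perfectly clear. Briefly justify your rating.
    ```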

  7. Configure the metric variables that the judge will score for each row of the run

  8. Provide a name for the metric variable and select the data type to be used for scoring:

    - Number: use this for rating scales (e.g., 1–5 or 0–100)
    - Boolean: use this for Yes/No questions
    - Text: use this for qualitative feedback or open-ended comments
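    For instance, one row's judged values for a variable of each type might look like this (the variable names and values here are hypothetical):

    ```python
    # Hypothetical per-row values for three metric variables, one per data type
    row_result = {
        "clarity_score": 4,           # Number: a 1-5 rating scale
        "is_policy_compliant": True,  # Boolean: a Yes/No question
        "reviewer_notes": "Clear answer, though the closing is verbose.",  # Text
    }
    ```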

  9. Optionally, describe the metric; while not required, a description helps both the LLM and human reviewers understand its purpose

  10. You can add multiple metrics as needed for your evaluation by clicking the + icon on the right side

  11. Configure the Agent Summary, which is the overall metric aggregated across all rows to assess agent performance for the entire run

    Note: Only Number and Boolean metric variables can be added to score aggregations (Agent Summary). Number variables support the Average aggregation, and Boolean variables support Count of, as the sketch below illustrates
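    A minimal sketch of what these two aggregations compute, assuming hypothetical per-row results like those shown earlier:

    ```python
    # Hypothetical per-row metric values produced by the judge
    rows = [
        {"clarity_score": 4, "is_policy_compliant": True},
        {"clarity_score": 5, "is_policy_compliant": False},
        {"clarity_score": 3, "is_policy_compliant": True},
    ]

    # Average: available for Number variables
    avg_clarity = sum(r["clarity_score"] for r in rows) / len(rows)

    # Count of: available for Boolean variables
    compliant_count = sum(1 for r in rows if r["is_policy_compliant"])

    print(f"Average clarity: {avg_clarity:.2f}")          # 4.00
    print(f"Count of compliant rows: {compliant_count}")  # 2
    ```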

  12. Click Save

  13. All the saved metrics appear in a list

  14. Select the custom metrics you want to enable for evaluation