create_evaluator
- BedrockAgentCoreControl.Client.create_evaluator(**kwargs)
Creates a custom evaluator for agent quality assessment. Custom evaluators use LLM-as-a-Judge configurations with user-defined prompts, rating scales, and model settings to evaluate agent performance at tool call, trace, or session levels.
See also: AWS API Documentation
Request Syntax
response = client.create_evaluator(
    clientToken='string',
    evaluatorName='string',
    description='string',
    evaluatorConfig={
        'llmAsAJudge': {
            'instructions': 'string',
            'ratingScale': {
                'numerical': [
                    {
                        'definition': 'string',
                        'value': 123.0,
                        'label': 'string'
                    },
                ],
                'categorical': [
                    {
                        'definition': 'string',
                        'label': 'string'
                    },
                ]
            },
            'modelConfig': {
                'bedrockEvaluatorModelConfig': {
                    'modelId': 'string',
                    'inferenceConfig': {
                        'maxTokens': 123,
                        'temperature': ...,
                        'topP': ...,
                        'stopSequences': [
                            'string',
                        ]
                    },
                    'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
                }
            }
        }
    },
    level='TOOL_CALL'|'TRACE'|'SESSION'
)
- Parameters:
clientToken (string) –
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don’t specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn’t return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
evaluatorName (string) –
[REQUIRED]
The name of the evaluator. Must be unique within your account.
description (string) – The description of the evaluator that explains its purpose and evaluation criteria.
evaluatorConfig (dict) –
[REQUIRED]
The configuration for the evaluator, including LLM-as-a-Judge settings with instructions, rating scale, and model configuration.
Note
This is a Tagged Union structure. Only one of the following top level keys can be set: llmAsAJudge.
llmAsAJudge (dict) –
The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
instructions (string) – [REQUIRED]
The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
ratingScale (dict) – [REQUIRED]
The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
Note
This is a Tagged Union structure. Only one of the following top level keys can be set: numerical, categorical.
numerical (list) –
The numerical rating scale with defined score values and descriptions for quantitative evaluation.
(dict) –
The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
definition (string) – [REQUIRED]
The description that explains what this numerical rating represents and when it should be used.
value (float) – [REQUIRED]
The numerical value for this rating scale option.
label (string) – [REQUIRED]
The label or name that describes this numerical rating option.
categorical (list) –
The categorical rating scale with named categories and definitions for qualitative evaluation.
(dict) –
The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
definition (string) – [REQUIRED]
The description that explains what this categorical rating represents and when it should be used.
label (string) – [REQUIRED]
The label or name of this categorical rating option.
modelConfig (dict) – [REQUIRED]
The model configuration that specifies which foundation model to use and how to configure it for evaluation.
Note
This is a Tagged Union structure. Only one of the following top level keys can be set: bedrockEvaluatorModelConfig.
bedrockEvaluatorModelConfig (dict) –
The Amazon Bedrock model configuration for evaluation.
modelId (string) – [REQUIRED]
The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.
inferenceConfig (dict) –
The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
maxTokens (integer) –
The maximum number of tokens to generate in the model response during evaluation.
temperature (float) –
The temperature value that controls randomness in the model’s responses. Lower values produce more deterministic outputs.
topP (float) –
The top-p sampling parameter that controls the diversity of the model’s responses by limiting the cumulative probability of token choices.
stopSequences (list) –
The list of sequences that will cause the model to stop generating tokens when encountered.
(string) –
additionalModelRequestFields (document) –
Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
level (string) –
[REQUIRED]
The evaluation level that determines the scope of evaluation. Valid values are TOOL_CALL for individual tool invocations, TRACE for single request-response interactions, or SESSION for entire conversation sessions.
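Putting the parameters above together, a minimal request with a numerical rating scale might look like the sketch below. The evaluator name, instructions, rating labels, and model ID are illustrative placeholders, not values confirmed by this page.

```python
# Sketch of a create_evaluator request (all literal values are placeholders).
request = {
    "evaluatorName": "helpfulness-judge",
    "description": "Rates how helpful the agent's final answer is.",
    # evaluatorConfig is a Tagged Union: only the llmAsAJudge key may be set.
    "evaluatorConfig": {
        "llmAsAJudge": {
            "instructions": "Rate the agent's response for helpfulness "
                            "using the scale below.",
            # ratingScale is also a Tagged Union: numerical OR categorical.
            "ratingScale": {
                "numerical": [
                    {"value": 1.0, "label": "Poor",
                     "definition": "Response does not address the request."},
                    {"value": 3.0, "label": "Fair",
                     "definition": "Response partially addresses the request."},
                    {"value": 5.0, "label": "Excellent",
                     "definition": "Response fully addresses the request."},
                ]
            },
            "modelConfig": {
                "bedrockEvaluatorModelConfig": {
                    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder
                    "inferenceConfig": {"maxTokens": 512, "temperature": 0.0},
                }
            },
        }
    },
    "level": "SESSION",
}

# With a configured client this would be sent as:
# client = boto3.client("bedrock-agentcore-control")
# response = client.create_evaluator(**request)
```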
- Return type:
dict
- Returns:
Response Syntax
{
    'evaluatorArn': 'string',
    'evaluatorId': 'string',
    'createdAt': datetime(2015, 1, 1),
    'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}
Response Structure
(dict) –
evaluatorArn (string) –
The Amazon Resource Name (ARN) of the created evaluator.
evaluatorId (string) –
The unique identifier of the created evaluator.
createdAt (datetime) –
The timestamp when the evaluator was created.
status (string) –
The status of the evaluator creation operation.
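Because the response can report a status of CREATING, callers typically check the status before treating the evaluator as usable. A minimal sketch (the helper names and the stubbed response values are ours, not part of the SDK):

```python
def is_ready(response: dict) -> bool:
    """True once the evaluator reports ACTIVE status."""
    return response["status"] == "ACTIVE"

def has_failed(response: dict) -> bool:
    """True if creation or a later update ended in a failed state."""
    return response["status"] in {"CREATE_FAILED", "UPDATE_FAILED"}

# Example with a stubbed response (evaluatorId is a placeholder):
response = {"evaluatorId": "ev-example", "status": "CREATING"}
if not is_ready(response) and not has_failed(response):
    pass  # still provisioning; re-check later before use
```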
Exceptions
BedrockAgentCoreControl.Client.exceptions.ServiceQuotaExceededException
BedrockAgentCoreControl.Client.exceptions.ValidationException
BedrockAgentCoreControl.Client.exceptions.AccessDeniedException
BedrockAgentCoreControl.Client.exceptions.ThrottlingException
BedrockAgentCoreControl.Client.exceptions.InternalServerException