update_evaluator
- BedrockAgentCoreControl.Client.update_evaluator(**kwargs)
Updates a custom evaluator’s configuration, description, or evaluation level. Built-in evaluators cannot be updated. The evaluator must not be locked for modification.
See also: AWS API Documentation
Request Syntax
response = client.update_evaluator(
    clientToken='string',
    evaluatorId='string',
    description='string',
    evaluatorConfig={
        'llmAsAJudge': {
            'instructions': 'string',
            'ratingScale': {
                'numerical': [
                    {
                        'definition': 'string',
                        'value': 123.0,
                        'label': 'string'
                    },
                ],
                'categorical': [
                    {
                        'definition': 'string',
                        'label': 'string'
                    },
                ]
            },
            'modelConfig': {
                'bedrockEvaluatorModelConfig': {
                    'modelId': 'string',
                    'inferenceConfig': {
                        'maxTokens': 123,
                        'temperature': ...,
                        'topP': ...,
                        'stopSequences': [
                            'string',
                        ]
                    },
                    'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
                }
            }
        }
    },
    level='TOOL_CALL'|'TRACE'|'SESSION'
)
- Parameters:
clientToken (string) –
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don’t specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn’t return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
evaluatorId (string) –
[REQUIRED]
The unique identifier of the evaluator to update.
description (string) – The updated description of the evaluator.
evaluatorConfig (dict) –
The updated configuration for the evaluator, including LLM-as-a-Judge settings with instructions, rating scale, and model configuration.
Note
This is a Tagged Union structure. Only one of the following top level keys can be set: llmAsAJudge.
llmAsAJudge (dict) –
The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
instructions (string) – [REQUIRED]
The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
ratingScale (dict) – [REQUIRED]
The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
Note
This is a Tagged Union structure. Only one of the following top level keys can be set: numerical, categorical.
numerical (list) –
The numerical rating scale with defined score values and descriptions for quantitative evaluation.
(dict) –
The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
definition (string) – [REQUIRED]
The description that explains what this numerical rating represents and when it should be used.
value (float) – [REQUIRED]
The numerical value for this rating scale option.
label (string) – [REQUIRED]
The label or name that describes this numerical rating option.
categorical (list) –
The categorical rating scale with named categories and definitions for qualitative evaluation.
(dict) –
The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
definition (string) – [REQUIRED]
The description that explains what this categorical rating represents and when it should be used.
label (string) – [REQUIRED]
The label or name of this categorical rating option.
modelConfig (dict) – [REQUIRED]
The model configuration that specifies which foundation model to use and how to configure it for evaluation.
Note
This is a Tagged Union structure. Only one of the following top level keys can be set: bedrockEvaluatorModelConfig.
bedrockEvaluatorModelConfig (dict) –
The Amazon Bedrock model configuration for evaluation.
modelId (string) – [REQUIRED]
The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.
inferenceConfig (dict) –
The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
maxTokens (integer) –
The maximum number of tokens to generate in the model response during evaluation.
temperature (float) –
The temperature value that controls randomness in the model’s responses. Lower values produce more deterministic outputs.
topP (float) –
The top-p sampling parameter that controls the diversity of the model’s responses by limiting the cumulative probability of token choices.
stopSequences (list) –
The list of sequences that will cause the model to stop generating tokens when encountered.
(string) –
additionalModelRequestFields (document) –
Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
level (string) – The updated evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.
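The following is a minimal sketch of an update that switches an evaluator to a categorical rating scale at the trace level. The evaluator ID, description, instructions, and model ID are illustrative placeholders, and the boto3 service name 'bedrock-agentcore-control' is assumed; substitute values from your own account.

import boto3

# Assumed control-plane service name for BedrockAgentCoreControl.
client = boto3.client('bedrock-agentcore-control')

response = client.update_evaluator(
    evaluatorId='ev-EXAMPLE12345',  # hypothetical evaluator ID
    description='Checks whether each trace stays on topic',
    evaluatorConfig={
        'llmAsAJudge': {
            'instructions': 'Rate how well the agent response addresses the user request.',
            'ratingScale': {
                'categorical': [
                    {'label': 'PASS', 'definition': 'The response fully addresses the request.'},
                    {'label': 'FAIL', 'definition': 'The response is off topic or incomplete.'}
                ]
            },
            'modelConfig': {
                'bedrockEvaluatorModelConfig': {
                    'modelId': 'anthropic.claude-3-5-sonnet-20240620-v1:0',  # example model ID
                    'inferenceConfig': {
                        'maxTokens': 512,
                        'temperature': 0.0
                    }
                }
            }
        }
    },
    level='TRACE'
)

Because evaluatorConfig and ratingScale are tagged unions, exactly one key is set in each: llmAsAJudge in the former, categorical in the latter.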
- Return type:
dict
- Returns:
Response Syntax
{
    'evaluatorArn': 'string',
    'evaluatorId': 'string',
    'updatedAt': datetime(2015, 1, 1),
    'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}
Response Structure
(dict) –
evaluatorArn (string) –
The Amazon Resource Name (ARN) of the updated evaluator.
evaluatorId (string) –
The unique identifier of the updated evaluator.
updatedAt (datetime) –
The timestamp when the evaluator was last updated.
status (string) –
The status of the evaluator update operation.
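As a rough sketch, the returned fields can be read directly from the response dictionary; only the documented keys above are used here.

# 'response' is the dict returned by update_evaluator.
print(response['evaluatorArn'])
print(response['updatedAt'].isoformat())

if response['status'] == 'UPDATE_FAILED':
    raise RuntimeError(f"Evaluator {response['evaluatorId']} failed to update")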
Exceptions
BedrockAgentCoreControl.Client.exceptions.ServiceQuotaExceededException
BedrockAgentCoreControl.Client.exceptions.ValidationException
BedrockAgentCoreControl.Client.exceptions.AccessDeniedException
BedrockAgentCoreControl.Client.exceptions.ResourceNotFoundException
BedrockAgentCoreControl.Client.exceptions.ThrottlingException
BedrockAgentCoreControl.Client.exceptions.InternalServerException
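A hedged sketch of handling the most common failure modes. The exception classes are the ones listed above, accessed through the client's exceptions attribute as is standard for boto3; the evaluator ID is a hypothetical placeholder.

try:
    response = client.update_evaluator(
        evaluatorId='ev-EXAMPLE12345',  # hypothetical ID
        description='Updated description'
    )
except client.exceptions.ResourceNotFoundException:
    print('No evaluator with that ID exists in this region.')
except client.exceptions.ValidationException as err:
    print(f'Request rejected: {err}')
except client.exceptions.ThrottlingException:
    print('Throttled; retry with backoff.')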