start_data_quality_ruleset_evaluation_run
Glue.Client.start_data_quality_ruleset_evaluation_run(**kwargs)
Once you have a ruleset definition (either recommended or your own), call this operation to evaluate the ruleset against a data source (Glue table). The evaluation computes results, which you can retrieve with the GetDataQualityResult API.
See also: AWS API Documentation
Request Syntax
response = client.start_data_quality_ruleset_evaluation_run(
    DataSource={
        'GlueTable': {
            'DatabaseName': 'string',
            'TableName': 'string',
            'CatalogId': 'string',
            'ConnectionName': 'string',
            'AdditionalOptions': {
                'string': 'string'
            }
        }
    },
    Role='string',
    NumberOfWorkers=123,
    Timeout=123,
    ClientToken='string',
    AdditionalRunOptions={
        'CloudWatchMetricsEnabled': True|False,
        'ResultsS3Prefix': 'string',
        'CompositeRuleEvaluationMethod': 'COLUMN'|'ROW'
    },
    RulesetNames=[
        'string',
    ],
    AdditionalDataSources={
        'string': {
            'GlueTable': {
                'DatabaseName': 'string',
                'TableName': 'string',
                'CatalogId': 'string',
                'ConnectionName': 'string',
                'AdditionalOptions': {
                    'string': 'string'
                }
            }
        }
    }
)
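A minimal sketch of a call that passes only the arguments marked [REQUIRED]. The helper names `build_minimal_request` and `start_run` are hypothetical, as are the database, table, role, and ruleset values shown in the comments; the client is passed in so the helpers can be exercised without AWS credentials.

```python
def build_minimal_request(database, table, role, ruleset_names):
    """Assemble only the arguments marked [REQUIRED] in the request syntax."""
    return {
        "DataSource": {
            "GlueTable": {"DatabaseName": database, "TableName": table}
        },
        "Role": role,
        "RulesetNames": list(ruleset_names),
    }


def start_run(client, request):
    """Start the evaluation run and return the RunId from the response."""
    response = client.start_data_quality_ruleset_evaluation_run(**request)
    return response["RunId"]


# Typical use (needs boto3 and AWS credentials; every value is a placeholder):
#   import boto3
#   client = boto3.client("glue")
#   request = build_minimal_request(
#       "sales_db", "orders", "ExampleGlueDataQualityRole", ["orders_ruleset"]
#   )
#   run_id = start_run(client, request)
```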
- Parameters:
DataSource (dict) –
[REQUIRED]
The data source (Glue table) associated with this run.
GlueTable (dict) – [REQUIRED]
A Glue table.
DatabaseName (string) – [REQUIRED]
A database name in the Glue Data Catalog.
TableName (string) – [REQUIRED]
A table name in the Glue Data Catalog.
CatalogId (string) –
A unique identifier for the Glue Data Catalog.
ConnectionName (string) –
The name of the connection to the Glue Data Catalog.
AdditionalOptions (dict) –
Additional options for the table. Currently there are two keys supported:
- pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
- catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
(string) –
(string) –
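The DataSource structure described above could be assembled as follows. This is a sketch: `glue_table_source` is a hypothetical helper, the partition columns `year` and `month` are invented, and the predicate string assumes the Spark SQL style expression syntax that Glue uses for pushdown predicates elsewhere.

```python
def glue_table_source(database, table, push_down_predicate=None,
                      catalog_partition_predicate=None):
    """Build a DataSource dict, attaching only the options that were given."""
    options = {}
    if push_down_predicate is not None:
        options["pushDownPredicate"] = push_down_predicate
    if catalog_partition_predicate is not None:
        options["catalogPartitionPredicate"] = catalog_partition_predicate

    glue_table = {"DatabaseName": database, "TableName": table}
    if options:
        # AdditionalOptions is a string-to-string map, so omit it when empty.
        glue_table["AdditionalOptions"] = options
    return {"GlueTable": glue_table}


# Assumed partition columns and predicate syntax; adjust to the real table.
source = glue_table_source(
    "sales_db",
    "orders",
    push_down_predicate="year == '2024' and month == '06'",
)
```

The resulting `source` value is what would be passed as `DataSource`.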
Role (string) –
[REQUIRED]
An IAM role supplied to encrypt the results of the run.
NumberOfWorkers (integer) – The number of G.1X workers to be used in the run. The default is 5.
Timeout (integer) – The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
ClientToken (string) – Used for idempotency and is recommended to be set to a random ID (such as a UUID) to avoid creating or starting multiple instances of the same resource.
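One way to handle the idempotency recommendation is to generate the token once and reuse it if the request has to be retried. The sketch below uses a hypothetical helper, `run_controls`, whose defaults mirror the ones documented above; passing the defaults explicitly is a choice made here for readability, not a requirement.

```python
import uuid


def run_controls(number_of_workers=5, timeout_minutes=2880, client_token=None):
    """Return the sizing, timeout, and idempotency arguments for a run."""
    return {
        "NumberOfWorkers": number_of_workers,
        "Timeout": timeout_minutes,
        # Generate a fresh UUID unless the caller is retrying with a known token.
        "ClientToken": client_token or str(uuid.uuid4()),
    }


# First attempt: a new token is generated.
controls = run_controls(number_of_workers=10, timeout_minutes=60)

# Retry of the same logical request: reuse the token from the first attempt.
retry_controls = run_controls(10, 60, client_token=controls["ClientToken"])
```

These dictionaries can be merged into the request with `**controls`.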
AdditionalRunOptions (dict) –
Additional run options you can specify for an evaluation run.
CloudWatchMetricsEnabled (boolean) –
Whether or not to enable CloudWatch metrics.
ResultsS3Prefix (string) –
Prefix for Amazon S3 to store results.
CompositeRuleEvaluationMethod (string) –
Sets the evaluation method for composite rules in the ruleset: ROW or COLUMN.
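A small sketch of building AdditionalRunOptions with a local check on the composite rule method. The helper `additional_run_options` is hypothetical, and the `s3://` form of the prefix in the example is an assumption; the source only says the value is a prefix for Amazon S3.

```python
VALID_COMPOSITE_METHODS = {"COLUMN", "ROW"}


def additional_run_options(cloudwatch_metrics=False, results_s3_prefix=None,
                           composite_method=None):
    """Build the AdditionalRunOptions dict, omitting anything left unset."""
    options = {"CloudWatchMetricsEnabled": bool(cloudwatch_metrics)}
    if results_s3_prefix is not None:
        options["ResultsS3Prefix"] = results_s3_prefix
    if composite_method is not None:
        # Fail locally instead of waiting for the service to reject the value.
        if composite_method not in VALID_COMPOSITE_METHODS:
            raise ValueError(
                f"CompositeRuleEvaluationMethod must be one of "
                f"{sorted(VALID_COMPOSITE_METHODS)}, got {composite_method!r}"
            )
        options["CompositeRuleEvaluationMethod"] = composite_method
    return options


# Placeholder bucket and prefix.
run_options = additional_run_options(
    cloudwatch_metrics=True,
    results_s3_prefix="s3://example-bucket/dq-results/",
    composite_method="ROW",
)
```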
RulesetNames (list) –
[REQUIRED]
A list of ruleset names.
(string) –
AdditionalDataSources (dict) –
A map of reference strings to additional data sources you can specify for an evaluation run.
(string) –
(dict) –
A data source (a Glue table) for which you want data quality results.
GlueTable (dict) – [REQUIRED]
A Glue table.
DatabaseName (string) – [REQUIRED]
A database name in the Glue Data Catalog.
TableName (string) – [REQUIRED]
A table name in the Glue Data Catalog.
CatalogId (string) –
A unique identifier for the Glue Data Catalog.
ConnectionName (string) –
The name of the connection to the Glue Data Catalog.
AdditionalOptions (dict) –
Additional options for the table. Currently there are two keys supported:
- pushDownPredicate: to filter on partitions without having to list and read all the files in your dataset.
- catalogPartitionPredicate: to use server-side partition pruning using partition indexes in the Glue Data Catalog.
(string) –
(string) –
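The AdditionalDataSources map keys each extra table by a reference string. A minimal sketch, assuming a hypothetical helper `additional_sources` and invented table names; the idea that the key (`"reference"` here) is the alias that rules in the ruleset use to refer to the extra table is also an assumption.

```python
def additional_sources(tables):
    """Map each reference string to a GlueTable data source.

    `tables` maps a reference string to a (database, table) pair.
    """
    return {
        alias: {"GlueTable": {"DatabaseName": database, "TableName": table}}
        for alias, (database, table) in tables.items()
    }


# Placeholder names: one extra table exposed under the alias "reference".
extra = additional_sources({"reference": ("sales_db", "orders_snapshot")})
```

The `extra` value would be passed as `AdditionalDataSources`.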
- Return type:
dict
- Returns:
Response Syntax
{ 'RunId': 'string' }
Response Structure
(dict) –
RunId (string) –
The unique run identifier associated with this run.
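The call returns as soon as the run is accepted, so the RunId is typically used to poll for completion. The sketch below assumes the companion client methods `get_data_quality_ruleset_evaluation_run` (reporting `Status` and `ResultIds`) and `get_data_quality_result`. The helper names, the polling interval, and the set of terminal statuses are choices made for this example.

```python
import time

# Statuses assumed to mean the run will not progress any further.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"}


def wait_for_run(client, run_id, poll_seconds=30, max_polls=120,
                 sleep=time.sleep):
    """Poll the run until it reaches a terminal status, then return it."""
    for _ in range(max_polls):
        run = client.get_data_quality_ruleset_evaluation_run(RunId=run_id)
        if run["Status"] in TERMINAL_STATUSES:
            return run
        sleep(poll_seconds)
    raise TimeoutError(
        f"run {run_id} still in progress after {max_polls} polls"
    )


def fetch_results(client, run):
    """Retrieve each result produced by a finished run."""
    return [
        client.get_data_quality_result(ResultId=result_id)
        for result_id in run.get("ResultIds", [])
    ]


# Typical use, given a Glue client and the RunId from the response above:
#   run = wait_for_run(client, response["RunId"])
#   if run["Status"] == "SUCCEEDED":
#       results = fetch_results(client, run)
```

The `sleep` parameter is injectable only so the loop can be tested without real delays.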
Exceptions