BlueprintSpec defines the desired state of Blueprint, the runtime environment that provides the Data Scientist's application with secure and governed access to the data requested in the M4DApplication. The blueprint uses an Argo-like syntax that describes the components and the flow of data between them as steps. TODO: Add an indication of the communication relationships between the components.
BlueprintStatus defines the observed state of Blueprint. This includes readiness, an error message, and indicators for the Kubernetes resources owned by the Blueprint, for cleanup and status monitoring.
BlueprintSpec defines the desired state of Blueprint, the runtime environment that provides the Data Scientist's application with secure and governed access to the data requested in the M4DApplication. The blueprint uses an Argo-like syntax that describes the components and the flow of data between them as steps. TODO: Add an indication of the communication relationships between the components.
DataFlow indicates the flow of the data between the components. Currently we assume this is linear and thus use steps, but other, more complex graphs could be defined, as is done in Argo Workflows.
DataFlow indicates the flow of the data between the components. Currently we assume this is linear and thus use steps, but other, more complex graphs could be defined, as is done in Argo Workflows.
FlowStep is one step in the data flow and indicates an instance of a module in the blueprint. It includes the name of the module template (spec) and the parameters received by the component instance that is initiated by the orchestrator.
Arguments are the input parameters for a specific instance of a module.
false
name
string
Name is the name of the instance of the module. For example, if the application is named "notebook" and an implicitcopy module is deemed necessary, the FlowStep name would be notebook-implicitcopy.
true
template
string
Template is the name of the specification in the Blueprint describing how to instantiate a component indicated by the module. It is the name of an M4DModule CRD, for example: implicit-copy-db2wh-to-s3-latest
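As an illustration only (a sketch rendered as a plain mapping, not the exact CRD schema), a flow step using the example names from the field descriptions above could look like:

```python
# Hypothetical sketch of a FlowStep, using the example names from the
# descriptions above (application "notebook", implicit-copy module).
flow_step = {
    "name": "notebook-implicitcopy",                 # instance name of the module
    "template": "implicit-copy-db2wh-to-s3-latest",  # name of the M4DModule template
    "arguments": {},                                 # module-specific input parameters
}
```

The arguments content is module-specific, which is why it is left empty in this sketch.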
ComponentTemplate is a copy of an M4DModule Custom Resource. It contains the information necessary to instantiate a component in a FlowStep, which provides the functionality described by the module. There are 3 different module types.
BlueprintStatus defines the observed state of Blueprint. This includes readiness, an error message, and indicators for the Kubernetes resources owned by the Blueprint, for cleanup and status monitoring.
Name
Type
Description
Required
observedGeneration
integer
ObservedGeneration is taken from the Blueprint metadata. This is used to determine during reconcile whether reconcile was called because the desired state changed, or whether status of the allocated resources should be checked.
ObservedState includes information to be reported back to the M4DApplication resource. It includes readiness and error indications, as well as user instructions.
false
releases
map[string]integer
Releases map each release to the observed generation of the blueprint containing this release. At the end of reconcile, each release should be mapped to the latest blueprint version or be uninstalled.
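The observedGeneration and releases fields drive a common controller pattern: compare the stored generation with metadata.generation to decide whether the desired state changed, and uninstall releases that still point at an older blueprint generation. A minimal sketch (hypothetical helper names, not the actual controller code):

```python
def needs_spec_reconcile(metadata_generation: int, observed_generation: int) -> bool:
    # If the stored observedGeneration lags behind metadata.generation,
    # the desired state changed; otherwise only resource status is checked.
    return observed_generation != metadata_generation

def stale_releases(releases: dict, current_generation: int) -> list:
    # Releases still mapped to an older blueprint generation should be uninstalled.
    return [name for name, gen in releases.items() if gen != current_generation]
```

For example, `stale_releases({"copy-rel": 3, "read-rel": 4}, 4)` selects only `"copy-rel"` for uninstallation.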
ObservedState includes information to be reported back to the M4DApplication resource. It includes readiness and error indications, as well as user instructions.
Name
Type
Description
Required
dataAccessInstructions
string
DataAccessInstructions indicate how the data user or his application may access the data. Instructions are available upon successful orchestration.
false
error
string
Error indicates that an error occurred while orchestrating the modules and provides the error message.
false
ready
boolean
Ready indicates that the modules have been orchestrated successfully and the data is ready for use.
M4DApplication provides information about the application being used by a Data Scientist, the nature of the processing, and the data sets that the Data Scientist has chosen for processing by the application. The M4DApplication controller (aka pilot) obtains instructions regarding any governance related changes that must be performed on the data, identifies the modules capable of performing such changes, and finally generates the Blueprint which defines the secure runtime environment and all the components in it. This runtime environment provides the Data Scientist's application with access to the data requested in a secure manner and without having to provide any credentials for the data sets. The credentials are obtained automatically by the manager from an external credential management system, which may or may not be part of a data catalog.
M4DApplicationSpec defines the desired state of M4DApplication.
Name
Type
Description
Required
secretRef
string
SecretRef points to the secret that holds credentials for each system the user has been authenticated with. The secret is deployed in the M4DApplication namespace.
Selector enables connecting the resource to the application. Application labels should match the labels in the selector. For some flows the selector may not be used.
false
appInfo
map[string]string
AppInfo contains information describing the reasons for the processing that will be done by the Data Scientist's application.
Selector enables connecting the resource to the application. Application labels should match the labels in the selector. For some flows the selector may not be used.
matchExpressions is a list of label selector requirements. The requirements are ANDed.
false
matchLabels
map[string]string
matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed.
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
Name
Type
Description
Required
values
[]string
values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.
false
key
string
key is the label key that the selector applies to.
true
operator
string
operator represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.
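As a sketch of the selector semantics described above (matchLabels as shorthand for In requirements, all requirements ANDed), assuming plain dicts rather than the Kubernetes types:

```python
def matches_requirement(labels: dict, key: str, operator: str, values: list) -> bool:
    # Operator semantics as listed above.
    if operator == "In":
        return labels.get(key) in values
    if operator == "NotIn":
        # An object without the key also satisfies NotIn.
        return labels.get(key) not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unknown operator: {operator}")

def matches_selector(labels: dict, match_labels: dict, match_expressions: list) -> bool:
    # Each matchLabels entry is equivalent to an "In" requirement with a
    # single value; all requirements are ANDed.
    exprs = [{"key": k, "operator": "In", "values": [v]} for k, v in match_labels.items()]
    exprs += match_expressions
    return all(
        matches_requirement(labels, e["key"], e["operator"], e.get("values", []))
        for e in exprs
    )
```

For example, `matches_selector({"app": "notebook"}, {"app": "notebook"}, [])` holds, while adding a requirement `{"key": "env", "operator": "Exists"}` makes it fail for the same labels.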
DataContext indicates data set chosen by the Data Scientist to be used by his application, and includes information about the data format and technologies used by the application to access the data.
Name
Type
Description
Required
catalogService
string
CatalogService represents the catalog service for accessing the requested dataset. If not specified, the enterprise catalog service will be used.
false
dataSetID
string
DataSetID is a unique identifier of the dataset chosen from the data catalog for processing by the data user application.
M4DApplicationStatus defines the observed state of M4DApplication.
Name
Type
Description
Required
catalogedAssets
map[string]string
CatalogedAssets provides the new asset identifiers after registration in the enterprise catalog. It maps the original asset ID to the cataloged asset ID.
ObservedGeneration is taken from the M4DApplication metadata. This is used to determine during reconcile whether reconcile was called because the desired state changed, or whether the Blueprint status changed.
ProvisionedStorage maps a dataset (identified by DataSetID) to the newly provisioned bucket. It allows the M4DApplication controller to manage buckets in case the spec has been modified, an error has occurred, or a delete event has been received. ProvisionedStorage has the information required to register the dataset once the owned Plotter resource is ready.
false
ready
boolean
Ready is true if a blueprint has been successfully orchestrated.
M4DModule is a description of an injectable component: the parameters it requires, as well as the specification of how to instantiate such a component. It is used as metadata only; there is no status and no reconciliation.
M4DModuleSpec contains the information common to all modules, which are components that process, load, write, audit, or monitor the data used by the data scientist's application.
M4DModuleSpec contains the information common to all modules, which are components that process, load, write, audit, or monitor the data used by the data scientist's application.
ResourceStatusIndicator is used to determine the status of an orchestrated resource.
Name
Type
Description
Required
errorMessage
string
ErrorMessage specifies the resource field to check for an error, e.g. status.errorMsg
false
failureCondition
string
FailureCondition specifies a condition that indicates resource failure. It uses the Kubernetes label-selection syntax (https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/)
false
kind
string
Kind provides information about the resource kind
true
successCondition
string
SuccessCondition specifies a condition that indicates that the resource is ready. It uses the Kubernetes label-selection syntax (https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/)
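The success/failure conditions above are label-selection-style expressions evaluated against resource fields. A minimal sketch of how such a condition might be checked, supporting only the equality subset (`k=v`, ANDed with commas); the condition string and helper names here are illustrative, not taken from the actual controller:

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    # Flatten nested resource fields into dotted paths, e.g.
    # {"status": {"phase": "Succeeded"}} -> {"status.phase": "Succeeded"}.
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = str(value)
    return out

def condition_holds(resource: dict, condition: str) -> bool:
    # Supports the equality subset of label-selection syntax: "k=v,k2=v2" (ANDed).
    fields = flatten(resource)
    for clause in condition.split(","):
        key, _, expected = clause.partition("=")
        if fields.get(key.strip()) != expected.strip():
            return False
    return True
```

With a hypothetical condition string such as `"status.phase=Succeeded"`, the resource is considered ready only when that field matches.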
Copy should have one or more instances in the list, and its content should have both source and sink. Read should have one or more instances in the list, each with source populated. Write should have one or more instances in the list, each with sink populated. TODO: In the future, if we have a module type that doesn't interface directly with data, this list could be empty.
M4DStorageAccount defines a storage account used for copying data. Only S3-based storage is supported. It contains the endpoint, region, and a reference to the credentials. The owner of the asset is responsible for storing the credentials.
PlotterSpec defines the desired state of Plotter, which is applied in a multi-clustered environment. Plotter installs the runtime environment (as blueprints running on remote clusters) which provides the Data Scientist's application with secure and governed access to the data requested in the M4DApplication.
PlotterStatus defines the observed state of Plotter. This includes readiness, an error message, and indicators received from blueprint resources owned by the Plotter, for cleanup and status monitoring.
PlotterSpec defines the desired state of Plotter, which is applied in a multi-clustered environment. Plotter installs the runtime environment (as blueprints running on remote clusters) which provides the Data Scientist's application with secure and governed access to the data requested in the M4DApplication.
BlueprintSpec defines the desired state of Blueprint, the runtime environment that provides the Data Scientist's application with secure and governed access to the data requested in the M4DApplication. The blueprint uses an Argo-like syntax that describes the components and the flow of data between them as steps. TODO: Add an indication of the communication relationships between the components.
DataFlow indicates the flow of the data between the components. Currently we assume this is linear and thus use steps, but other, more complex graphs could be defined, as is done in Argo Workflows.
DataFlow indicates the flow of the data between the components. Currently we assume this is linear and thus use steps, but other, more complex graphs could be defined, as is done in Argo Workflows.
FlowStep is one step in the data flow and indicates an instance of a module in the blueprint. It includes the name of the module template (spec) and the parameters received by the component instance that is initiated by the orchestrator.
Arguments are the input parameters for a specific instance of a module.
false
name
string
Name is the name of the instance of the module. For example, if the application is named "notebook" and an implicitcopy module is deemed necessary, the FlowStep name would be notebook-implicitcopy.
true
template
string
Template is the name of the specification in the Blueprint describing how to instantiate a component indicated by the module. It is the name of an M4DModule CRD, for example: implicit-copy-db2wh-to-s3-latest
ComponentTemplate is a copy of an M4DModule Custom Resource. It contains the information necessary to instantiate a component in a FlowStep, which provides the functionality described by the module. There are 3 different module types.
matchExpressions is a list of label selector requirements. The requirements are ANDed.
false
matchLabels
map[string]string
matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed.
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
Name
Type
Description
Required
values
[]string
values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.
false
key
string
key is the label key that the selector applies to.
true
operator
string
operator represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.
PlotterStatus defines the observed state of Plotter. This includes readiness, an error message, and indicators received from blueprint resources owned by the Plotter, for cleanup and status monitoring.
ObservedGeneration is taken from the Plotter metadata. This is used to determine during reconcile whether reconcile was called because the desired state changed, or whether status of the allocated blueprints should be checked.
ObservedState includes information to be reported back to the M4DApplication resource. It includes readiness and error indications, as well as user instructions.
BlueprintStatus defines the observed state of Blueprint. This includes readiness, an error message, and indicators for the Kubernetes resources owned by the Blueprint, for cleanup and status monitoring.
BlueprintStatus defines the observed state of Blueprint. This includes readiness, an error message, and indicators for the Kubernetes resources owned by the Blueprint, for cleanup and status monitoring.
Name
Type
Description
Required
observedGeneration
integer
ObservedGeneration is taken from the Blueprint metadata. This is used to determine during reconcile whether reconcile was called because the desired state changed, or whether status of the allocated resources should be checked.
ObservedState includes information to be reported back to the M4DApplication resource. It includes readiness and error indications, as well as user instructions.
false
releases
map[string]integer
Releases map each release to the observed generation of the blueprint containing this release. At the end of reconcile, each release should be mapped to the latest blueprint version or be uninstalled.
ObservedState includes information to be reported back to the M4DApplication resource. It includes readiness and error indications, as well as user instructions.
Name
Type
Description
Required
dataAccessInstructions
string
DataAccessInstructions indicate how the data user or his application may access the data. Instructions are available upon successful orchestration.
false
error
string
Error indicates that an error occurred while orchestrating the modules and provides the error message.
false
ready
boolean
Ready indicates that the modules have been orchestrated successfully and the data is ready for use.
ObservedState includes information to be reported back to the M4DApplication resource. It includes readiness and error indications, as well as user instructions.
Name
Type
Description
Required
dataAccessInstructions
string
DataAccessInstructions indicate how the data user or his application may access the data. Instructions are available upon successful orchestration.
false
error
string
Error indicates that an error occurred while orchestrating the modules and provides the error message.
false
ready
boolean
Ready indicates that the modules have been orchestrated successfully and the data is ready for use.
Named terms that exist in the catalog taxonomy, and the values for these terms. For columns there will be a "SchemaDetails" key that includes the technical schema details for the column. TODO: Consider creating a special field for the schema outside of metadata.
false
tags
[]string
Tags can be any free text added to a component (no taxonomy).
BatchTransferSpec defines the state of a BatchTransfer. The state includes the source/destination specification, a schedule, and the means by which data movement is to be conducted, given as a Kubernetes job description. In addition, the state contains a sketch of a transformation instruction. In future releases, the transformation description should be specified in a separate CRD.
BatchTransferStatus defines the observed state of BatchTransfer. This includes a reference to the job that implements the movement, as well as the last schedule time. What is missing: extended status information such as the number of records moved and technical metadata.
BatchTransferSpec defines the state of a BatchTransfer. The state includes the source/destination specification, a schedule, and the means by which data movement is to be conducted, given as a Kubernetes job description. In addition, the state contains a sketch of a transformation instruction. In future releases, the transformation description should be specified in a separate CRD.
Name
Type
Description
Required
failedJobHistoryLimit
integer
Maximum number of failed Kubernetes job objects that should be kept. This property will be defaulted by the webhook if not set.
false
flowType
enum
Data flow type that specifies whether this is a stream or a batch workflow. [Batch Stream]
false
image
string
Image that should be used for the actual batch job. This is usually a datamover image. This property will be defaulted by the webhook if not set.
false
imagePullPolicy
string
Image pull policy that should be used for the actual job. This property will be defaulted by the webhook if not set.
false
maxFailedRetries
integer
Maximum number of failed retries after which the batch job should stop trying. This property will be defaulted by the webhook if not set.
false
noFinalizer
boolean
Whether this batch job instance should have a finalizer. This property will be defaulted by the webhook if not set.
false
readDataType
enum
Data type of the data that is read from source (log data or change data) [LogData ChangeData]
false
schedule
string
Cron schedule if this BatchTransfer job should run on a regular schedule. Values are specified like cron job schedules; a good translation to human language can be found at https://crontab.guru/
false
secretProviderRole
string
Secret provider role that should be used for the actual job. This property will be defaulted by the webhook if not set.
false
secretProviderURL
string
Secret provider url that should be used for the actual job. This property will be defaulted by the webhook if not set.
Maximum number of successful Kubernetes job objects that should be kept. This property will be defaulted by the webhook if not set.
false
suspend
boolean
If this batch job instance is run on a schedule, the regular schedule can be suspended with this property. This property will be defaulted by the webhook if not set.
Transformations to be applied to the source data before writing to the destination.
false
writeDataType
enum
Data type of how the data should be written to the target (log data or change data) [LogData ChangeData]
false
writeOperation
enum
Write operation that should be performed when writing (overwrite, append, update). Caution: some write operations are only available for batch and some only for stream. [Overwrite Append Update]
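The schedule field above uses the standard 5-field cron format (minute, hour, day, month, weekday). As a rough sketch of how one timestamp can be checked against such an expression, supporting only "*" and exact numbers (real schedulers also handle ranges, lists, and steps):

```python
from datetime import datetime

def cron_matches(expr: str, when: datetime) -> bool:
    # Minimal matcher for the 5-field cron format: minute hour day month weekday.
    # Supports only "*" and exact numeric fields; see https://crontab.guru/
    # for the full syntax.
    fields = expr.split()
    actual = [when.minute, when.hour, when.day, when.month, when.isoweekday() % 7]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))
```

For example, `"0 3 * * *"` (every day at 03:00) matches a timestamp of 03:00 but not 04:00.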
Transformation action that should be performed. [RemoveColumns EncryptColumns DigestColumns RedactColumns SampleRows FilterRows]
false
columns
[]string
Columns that are involved in this action. This property is optional, as for some actions no columns have to be specified; e.g. filter is a row-based transformation.
false
name
string
Name of the transformation. Mainly used for debugging and lineage tracking.
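A sketch of how the column- and row-based actions listed above might behave on row data (illustrative only; the actual datamover implementation is not shown here):

```python
def apply_action(rows, action, columns=None, predicate=None):
    # Sketch of a few of the transformation actions listed above.
    # rows: list of dicts; columns: list of column names; predicate: row filter.
    if action == "RemoveColumns":
        return [{k: v for k, v in row.items() if k not in columns} for row in rows]
    if action == "RedactColumns":
        return [{k: ("XXX" if k in columns else v) for k, v in row.items()}
                for row in rows]
    if action == "FilterRows":
        return [row for row in rows if predicate(row)]
    raise NotImplementedError(action)
```

The redaction placeholder ("XXX") is an assumption for illustration; the real actions are configured through the transformation spec.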
Database data store. For the moment only Db2 is supported.
false
description
string
Description of the transfer in human-readable form that is displayed in kubectl get output. If not provided, this will be filled in depending on the datastore that is specified.
Kafka data store. The data in the given Kafka topic is expected to be in a Confluent-compatible format stored as Avro. A schema registry needs to be specified as well.
Kafka data store. The data in the given Kafka topic is expected to be in a Confluent-compatible format stored as Avro. A schema registry needs to be specified as well.
Name
Type
Description
Required
createSnapshot
boolean
Whether a snapshot should be created of the topic. Records in Kafka are stored as key-value pairs. Updates/deletes for the same key are appended to the Kafka topic, and the last value for a given key is the valid value in a snapshot. When this property is true, only the last value will be written; if the property is false, all values will be written out. As a CDC example: if the property is true, a valid snapshot of the log stream will be created; if it is false, the CDC stream will be dumped as-is, like a change log.
false
dataFormat
string
Data format of the objects in S3. e.g. parquet or csv. Please refer to struct for allowed values.
false
keyDeserializer
string
Deserializer to be used for the keys of the topic
false
password
string
Kafka user password. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
false
saslMechanism
string
SASL mechanism to be used (e.g. PLAIN or SCRAM-SHA-512). Defaults to SCRAM-SHA-512 if not specified.
false
secretImport
string
Define a secret import definition.
false
securityProtocol
string
Kafka security protocol, one of PLAINTEXT, SASL_PLAINTEXT, SASL_SSL, SSL. Defaults to SASL_SSL if not specified.
false
sslTruststore
string
A truststore or certificate encoded as base64. The format can be JKS or PKCS12. A truststore can be specified like this or in a predefined Kubernetes secret
false
sslTruststoreLocation
string
SSL truststore location.
false
sslTruststorePassword
string
SSL truststore password.
false
sslTruststoreSecret
string
Kubernetes secret that contains the SSL truststore. The format can be JKS or PKCS12. A truststore can be specified like this or inline as a base64-encoded string.
false
user
string
Kafka user name. Can be retrieved from vault if specified in vaultPath parameter and is thus optional.
false
valueDeserializer
string
Deserializer to be used for the values of the topic
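The createSnapshot behavior described above amounts to log compaction: keep only the last value per key. A minimal sketch, assuming records arrive as (key, value) pairs in topic order:

```python
def snapshot(records):
    # With createSnapshot=true only the last value per key survives;
    # with createSnapshot=false the change log would be written out as-is.
    last = {}
    for key, value in records:   # records in topic (append) order
        last[key] = value        # later values overwrite earlier ones
    return list(last.items())
```

For example, `snapshot([("k1", "a"), ("k2", "b"), ("k1", "c")])` keeps only `("k1", "c")` and `("k2", "b")`.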
An object store data store that is compatible with S3. This can be a COS bucket.
Name
Type
Description
Required
accessKey
string
Access key of the HMAC credentials that can access the given bucket. Can be retrieved from vault if specified in vaultPath parameter and is thus optional.
false
dataFormat
string
Data format of the objects in S3. e.g. parquet or csv. Please refer to struct for allowed values.
false
partitionBy
[]string
Partitioning for target data stores. Defines the columns to partition the output by.
false
region
string
Region of S3 service
false
secretImport
string
Define a secret import definition.
false
secretKey
string
Secret key of the HMAC credentials that can access the given bucket. Can be retrieved from vault if specified in vaultPath parameter and is thus optional.
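The partitionBy columns above define the output layout for a target data store. Assuming a Hive-style "column=value" directory layout (an assumption for illustration; the descriptions above do not specify the exact layout), object keys could be built like:

```python
def object_key(base: str, partition_by: list, row: dict, filename: str) -> str:
    # Hive-style layout: one "column=value" path segment per partition column.
    segments = [f"{col}={row[col]}" for col in partition_by]
    return "/".join([base, *segments, filename])
```

For example, partitioning by year and month yields keys such as `bucket/table/year=2021/month=5/part-0.parquet`.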
Database data store. For the moment only Db2 is supported.
false
description
string
Description of the transfer in human-readable form that is displayed in kubectl get output. If not provided, this will be filled in depending on the datastore that is specified.
Kafka data store. The data in the given Kafka topic is expected to be in a Confluent-compatible format stored as Avro. A schema registry needs to be specified as well.
Kafka data store. The data in the given Kafka topic is expected to be in a Confluent-compatible format stored as Avro. A schema registry needs to be specified as well.
Name
Type
Description
Required
createSnapshot
boolean
Whether a snapshot should be created of the topic. Records in Kafka are stored as key-value pairs. Updates/deletes for the same key are appended to the Kafka topic, and the last value for a given key is the valid value in a snapshot. When this property is true, only the last value will be written; if the property is false, all values will be written out. As a CDC example: if the property is true, a valid snapshot of the log stream will be created; if it is false, the CDC stream will be dumped as-is, like a change log.
false
dataFormat
string
Data format of the objects in S3. e.g. parquet or csv. Please refer to struct for allowed values.
false
keyDeserializer
string
Deserializer to be used for the keys of the topic
false
password
string
Kafka user password. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
false
saslMechanism
string
SASL mechanism to be used (e.g. PLAIN or SCRAM-SHA-512). Defaults to SCRAM-SHA-512 if not specified.
false
secretImport
string
Define a secret import definition.
false
securityProtocol
string
Kafka security protocol, one of PLAINTEXT, SASL_PLAINTEXT, SASL_SSL, SSL. Defaults to SASL_SSL if not specified.
false
sslTruststore
string
A truststore or certificate encoded as base64. The format can be JKS or PKCS12. A truststore can be specified like this or in a predefined Kubernetes secret
false
sslTruststoreLocation
string
SSL truststore location.
false
sslTruststorePassword
string
SSL truststore password.
false
sslTruststoreSecret
string
Kubernetes secret that contains the SSL truststore. The format can be JKS or PKCS12. A truststore can be specified like this or inline as a base64-encoded string.
false
user
string
Kafka user name. Can be retrieved from vault if specified in vaultPath parameter and is thus optional.
false
valueDeserializer
string
Deserializer to be used for the values of the topic
An object store data store that is compatible with S3. This can be a COS bucket.
Name
Type
Description
Required
accessKey
string
Access key of the HMAC credentials that can access the given bucket. Can be retrieved from vault if specified in vaultPath parameter and is thus optional.
false
dataFormat
string
Data format of the objects in S3. e.g. parquet or csv. Please refer to struct for allowed values.
false
partitionBy
[]string
Partitioning for target data stores. Defines the columns to partition the output by.
false
region
string
Region of S3 service
false
secretImport
string
Define a secret import definition.
false
secretKey
string
Secret key of the HMAC credentials that can access the given bucket. Can be retrieved from vault if specified in vaultPath parameter and is thus optional.
BatchTransferStatus defines the observed state of BatchTransfer. This includes a reference to the job that implements the movement, as well as the last schedule time. What is missing: extended status information such as the number of records moved and technical metadata.
ObjectReference contains enough information to let you inspect or modify the referred object. --- New uses of this type are discouraged because of difficulty describing its usage when embedded in APIs. 1. Ignored fields. It includes many fields which are not generally honored. For instance, ResourceVersion and FieldPath are both very rarely valid in actual usage. 2. Invalid usage help. It is impossible to add specific help for individual usage. In most embedded usages, there are particular restrictions like, "must refer only to types A and B" or "UID not honored" or "name must be restricted". Those cannot be well described when embedded. 3. Inconsistent validation. Because the usages are different, the validation rules are different by usage, which makes it hard for users to predict what will happen. 4. The fields are both imprecise and overly precise. Kind is not a precise mapping to a URL. This can produce ambiguity during interpretation and require a REST mapping. In most cases, the dependency is on the group,resource tuple and the version of the actual struct is irrelevant. 5. We cannot easily change it. Because this type is embedded in many locations, updates to this type will affect numerous schemas. Don't make new APIs embed an underspecified API type they do not control. Instead of using this type, create a locally provided and used type that is well-focused on your reference. For example, ServiceReferences for admission registration: https://github.com/kubernetes/api/blob/release-1.17/admissionregistration/v1/types.go#L533 .
ObjectReference contains enough information to let you inspect or modify the referred object. --- New uses of this type are discouraged because of difficulty describing its usage when embedded in APIs. 1. Ignored fields. It includes many fields which are not generally honored. For instance, ResourceVersion and FieldPath are both very rarely valid in actual usage. 2. Invalid usage help. It is impossible to add specific help for individual usage. In most embedded usages, there are particular restrictions like, "must refer only to types A and B" or "UID not honored" or "name must be restricted". Those cannot be well described when embedded. 3. Inconsistent validation. Because the usages are different, the validation rules are different by usage, which makes it hard for users to predict what will happen. 4. The fields are both imprecise and overly precise. Kind is not a precise mapping to a URL. This can produce ambiguity during interpretation and require a REST mapping. In most cases, the dependency is on the group,resource tuple and the version of the actual struct is irrelevant. 5. We cannot easily change it. Because this type is embedded in many locations, updates to this type will affect numerous schemas. Don't make new APIs embed an underspecified API type they do not control. Instead of using this type, create a locally provided and used type that is well-focused on your reference. For example, ServiceReferences for admission registration: https://github.com/kubernetes/api/blob/release-1.17/admissionregistration/v1/types.go#L533 .
false
lastRecordTime
string
false
lastScheduleTime
string
Information about when the job was last successfully scheduled.
If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future.
false
kind
string
Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
false
name
string
Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
false
namespace
string
Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
false
resourceVersion
string
Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency
false
uid
string
UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids
ObjectReference contains enough information to let you inspect or modify the referred object. --- New uses of this type are discouraged because of difficulty describing its usage when embedded in APIs. 1. Ignored fields. It includes many fields which are not generally honored. For instance, ResourceVersion and FieldPath are both very rarely valid in actual usage. 2. Invalid usage help. It is impossible to add specific help for individual usage. In most embedded usages, there are particular restrictions like, "must refer only to types A and B" or "UID not honored" or "name must be restricted". Those cannot be well described when embedded. 3. Inconsistent validation. Because the usages are different, the validation rules are different by usage, which makes it hard for users to predict what will happen. 4. The fields are both imprecise and overly precise. Kind is not a precise mapping to a URL. This can produce ambiguity during interpretation and require a REST mapping. In most cases, the dependency is on the group,resource tuple and the version of the actual struct is irrelevant. 5. We cannot easily change it. Because this type is embedded in many locations, updates to this type will affect numerous schemas. Don't make new APIs embed an underspecified API type they do not control. Instead of using this type, create a locally provided and used type that is well-focused on your reference. For example, ServiceReferences for admission registration: https://github.com/kubernetes/api/blob/release-1.17/admissionregistration/v1/types.go#L533 .
Name
Type
Description
Required
apiVersion
string
API version of the referent.
false
fieldPath
string
If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future.
false
kind
string
Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
false
name
string
Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
false
namespace
string
Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
false
resourceVersion
string
Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency
false
uid
string
UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids
ObjectReference contains enough information to let you inspect or modify the referred object. --- New uses of this type are discouraged because of difficulty describing its usage when embedded in APIs. 1. Ignored fields. It includes many fields which are not generally honored. For instance, ResourceVersion and FieldPath are both very rarely valid in actual usage. 2. Invalid usage help. It is impossible to add specific help for individual usage. In most embedded usages, there are particular restrictions like, "must refer only to types A and B" or "UID not honored" or "name must be restricted". Those cannot be well described when embedded. 3. Inconsistent validation. Because the usages are different, the validation rules are different by usage, which makes it hard for users to predict what will happen. 4. The fields are both imprecise and overly precise. Kind is not a precise mapping to a URL. This can produce ambiguity during interpretation and require a REST mapping. In most cases, the dependency is on the group,resource tuple and the version of the actual struct is irrelevant. 5. We cannot easily change it. Because this type is embedded in many locations, updates to this type will affect numerous schemas. Don't make new APIs embed an underspecified API type they do not control. Instead of using this type, create a locally provided and used type that is well-focused on your reference. For example, ServiceReferences for admission registration: https://github.com/kubernetes/api/blob/release-1.17/admissionregistration/v1/types.go#L533 .
Name
Type
Description
Required
apiVersion
string
API version of the referent.
false
fieldPath
string
If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future.
false
kind
string
Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
false
name
string
Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
false
namespace
string
Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
false
resourceVersion
string
Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency
false
uid
string
UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids
StreamTransferSpec defines the desired state of StreamTransfer
Name
Type
Description
Required
flowType
enum
Data flow type that specifies whether this is a stream or a batch workflow. [Batch Stream]
false
image
string
Image that should be used for the actual batch job. This is usually a datamover image. This property will be defaulted by the webhook if not set.
false
imagePullPolicy
string
Image pull policy that should be used for the actual job. This property will be defaulted by the webhook if not set.
false
noFinalizer
boolean
Whether this batch job instance should have a finalizer. This property will be defaulted by the webhook if not set.
false
readDataType
enum
Data type of the data that is read from the source (log data or change data). [LogData ChangeData]
false
secretProviderRole
string
Secret provider role that should be used for the actual job. This property will be defaulted by the webhook if not set.
false
secretProviderURL
string
Secret provider URL that should be used for the actual job. This property will be defaulted by the webhook if not set.
false
suspend
boolean
If this batch job instance is run on a schedule, the regular schedule can be suspended with this property. This property will be defaulted by the webhook if not set.
Transformations to be applied to the source data before writing to the destination.
false
triggerInterval
string
Interval at which the micro-batches of this stream should be triggered. The default is '5 seconds'.
false
writeDataType
enum
Data type of how the data should be written to the target (log data or change data). [LogData ChangeData]
false
writeOperation
enum
Write operation that should be performed when writing (overwrite, append, update). Caution: some write operations are only available for batch and some only for stream. [Overwrite Append Update]
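Taken together, the spec fields above can be combined into a StreamTransfer manifest as sketched below. The apiVersion group and metadata values are assumptions for illustration; fields not listed in the table above should not be inferred from this example.

```yaml
# Hypothetical StreamTransfer manifest using the spec fields documented above.
apiVersion: motion.m4d.ibm.com/v1alpha1  # assumed group/version
kind: StreamTransfer
metadata:
  name: sample-stream-transfer           # assumed name
spec:
  flowType: Stream               # Batch or Stream
  readDataType: ChangeData       # LogData or ChangeData
  writeDataType: LogData         # LogData or ChangeData
  writeOperation: Append         # Overwrite, Append, or Update
  triggerInterval: "5 seconds"   # micro-batch trigger interval (the default)
  suspend: false                 # suspend a scheduled run
  noFinalizer: false             # whether to skip adding a finalizer
```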
Transformation action that should be performed. [RemoveColumns EncryptColumns DigestColumns RedactColumns SampleRows FilterRows]
false
columns
[]string
Columns that are involved in this action. This property is optional, as for some actions no columns have to be specified; e.g., filter is a row-based transformation.
false
name
string
Name of the transformation. Mainly used for debugging and lineage tracking.
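A single transformation entry using the fields above might be sketched as follows. The `transformation` list key is an assumption taken from the field description earlier in this document, and the column names are illustrative.

```yaml
# Hypothetical transformation entry: removes two columns before writing.
transformation:
  - name: remove-pii-columns   # used for debugging and lineage tracking
    action: RemoveColumns      # one of RemoveColumns, EncryptColumns, DigestColumns,
                               # RedactColumns, SampleRows, FilterRows
    columns:                   # optional; row-based actions need no columns
      - credit_card_number
      - social_security_number
```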
Database data store. For the moment only Db2 is supported.
false
description
string
Description of the transfer in human-readable form that is displayed in kubectl get. If not provided, this will be filled in depending on the data store that is specified.
Kafka data store. The format within the given Kafka topic is expected to be a Confluent-compatible format stored as Avro. A schema registry needs to be specified as well.
Name
Type
Description
Required
createSnapshot
boolean
Whether a snapshot should be created of the topic. Records in Kafka are stored as key-value pairs. Updates/deletes for the same key are appended to the Kafka topic, and the last value for a given key is the valid value in a snapshot. When this property is true, only the last value per key will be written. If the property is false, all values will be written out. As a CDC example: if the property is true, a valid snapshot of the log stream will be created; if it is false, the CDC stream will be dumped as-is, like a change log.
false
dataFormat
string
Data format of the objects, e.g. parquet or csv. Please refer to the struct for allowed values.
false
keyDeserializer
string
Deserializer to be used for the keys of the topic.
false
password
string
Kafka user password. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
false
saslMechanism
string
SASL mechanism to be used (e.g. PLAIN or SCRAM-SHA-512). SCRAM-SHA-512 will be assumed by default if not specified.
false
secretImport
string
Define a secret import definition.
false
securityProtocol
string
Kafka security protocol, one of PLAINTEXT, SASL_PLAINTEXT, SASL_SSL, SSL. SASL_SSL will be assumed by default if not specified.
false
sslTruststore
string
A truststore or certificate encoded as base64. The format can be JKS or PKCS12. A truststore can be specified like this or in a predefined Kubernetes secret.
false
sslTruststoreLocation
string
SSL truststore location.
false
sslTruststorePassword
string
SSL truststore password.
false
sslTruststoreSecret
string
Kubernetes secret that contains the SSL truststore. The format can be JKS or PKCS12. A truststore can be specified like this or inline as base64 in sslTruststore.
false
user
string
Kafka user name. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
false
valueDeserializer
string
Deserializer to be used for the values of the topic.
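Using only the fields documented above, a Kafka data store stanza might look like the sketch below. The `kafka` key and the deserializer class names are assumptions for illustration; credentials may instead be retrieved from Vault.

```yaml
# Hypothetical Kafka data store stanza built from the documented fields.
kafka:
  user: my-kafka-user          # optional if retrievable from Vault
  password: my-kafka-password  # optional if retrievable from Vault
  securityProtocol: SASL_SSL   # default when not specified
  saslMechanism: SCRAM-SHA-512 # default when not specified
  createSnapshot: true         # keep only the latest value per key
  dataFormat: parquet
  keyDeserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer    # assumed class
  valueDeserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer  # assumed class
```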
An object store data store that is compatible with S3. This can be a COS bucket.
Name
Type
Description
Required
accessKey
string
Access key of the HMAC credentials that can access the given bucket. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
false
dataFormat
string
Data format of the objects in S3, e.g. parquet or csv. Please refer to the struct for allowed values.
false
partitionBy
[]string
Defines the columns to partition the output by (for target data stores).
false
region
string
Region of the S3 service.
false
secretImport
string
Define a secret import definition.
false
secretKey
string
Secret key of the HMAC credentials that can access the given bucket. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
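An S3-compatible data store stanza using the documented fields might be sketched as follows. The `s3` key and all values are illustrative assumptions, and credentials may instead be retrieved from Vault.

```yaml
# Hypothetical S3-compatible data store stanza (e.g. a COS bucket).
s3:
  region: eu-de                  # region of the S3 service (illustrative)
  accessKey: MY-HMAC-ACCESS-KEY  # optional if retrievable from Vault
  secretKey: MY-HMAC-SECRET-KEY  # optional if retrievable from Vault
  dataFormat: csv
  partitionBy:                   # target data stores only
    - year
    - month
```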
Database data store. For the moment only Db2 is supported.
false
description
string
Description of the transfer in human-readable form that is displayed in kubectl get. If not provided, this will be filled in depending on the data store that is specified.
Kafka data store. The format within the given Kafka topic is expected to be a Confluent-compatible format stored as Avro. A schema registry needs to be specified as well.
Name
Type
Description
Required
createSnapshot
boolean
Whether a snapshot should be created of the topic. Records in Kafka are stored as key-value pairs. Updates/deletes for the same key are appended to the Kafka topic, and the last value for a given key is the valid value in a snapshot. When this property is true, only the last value per key will be written. If the property is false, all values will be written out. As a CDC example: if the property is true, a valid snapshot of the log stream will be created; if it is false, the CDC stream will be dumped as-is, like a change log.
false
dataFormat
string
Data format of the objects, e.g. parquet or csv. Please refer to the struct for allowed values.
false
keyDeserializer
string
Deserializer to be used for the keys of the topic.
false
password
string
Kafka user password. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
false
saslMechanism
string
SASL mechanism to be used (e.g. PLAIN or SCRAM-SHA-512). SCRAM-SHA-512 will be assumed by default if not specified.
false
secretImport
string
Define a secret import definition.
false
securityProtocol
string
Kafka security protocol, one of PLAINTEXT, SASL_PLAINTEXT, SASL_SSL, SSL. SASL_SSL will be assumed by default if not specified.
false
sslTruststore
string
A truststore or certificate encoded as base64. The format can be JKS or PKCS12. A truststore can be specified like this or in a predefined Kubernetes secret.
false
sslTruststoreLocation
string
SSL truststore location.
false
sslTruststorePassword
string
SSL truststore password.
false
sslTruststoreSecret
string
Kubernetes secret that contains the SSL truststore. The format can be JKS or PKCS12. A truststore can be specified like this or inline as base64 in sslTruststore.
false
user
string
Kafka user name. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
false
valueDeserializer
string
Deserializer to be used for the values of the topic.
An object store data store that is compatible with S3. This can be a COS bucket.
Name
Type
Description
Required
accessKey
string
Access key of the HMAC credentials that can access the given bucket. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
false
dataFormat
string
Data format of the objects in S3, e.g. parquet or csv. Please refer to the struct for allowed values.
false
partitionBy
[]string
Defines the columns to partition the output by (for target data stores).
false
region
string
Region of the S3 service.
false
secretImport
string
Define a secret import definition.
false
secretKey
string
Secret key of the HMAC credentials that can access the given bucket. Can be retrieved from Vault if specified in the vaultPath parameter and is thus optional.
If referring to a piece of an object instead of an entire object, this string should contain a valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. For example, if the object reference is to a container within a pod, this would take on a value like: "spec.containers{name}" (where "name" refers to the name of the container that triggered the event) or if no container name is specified "spec.containers[2]" (container with index 2 in this pod). This syntax is chosen only to have some well-defined way of referencing a part of an object. TODO: this design is not final and this field is subject to change in the future.
false
kind
string
Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
false
name
string
Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
false
namespace
string
Namespace of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
false
resourceVersion
string
Specific resourceVersion to which this reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency
false
uid
string
UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids