TensorFlow Transform tft.beam
Module tensorflow_transform.beam
Module-level imports for tensorflow_transform.beam.
Classes
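AnalyzeAndTransformDataset
Bases: PTransform
Combination of AnalyzeDataset and TransformDataset.
Applying AnalyzeAndTransformDataset(preprocessing_fn) to a dataset, which yields a (transformed_dataset, transform_fn) pair, should be equivalent to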
transform_fn = AnalyzeDataset(preprocessing_fn).expand(dataset)
transformed = TransformDataset().expand((dataset, transform_fn))
but may be more efficient since it avoids multiple passes over the data.
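To make this equivalence concrete, here is a toy, pure-Python sketch that runs without TFT or Beam installed. The names analyze_dataset, transform_dataset and analyze_and_transform_dataset are hypothetical stand-ins, and a single mean-centering feature stands in for a real preprocessing_fn. The sketch only shows that the combined step and the two separate steps agree; it does not model any efficiency gain from making fewer passes over the data.

```python
# Toy, pure-Python analogue of analyze-then-transform. The function names are
# hypothetical and are not part of tensorflow_transform.beam.
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
TransformFn = Callable[[Row], Row]


def analyze_dataset(dataset: List[Row]) -> TransformFn:
    """Full pass over the data: compute the statistic the transform needs."""
    xs = [row["x"] for row in dataset]
    mean = sum(xs) / len(xs)

    # The closure captures the computed statistic, playing the role of a
    # TransformFn that can be applied row by row later.
    def transform_fn(row: Row) -> Row:
        return {"x_centered": row["x"] - mean}

    return transform_fn


def transform_dataset(dataset: List[Row], transform_fn: TransformFn) -> List[Row]:
    """Apply an already-computed transform function to every row."""
    return [transform_fn(row) for row in dataset]


def analyze_and_transform_dataset(dataset: List[Row]) -> Tuple[List[Row], TransformFn]:
    """Combined step returning a (transformed, transform_fn) pair."""
    transform_fn = analyze_dataset(dataset)
    return transform_dataset(dataset, transform_fn), transform_fn


data = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
transformed, fn = analyze_and_transform_dataset(data)
separate = transform_dataset(data, analyze_dataset(data))
print(transformed)  # [{'x_centered': -1.0}, {'x_centered': 0.0}, {'x_centered': 1.0}]
print(transformed == separate)  # True
```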
Init method.
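PARAMETER | DESCRIPTION |
---|---|
preprocessing_fn | A function that accepts and returns a dictionary from strings to Tensors, SparseTensors, or RaggedTensors. |
output_record_batches | (Optional) A bool. If True, AnalyzeAndTransformDataset outputs pyarrow.RecordBatches; otherwise, it outputs instance dicts. DEFAULT: False |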
Source code in tensorflow_transform/beam/impl.py
Functions
annotations
default_label
default_type_hints
Source code in apache_beam/transforms/ptransform.py
display_data
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example: { 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent } |
Source code in apache_beam/transforms/display.py
expand
Transform the dataset by applying the preprocessing_fn.
PARAMETER | DESCRIPTION |
---|---|
dataset | A dataset. |
RETURNS | DESCRIPTION |
---|---|
A (Dataset, TransformFn) pair containing the preprocessed dataset and the graph that maps the input to the output data. |
Source code in tensorflow_transform/beam/impl.py
from_runner_api (classmethod)
Source code in apache_beam/transforms/ptransform.py
get_resource_hints
Source code in apache_beam/transforms/ptransform.py
get_type_hints
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order: first using self.default_type_hints(), then using self.__class__ type hints.
Source code in apache_beam/typehints/decorators.py
get_windowing
Returns the window function to be associated with the transform's output.
By default, most transforms just return the windowing function associated with the input PCollection (or with the first input, if there are several).
Source code in apache_beam/transforms/ptransform.py
infer_output_type
register_urn (classmethod)
Source code in apache_beam/transforms/ptransform.py
runner_api_requires_keyed_input
to_runner_api
Source code in apache_beam/transforms/ptransform.py
to_runner_api_parameter
Source code in apache_beam/transforms/ptransform.py
to_runner_api_pickled
Source code in apache_beam/transforms/ptransform.py
type_check_inputs
type_check_inputs_or_outputs
Source code in apache_beam/transforms/ptransform.py
type_check_outputs
with_input_types
Annotates the input type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
input_type_hint | An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_output_types
Annotates the output type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
type_hint | An instance of an allowed built-in type, a custom class, or a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_resource_hints
Adds resource hints to the PTransform.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the resource hints is defined by Beam runners; runners may ignore unsupported hints.
PARAMETER | DESCRIPTION |
---|---|
**kwargs | Key-value pairs describing hints and their values. |
RAISES | DESCRIPTION |
---|---|
ValueError | If provided hints are unknown to the SDK. See apache_beam.transforms.resources for a list of known hints. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. |
Source code in apache_beam/transforms/ptransform.py
AnalyzeDataset
Bases: _AnalyzeDatasetCommon
Takes a preprocessing_fn and computes the relevant statistics.
AnalyzeDataset accepts a preprocessing_fn in its constructor. When its expand method is called on a dataset, it computes all the relevant statistics required to run the transformation described by the preprocessing_fn, and returns a TransformFn representing the application of the preprocessing_fn.
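One rough way to picture what AnalyzeDataset produces is an object that stores the computed statistics and applies them to rows later. The sketch below is a toy analogue in plain Python, assuming a single numeric feature x and a z-score style transform; ToyTransformFn and toy_analyze are hypothetical names, and a real TransformFn wraps a TensorFlow graph rather than a Python object.

```python
# Toy analogue of AnalyzeDataset's output: an object that holds the computed
# statistics and applies them later. ToyTransformFn and toy_analyze are
# hypothetical names, not TFT APIs.
from dataclasses import dataclass
from math import sqrt
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ToyTransformFn:
    mean: float
    std: float

    def apply(self, row: Dict[str, float]) -> Dict[str, float]:
        # Deferred work: a row is only transformed when apply() is called.
        return {"x_scaled": (row["x"] - self.mean) / self.std}


def toy_analyze(rows: Iterable[Dict[str, float]]) -> ToyTransformFn:
    """Compute the mean and population standard deviation of the 'x' column."""
    xs: List[float] = [r["x"] for r in rows]
    if not xs:
        # Toy guard only; unrelated to the ValueError AnalyzeDataset raises.
        raise ValueError("cannot analyze an empty dataset")
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return ToyTransformFn(mean=mean, std=sqrt(variance))


fn = toy_analyze({"x": v} for v in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(fn)                    # ToyTransformFn(mean=5.0, std=2.0)
print(fn.apply({"x": 9.0}))  # {'x_scaled': 2.0}
```

Because the statistics are captured once, the same object can later be applied to rows that were never part of the analyzed data.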
Init method.
PARAMETER | DESCRIPTION |
---|---|
preprocessing_fn | A function that accepts and returns a dictionary from strings to Tensors, SparseTensors, or RaggedTensors. |
pipeline | (Optional) A Beam Pipeline. DEFAULT: None |
Source code in tensorflow_transform/beam/impl.py
Functions
annotations
default_label
default_type_hints
Source code in apache_beam/transforms/ptransform.py
display_data
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example: { 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent } |
Source code in apache_beam/transforms/display.py
expand
Analyze the dataset.
PARAMETER | DESCRIPTION |
---|---|
dataset | A dataset. |
RETURNS | DESCRIPTION |
---|---|
A TransformFn containing the deferred transform function. |
RAISES | DESCRIPTION |
---|---|
ValueError | If preprocessing_fn has no outputs. |
from_runner_api (classmethod)
Source code in apache_beam/transforms/ptransform.py
get_resource_hints
Source code in apache_beam/transforms/ptransform.py
get_type_hints
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order: first using self.default_type_hints(), then using self.__class__ type hints.
Source code in apache_beam/typehints/decorators.py
get_windowing
Returns the window function to be associated with the transform's output.
By default, most transforms just return the windowing function associated with the input PCollection (or with the first input, if there are several).
Source code in apache_beam/transforms/ptransform.py
infer_output_type
register_urn (classmethod)
Source code in apache_beam/transforms/ptransform.py
runner_api_requires_keyed_input
to_runner_api
Source code in apache_beam/transforms/ptransform.py
to_runner_api_parameter
Source code in apache_beam/transforms/ptransform.py
to_runner_api_pickled
Source code in apache_beam/transforms/ptransform.py
type_check_inputs
type_check_inputs_or_outputs
Source code in apache_beam/transforms/ptransform.py
type_check_outputs
with_input_types
Annotates the input type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
input_type_hint | An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_output_types
Annotates the output type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
type_hint | An instance of an allowed built-in type, a custom class, or a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_resource_hints
Adds resource hints to the PTransform.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the resource hints is defined by Beam runners; runners may ignore unsupported hints.
PARAMETER | DESCRIPTION |
---|---|
**kwargs | Key-value pairs describing hints and their values. |
RAISES | DESCRIPTION |
---|---|
ValueError | If provided hints are unknown to the SDK. See apache_beam.transforms.resources for a list of known hints. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. |
Source code in apache_beam/transforms/ptransform.py
AnalyzeDatasetWithCache
Bases: _AnalyzeDatasetCommon
Takes a preprocessing_fn and computes the relevant statistics.
WARNING: This is experimental.
Operates similarly to AnalyzeDataset by computing the required statistics, except that it does not recompute statistics that are already cached, and it writes out cache for the statistics it does compute whenever possible.
Example use:
>>> import os
>>> import tempfile
>>> import apache_beam as beam
>>> import tensorflow as tf
>>> import tensorflow_transform as tft
>>> import tensorflow_transform.beam as tft_beam
>>> span_0_key = tft_beam.analyzer_cache.DatasetKey('span-0')
>>> cache_dir = tempfile.mkdtemp()
>>> output_path = os.path.join(tempfile.mkdtemp(), 'result')
>>> def preprocessing_fn(inputs):
...   x = inputs['x']
...   return {'x_mean': tft.mean(x, name='x') + tf.zeros_like(x)}
>>> feature_spec = {'x': tf.io.FixedLenFeature([], tf.float32)}
>>> input_metadata = tft.DatasetMetadata.from_feature_spec(feature_spec)
>>> input_data_dict_0 = {span_0_key: [{'x': x} for x in range(6)]}
>>> input_data_dict_1 = {span_0_key: [{'x': x} for x in range(6, 11)]}
>>> empty_input_cache = {}
>>> with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
...   with beam.Pipeline() as p:
...     # Iteration #0:
...     transform_fn, output_cache = (
...         (input_data_dict_0, empty_input_cache, input_metadata)
...         | tft_beam.AnalyzeDatasetWithCache(preprocessing_fn))
...     output_cache | tft_beam.analyzer_cache.WriteAnalysisCacheToFS(
...         p, cache_dir)
...
...     # Iteration #1:
...     input_cache = p | tft_beam.analyzer_cache.ReadAnalysisCacheFromFS(
...         cache_dir, [span_0_key])
...     transform_fn, output_cache = (
...         (input_data_dict_1, input_cache, input_metadata)
...         | tft_beam.AnalyzeDatasetWithCache(preprocessing_fn))
...     output_cache | tft_beam.analyzer_cache.WriteAnalysisCacheToFS(
...         p, cache_dir)
...
...     # Applying the accumulated transformation:
...     transform_data = p | beam.Create(input_data_dict_0[span_0_key])
...     transformed_dataset = (
...         ((transform_data, input_metadata), transform_fn)
...         | tft_beam.TransformDataset())
...     transformed_data, transformed_metadata = transformed_dataset
...     (transformed_data
...         | beam.combiners.Sample.FixedSizeGlobally(1)
...         | beam.io.WriteToText(output_path, shard_name_template=''))
>>> with open(output_path) as f:
...   f.read()
"[{'x_mean': 5.0}]\n"
Init method.
PARAMETER | DESCRIPTION |
---|---|
preprocessing_fn | A function that accepts and returns a dictionary from strings to Tensors, SparseTensors, or RaggedTensors. |
pipeline | (Optional) A Beam Pipeline. DEFAULT: None |
Source code in tensorflow_transform/beam/impl.py
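The caching behavior described for AnalyzeDatasetWithCache can be pictured with a toy, plain-Python sketch: statistics are stored per dataset key as mergeable accumulators, cached keys are skipped, and only uncached keys are read. Everything here (analyze_mean_with_cache, the (sum, count) accumulator, the span names) is a hypothetical illustration, not TFT's cache format. Note that in this toy a cached key is trusted as-is, so changing a span's data without changing its key would silently reuse stale statistics.

```python
# Toy analogue of per-dataset-key analyzer caching for a mean. Accumulators
# are (sum, count) pairs; all names here are hypothetical, not TFT APIs.
from typing import Dict, List, Tuple

Accumulator = Tuple[float, int]


def analyze_mean_with_cache(
    spans: Dict[str, List[float]],
    input_cache: Dict[str, Accumulator],
) -> Tuple[float, Dict[str, Accumulator], List[str]]:
    """Return (mean over all spans, output cache, keys that were recomputed)."""
    output_cache = dict(input_cache)
    recomputed: List[str] = []
    for key, values in spans.items():
        if key not in output_cache:
            # Cache miss: make a real pass over this span and cache the result.
            output_cache[key] = (float(sum(values)), len(values))
            recomputed.append(key)
    # Merging (sum, count) accumulators is what makes a mean cacheable per span.
    total = sum(output_cache[key][0] for key in spans)
    count = sum(output_cache[key][1] for key in spans)
    return total / count, output_cache, recomputed


# Iteration 0: nothing is cached yet, so span-0 is computed.
mean_0, cache, recomputed_0 = analyze_mean_with_cache(
    {"span-0": list(range(6))}, {})
# Iteration 1: span-0 is served from the cache; only span-1 is computed.
mean_1, cache, recomputed_1 = analyze_mean_with_cache(
    {"span-0": list(range(6)), "span-1": list(range(6, 11))}, cache)
print(mean_0, recomputed_0)  # 2.5 ['span-0']
print(mean_1, recomputed_1)  # 5.0 ['span-1']
```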
Functions
annotations
default_label
default_type_hints
Source code in apache_beam/transforms/ptransform.py
display_data
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example: { 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent } |
Source code in apache_beam/transforms/display.py
expand
Analyze the dataset.
PARAMETER | DESCRIPTION |
---|---|
dataset | A dataset. |
RETURNS | DESCRIPTION |
---|---|
A TransformFn containing the deferred transform function. |
RAISES | DESCRIPTION |
---|---|
ValueError | If preprocessing_fn has no outputs. |
from_runner_api (classmethod)
Source code in apache_beam/transforms/ptransform.py
get_resource_hints
Source code in apache_beam/transforms/ptransform.py
get_type_hints
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order: first using self.default_type_hints(), then using self.__class__ type hints.
Source code in apache_beam/typehints/decorators.py
get_windowing
Returns the window function to be associated with the transform's output.
By default, most transforms just return the windowing function associated with the input PCollection (or with the first input, if there are several).
Source code in apache_beam/transforms/ptransform.py
infer_output_type
register_urn (classmethod)
Source code in apache_beam/transforms/ptransform.py
runner_api_requires_keyed_input
to_runner_api
Source code in apache_beam/transforms/ptransform.py
to_runner_api_parameter
Source code in apache_beam/transforms/ptransform.py
to_runner_api_pickled
Source code in apache_beam/transforms/ptransform.py
type_check_inputs
type_check_inputs_or_outputs
Source code in apache_beam/transforms/ptransform.py
type_check_outputs
with_input_types
Annotates the input type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
input_type_hint | An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_output_types
Annotates the output type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
type_hint | An instance of an allowed built-in type, a custom class, or a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_resource_hints
Adds resource hints to the PTransform.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the resource hints is defined by Beam runners; runners may ignore unsupported hints.
PARAMETER | DESCRIPTION |
---|---|
**kwargs | Key-value pairs describing hints and their values. |
RAISES | DESCRIPTION |
---|---|
ValueError | If provided hints are unknown to the SDK. See apache_beam.transforms.resources for a list of known hints. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. |
Source code in apache_beam/transforms/ptransform.py
Context
Context(
temp_dir: Optional[str] = None,
desired_batch_size: Optional[int] = None,
passthrough_keys: Optional[Iterable[str]] = None,
use_deep_copy_optimization: Optional[bool] = None,
force_tf_compat_v1: Optional[bool] = None,
save_options: Optional[SaveOptions] = None,
)
Context manager for tensorflow-transform.
All the attributes in this context are kept on a thread local state.
ATTRIBUTE | DESCRIPTION |
---|---|
temp_dir | (Optional) The temporary directory used within this block. |
desired_batch_size | (Optional) A batch size to batch elements by. If not provided, a batch size will be computed automatically. |
passthrough_keys | (Optional) A set of strings that are keys to instances that should pass through the pipeline and be hidden from the preprocessing_fn. This should only be used in cases where additional information should be attached to instances in the pipeline which should not be part of the transformation graph; instance keys are one such example. |
use_deep_copy_optimization | (Optional) If True, makes deep copies of PCollections that are used in multiple TFT phases. |
force_tf_compat_v1 | (Optional) If True, TFT's public APIs (e.g. AnalyzeDataset) will use TensorFlow in compat.v1 mode irrespective of the installed version of TensorFlow. Defaults to False. |
save_options | (Optional) If set, the tf.saved_model.SaveOptions to save the transform_fn with. Only applies for TF2. |
Note that the temp dir should be accessible to worker jobs, e.g. if running with the Cloud Dataflow runner, the temp dir should be on GCS and should have permissions that allow both launcher and workers to access it.
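Keeping settings on thread-local state means a context entered in one thread is invisible to other threads, and a nested context shadows the outer one until it exits. The toy sketch below illustrates that idea with threading.local and contextlib.contextmanager; toy_context and get_desired_batch_size are hypothetical names, and the sketch is not the actual implementation of tft_beam.Context.

```python
# Toy analogue of a context manager whose settings live on thread-local state.
# toy_context and get_desired_batch_size are hypothetical names; this is not
# how tft_beam.Context is implemented, only an illustration of the idea.
import threading
from contextlib import contextmanager
from typing import Iterator, List, Optional

_local = threading.local()


def _frames() -> List[dict]:
    # Each thread lazily gets its own, initially empty, stack of settings.
    if not hasattr(_local, "frames"):
        _local.frames = []
    return _local.frames


@contextmanager
def toy_context(desired_batch_size: Optional[int] = None) -> Iterator[None]:
    _frames().append({"desired_batch_size": desired_batch_size})
    try:
        yield
    finally:
        _frames().pop()


def get_desired_batch_size() -> Optional[int]:
    """Return the innermost setting, or None when no context is active."""
    frames = _frames()
    return frames[-1]["desired_batch_size"] if frames else None


observed = {"before": get_desired_batch_size()}
with toy_context(desired_batch_size=100):
    observed["inside"] = get_desired_batch_size()
    with toy_context(desired_batch_size=10):
        observed["nested"] = get_desired_batch_size()
    # A different thread sees its own (empty) state, not this thread's context.
    worker = threading.Thread(
        target=lambda: observed.__setitem__("other_thread", get_desired_batch_size()))
    worker.start()
    worker.join()
observed["after"] = get_desired_batch_size()
print(observed)
# {'before': None, 'inside': 100, 'nested': 10, 'other_thread': None, 'after': None}
```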
Source code in tensorflow_transform/beam/context.py
Functions
create_base_temp_dir (classmethod)
create_base_temp_dir() -> str
Generate a temporary location.
Source code in tensorflow_transform/beam/context.py
get_desired_batch_size (classmethod)
Retrieves the user-set fixed batch size, or None if not set.
Source code in tensorflow_transform/beam/context.py
get_passthrough_keys (classmethod)
Retrieves the user-set passthrough_keys, or None if not set.
Source code in tensorflow_transform/beam/context.py
get_save_options (classmethod)
get_save_options() -> Optional[SaveOptions]
Retrieves the user-set save_options, or None if not set.
Source code in tensorflow_transform/beam/context.py
get_use_deep_copy_optimization (classmethod)
get_use_deep_copy_optimization() -> bool
Retrieves the user-set use_deep_copy_optimization, or None if not set.
Source code in tensorflow_transform/beam/context.py
EncodeTransformedDataset
Bases: PTransform
Encodes transformed data into serialized tf.Examples.
It should operate on the output of TransformDataset and can handle either record batch or instance dict data.
The expected input is a (transformed_data, transformed_metadata) tuple.
Example use:
>>> import os
>>> import tempfile
>>> import apache_beam as beam
>>> import tensorflow as tf
>>> import tensorflow_transform as tft
>>> import tensorflow_transform.beam as tft_beam
>>> def preprocessing_fn(inputs):
...   return {'x_scaled': tft.scale_to_z_score(inputs['x'], name='x')}
>>> raw_data = [dict(x=1), dict(x=2), dict(x=3)]
>>> feature_spec = dict(x=tf.io.FixedLenFeature([], tf.int64))
>>> raw_data_metadata = tft.DatasetMetadata.from_feature_spec(feature_spec)
>>> output_path = os.path.join(tempfile.mkdtemp(), 'result')
>>> with beam.Pipeline() as p:
...   with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
...     data_pcoll = p | beam.Create(raw_data)
...     transformed_dataset, transform_fn = (
...         (data_pcoll, raw_data_metadata)
...         | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
...     _ = (
...         transformed_dataset
...         | tft_beam.EncodeTransformedDataset()
...         | beam.io.WriteToTFRecord(output_path, shard_name_template=''))
>>> result_feature_spec = {'x_scaled': tf.io.FixedLenFeature([], tf.float32)}
>>> list(tf.data.TFRecordDataset([output_path])
...      .map(lambda x: tf.io.parse_example(x, result_feature_spec))
...      .as_numpy_iterator())
[{'x_scaled': -1.2247448}, {'x_scaled': 0.0}, {'x_scaled': 1.2247448}]
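The expected output of the example above can be checked by hand, since the z-score of each value is (x - mean) / std. The plain-Python arithmetic below uses the population variance (dividing by n), which is what reproduces the +/-1.2247 in the output; the TFT result is float32, so its last printed digit can differ slightly from this float64 computation.

```python
# Reproduce the z-score arithmetic behind the example's expected output,
# using the population variance (divide by n, not n - 1).
from math import sqrt

raw = [1.0, 2.0, 3.0]
mean = sum(raw) / len(raw)                               # 2.0
variance = sum((x - mean) ** 2 for x in raw) / len(raw)  # 2/3
scaled = [(x - mean) / sqrt(variance) for x in raw]
print([round(s, 6) for s in scaled])  # [-1.224745, 0.0, 1.224745]
```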
Source code in apache_beam/transforms/ptransform.py
Functions
annotations
default_label
default_type_hints
Source code in apache_beam/transforms/ptransform.py
display_data
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example: { 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent } |
Source code in apache_beam/transforms/display.py
expand
Source code in tensorflow_transform/beam/impl.py
from_runner_api (classmethod)
Source code in apache_beam/transforms/ptransform.py
get_resource_hints
Source code in apache_beam/transforms/ptransform.py
get_type_hints
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order: first using self.default_type_hints(), then using self.__class__ type hints.
Source code in apache_beam/typehints/decorators.py
get_windowing
Returns the window function to be associated with the transform's output.
By default, most transforms just return the windowing function associated with the input PCollection (or with the first input, if there are several).
Source code in apache_beam/transforms/ptransform.py
infer_output_type
register_urn (classmethod)
Source code in apache_beam/transforms/ptransform.py
runner_api_requires_keyed_input
to_runner_api
Source code in apache_beam/transforms/ptransform.py
to_runner_api_parameter
Source code in apache_beam/transforms/ptransform.py
to_runner_api_pickled
Source code in apache_beam/transforms/ptransform.py
type_check_inputs
type_check_inputs_or_outputs
Source code in apache_beam/transforms/ptransform.py
type_check_outputs
with_input_types
Annotates the input type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
input_type_hint | An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_output_types
Annotates the output type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
type_hint | An instance of an allowed built-in type, a custom class, or a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_resource_hints
Adds resource hints to the PTransform.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the resource hints is defined by Beam runners; runners may ignore unsupported hints.
PARAMETER | DESCRIPTION |
---|---|
**kwargs | Key-value pairs describing hints and their values. |
RAISES | DESCRIPTION |
---|---|
ValueError | If provided hints are unknown to the SDK. See apache_beam.transforms.resources for a list of known hints. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. |
Source code in apache_beam/transforms/ptransform.py
ReadTransformFn
Bases: PTransform
Reads a TransformFn written by WriteTransformFn.
Source code in tensorflow_transform/beam/tft_beam_io/transform_fn_io.py
Functions
annotations
default_label
default_type_hints
Source code in apache_beam/transforms/ptransform.py
display_data
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example: { 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent } |
Source code in apache_beam/transforms/display.py
expand
Source code in tensorflow_transform/beam/tft_beam_io/transform_fn_io.py
from_runner_api (classmethod)
Source code in apache_beam/transforms/ptransform.py
get_resource_hints
Source code in apache_beam/transforms/ptransform.py
get_type_hints
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order: first using self.default_type_hints(), then using self.__class__ type hints.
Source code in apache_beam/typehints/decorators.py
get_windowing
Returns the window function to be associated with the transform's output.
By default, most transforms just return the windowing function associated with the input PCollection (or with the first input, if there are several).
Source code in apache_beam/transforms/ptransform.py
infer_output_type
register_urn (classmethod)
Source code in apache_beam/transforms/ptransform.py
runner_api_requires_keyed_input
to_runner_api
Source code in apache_beam/transforms/ptransform.py
to_runner_api_parameter
Source code in apache_beam/transforms/ptransform.py
to_runner_api_pickled
Source code in apache_beam/transforms/ptransform.py
type_check_inputs
type_check_inputs_or_outputs
Source code in apache_beam/transforms/ptransform.py
type_check_outputs
with_input_types
Annotates the input type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
input_type_hint | An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_output_types
Annotates the output type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
type_hint | An instance of an allowed built-in type, a custom class, or a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_resource_hints
Adds resource hints to the PTransform.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the resource hints is defined by Beam runners; runners may ignore unsupported hints.
PARAMETER | DESCRIPTION |
---|---|
**kwargs | Key-value pairs describing hints and their values. |
RAISES | DESCRIPTION |
---|---|
ValueError | If provided hints are unknown to the SDK. See apache_beam.transforms.resources for a list of known hints. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. |
Source code in apache_beam/transforms/ptransform.py
TransformDataset
Bases: PTransform
Applies a previously computed transformation (a TransformFn) to a Dataset.
TransformDataset's expand method is called on a (dataset, transform_fn) pair. It applies the transform_fn to each row of the input dataset and returns the resulting dataset.
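As a rough mental model, TransformDataset runs the transform function over every row of the dataset, optionally leaving some outputs out. The toy sketch below assumes rows are plain dicts and the transform function is an ordinary Python function (a real TransformFn wraps a TensorFlow graph). toy_transform_dataset and example_transform_fn are hypothetical names, and the toy simply drops excluded keys after computing them, which says nothing about how TFT handles exclude_outputs internally.

```python
# Toy analogue of applying a transform function row by row with an optional
# exclusion list. All names are hypothetical, not TFT APIs.
from typing import Callable, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, float]


def toy_transform_dataset(
    dataset: Iterable[Row],
    transform_fn: Callable[[Row], Row],
    exclude_outputs: Optional[Sequence[str]] = None,
) -> List[Row]:
    excluded = set(exclude_outputs or ())
    out: List[Row] = []
    for row in dataset:
        result = transform_fn(row)
        # Keep only the outputs that were not excluded.
        out.append({k: v for k, v in result.items() if k not in excluded})
    return out


def example_transform_fn(row: Row) -> Row:
    # Pretend the constant 2.0 came from an earlier analysis pass.
    return {"x_centered": row["x"] - 2.0, "x_doubled": row["x"] * 2}


rows = [{"x": 1.0}, {"x": 3.0}]
result = toy_transform_dataset(rows, example_transform_fn, exclude_outputs=["x_doubled"])
print(result)  # [{'x_centered': -1.0}, {'x_centered': 1.0}]
```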
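PARAMETER | DESCRIPTION |
---|---|
exclude_outputs | (Optional) Output features that should not be produced. DEFAULT: None |
output_record_batches | (Optional) A bool. If True, TransformDataset outputs pyarrow.RecordBatches; otherwise, it outputs instance dicts. DEFAULT: False |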
Source code in tensorflow_transform/beam/impl.py
Functions
annotations
default_label
default_type_hints
Source code in apache_beam/transforms/ptransform.py
display_data
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing key:value pairs. The value might be an integer, float or string value; a DisplayDataItem for values that have more data (e.g. short value, label, url); or a HasDisplayData instance that has more display data that should be picked up. For example: { 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent } |
Source code in apache_beam/transforms/display.py
expand
Transforms the dataset using the transform_fn.
PARAMETER | DESCRIPTION |
---|---|
dataset_and_transform_fn | A (dataset, transform_fn) tuple. |
RETURNS | DESCRIPTION |
---|---|
A dataset transformed according to the transform_fn. |
Source code in tensorflow_transform/beam/impl.py
from_runner_api (classmethod)
Source code in apache_beam/transforms/ptransform.py
get_resource_hints
Source code in apache_beam/transforms/ptransform.py
get_type_hints
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order: first using self.default_type_hints(), then using self.__class__ type hints.
Source code in apache_beam/typehints/decorators.py
get_windowing
Returns the window function to be associated with the transform's output.
By default, most transforms just return the windowing function associated with the input PCollection (or with the first input, if there are several).
Source code in apache_beam/transforms/ptransform.py
infer_output_type
register_urn (classmethod)
Source code in apache_beam/transforms/ptransform.py
runner_api_requires_keyed_input
to_runner_api
Source code in apache_beam/transforms/ptransform.py
to_runner_api_parameter
Source code in apache_beam/transforms/ptransform.py
to_runner_api_pickled
Source code in apache_beam/transforms/ptransform.py
type_check_inputs
type_check_inputs_or_outputs
Source code in apache_beam/transforms/ptransform.py
type_check_outputs
with_input_types
Annotates the input type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
input_type_hint | An instance of an allowed built-in type, a custom class, or an instance of a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If input_type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_output_types
Annotates the output type of a PTransform with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
type_hint | An instance of an allowed built-in type, a custom class, or a TypeConstraint (apache_beam.typehints.typehints.TypeConstraint). TYPE: type |
RAISES | DESCRIPTION |
---|---|
TypeError | If type_hint is not a valid type-hint. See apache_beam.typehints.typehints.validate_composite_type_param() for further details. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_resource_hints
Adds resource hints to the PTransform.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the resource hints is defined by Beam runners; runners may ignore unsupported hints.
PARAMETER | DESCRIPTION |
---|---|
**kwargs | Key-value pairs describing hints and their values. |
RAISES | DESCRIPTION |
---|---|
ValueError | If provided hints are unknown to the SDK. See apache_beam.transforms.resources for a list of known hints. |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular PTransform object. |
Source code in apache_beam/transforms/ptransform.py
WriteMetadata
Bases: PTransform
A PTransform to write Metadata to disk.
Input can either be a DatasetMetadata or a tuple of properties.
Depending on the optional write_to_unique_subdirectory, writes the given metadata to either path or a new unique subdirectory under path.
Returns a singleton with the path to which the metadata was written.
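The choice between path and a new unique subdirectory can be sketched as follows. This is a toy helper, not WriteMetadata itself: toy_metadata_destination is a hypothetical name, it only computes the destination without writing anything, and the uuid-based subdirectory name is an assumption, since the naming scheme TFT uses is not described here.

```python
# Toy sketch of the destination choice WriteMetadata describes. The helper
# name and the uuid-based subdirectory naming are hypothetical assumptions;
# TFT may name its unique subdirectories differently.
import os
import tempfile
import uuid


def toy_metadata_destination(path: str, write_to_unique_subdirectory: bool = False) -> str:
    """Return the directory that metadata would be written to."""
    if not write_to_unique_subdirectory:
        return path
    return os.path.join(path, uuid.uuid4().hex)


base = tempfile.mkdtemp()
default_dest = toy_metadata_destination(base)
unique_1 = toy_metadata_destination(base, write_to_unique_subdirectory=True)
unique_2 = toy_metadata_destination(base, write_to_unique_subdirectory=True)
print(default_dest == base)               # True
print(os.path.dirname(unique_1) == base)  # True
print(unique_1 != unique_2)               # True
```

A unique subdirectory lets repeated runs write side by side under the same base path instead of overwriting each other.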
Init method.
PARAMETER | DESCRIPTION |
---|---|
path | A str, the default path that the metadata should be written to. |
pipeline | A Beam Pipeline. |
write_to_unique_subdirectory | (Optional) A bool indicating whether to write the metadata out to path or to a new unique subdirectory under path. DEFAULT: False |
Source code in tensorflow_transform/beam/tft_beam_io/beam_metadata_io.py
Attributes
Functions
annotations
default_label
default_type_hints
Source code in apache_beam/transforms/ptransform.py
display_data
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
RETURNS | DESCRIPTION |
---|---|
`Dict[str, Any]` | A dictionary containing `key:value` pairs. The value might be an integer, float or string value; a `DisplayDataItem` for values that have more data (e.g. short value, label, url); or a `HasDisplayData` instance that has more display data that should be picked up. For example: `{'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent}` |
Source code in apache_beam/transforms/display.py
expand
Source code in tensorflow_transform/beam/tft_beam_io/beam_metadata_io.py
from_runner_api (classmethod)
Source code in apache_beam/transforms/ptransform.py
get_resource_hints
Source code in apache_beam/transforms/ptransform.py
get_type_hints
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order:
- Using `self.default_type_hints()`.
- Using `self.__class__` type hints.
Source code in apache_beam/typehints/decorators.py
get_windowing
Returns the window function to be associated with the transform's output.
By default, most transforms return the windowing function associated with the input PCollection (or the first input, if there are several).
Source code in apache_beam/transforms/ptransform.py
infer_output_type
register_urn (classmethod)
Source code in apache_beam/transforms/ptransform.py
runner_api_requires_keyed_input
to_runner_api
Source code in apache_beam/transforms/ptransform.py
to_runner_api_parameter
Source code in apache_beam/transforms/ptransform.py
to_runner_api_pickled
Source code in apache_beam/transforms/ptransform.py
type_check_inputs
type_check_inputs_or_outputs
Source code in apache_beam/transforms/ptransform.py
type_check_outputs
with_input_types
Annotates the input type of a `PTransform` with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
`input_type_hint` | An instance of an allowed built-in type, a custom class, or an instance of a `TypeConstraint` (from `apache_beam.typehints.typehints`). TYPE: `type` |
RAISES | DESCRIPTION |
---|---|
`TypeError` | If `input_type_hint` is not a valid type-hint. See `apache_beam.typehints.typehints.validate_composite_type_param()` for further details. |
RETURNS | DESCRIPTION |
---|---|
`PTransform` | A reference to the instance of this particular `PTransform` object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_output_types
Annotates the output type of a `PTransform` with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
`type_hint` | An instance of an allowed built-in type, a custom class, or a `TypeConstraint` (from `apache_beam.typehints.typehints`). TYPE: `type` |
RAISES | DESCRIPTION |
---|---|
`TypeError` | If `type_hint` is not a valid type-hint. See `apache_beam.typehints.typehints.validate_composite_type_param()` for further details. |
RETURNS | DESCRIPTION |
---|---|
`PTransform` | A reference to the instance of this particular `PTransform` object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_resource_hints
Adds resource hints to the `PTransform`.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the hints is defined by Beam runners, and runners may ignore hints they do not support.
PARAMETER | DESCRIPTION |
---|---|
`**kwargs` | Key-value pairs describing hints and their values. |
RAISES | DESCRIPTION |
---|---|
`ValueError` | If provided hints are unknown to the SDK. See `apache_beam.transforms.resources` for a list of known hints. |
RETURNS | DESCRIPTION |
---|---|
`PTransform` | A reference to the instance of this particular `PTransform` object. |
Source code in apache_beam/transforms/ptransform.py
WriteTransformFn
Bases: PTransform
Writes a TransformFn to disk.
The internal structure is a directory containing two subdirectories. The first is 'transformed_metadata' and contains metadata of the transformed data. The second is 'transform_fn' and contains a SavedModel representing the transform function, i.e. the graph that maps the input data to the transformed data.
Source code in tensorflow_transform/beam/tft_beam_io/transform_fn_io.py
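Because the output layout is documented, a caller can sanity-check a directory written by `WriteTransformFn` before loading it. A minimal, standard-library-only sketch; the helper name is hypothetical, and it checks only that the two documented subdirectories exist, not that their contents are valid.

```python
import tempfile
from pathlib import Path

# Subdirectory names documented for the directory written by WriteTransformFn.
EXPECTED_SUBDIRS = ("transform_fn", "transformed_metadata")


def missing_transform_fn_subdirs(output_dir):
    """Returns the documented subdirectories that are absent under output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Usage: a directory holding only transform_fn/ is reported as incomplete.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "transform_fn").mkdir()
print(missing_transform_fn_subdirs(demo_dir))  # ['transformed_metadata']
```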
Attributes
Functions
annotations
default_label
default_type_hints
Source code in apache_beam/transforms/ptransform.py
display_data
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
RETURNS | DESCRIPTION |
---|---|
`Dict[str, Any]` | A dictionary containing `key:value` pairs. The value might be an integer, float or string value; a `DisplayDataItem` for values that have more data (e.g. short value, label, url); or a `HasDisplayData` instance that has more display data that should be picked up. For example: `{'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent}` |
Source code in apache_beam/transforms/display.py
expand
Source code in tensorflow_transform/beam/tft_beam_io/transform_fn_io.py
from_runner_api (classmethod)
Source code in apache_beam/transforms/ptransform.py
get_resource_hints
Source code in apache_beam/transforms/ptransform.py
get_type_hints
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize type hints in this order:
- Using `self.default_type_hints()`.
- Using `self.__class__` type hints.
Source code in apache_beam/typehints/decorators.py
get_windowing
Returns the window function to be associated with the transform's output.
By default, most transforms return the windowing function associated with the input PCollection (or the first input, if there are several).
Source code in apache_beam/transforms/ptransform.py
infer_output_type
register_urn (classmethod)
Source code in apache_beam/transforms/ptransform.py
runner_api_requires_keyed_input
to_runner_api
Source code in apache_beam/transforms/ptransform.py
to_runner_api_parameter
Source code in apache_beam/transforms/ptransform.py
to_runner_api_pickled
Source code in apache_beam/transforms/ptransform.py
type_check_inputs
type_check_inputs_or_outputs
Source code in apache_beam/transforms/ptransform.py
type_check_outputs
with_input_types
Annotates the input type of a `PTransform` with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
`input_type_hint` | An instance of an allowed built-in type, a custom class, or an instance of a `TypeConstraint` (from `apache_beam.typehints.typehints`). TYPE: `type` |
RAISES | DESCRIPTION |
---|---|
`TypeError` | If `input_type_hint` is not a valid type-hint. See `apache_beam.typehints.typehints.validate_composite_type_param()` for further details. |
RETURNS | DESCRIPTION |
---|---|
`PTransform` | A reference to the instance of this particular `PTransform` object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_output_types
Annotates the output type of a `PTransform` with a type-hint.
PARAMETER | DESCRIPTION |
---|---|
`type_hint` | An instance of an allowed built-in type, a custom class, or a `TypeConstraint` (from `apache_beam.typehints.typehints`). TYPE: `type` |
RAISES | DESCRIPTION |
---|---|
`TypeError` | If `type_hint` is not a valid type-hint. See `apache_beam.typehints.typehints.validate_composite_type_param()` for further details. |
RETURNS | DESCRIPTION |
---|---|
`PTransform` | A reference to the instance of this particular `PTransform` object. This allows chaining type-hinting related methods. |
Source code in apache_beam/transforms/ptransform.py
with_resource_hints
Adds resource hints to the `PTransform`.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the hints is defined by Beam runners, and runners may ignore hints they do not support.
PARAMETER | DESCRIPTION |
---|---|
`**kwargs` | Key-value pairs describing hints and their values. |
RAISES | DESCRIPTION |
---|---|
`ValueError` | If provided hints are unknown to the SDK. See `apache_beam.transforms.resources` for a list of known hints. |
RETURNS | DESCRIPTION |
---|---|
`PTransform` | A reference to the instance of this particular `PTransform` object. |