TensorFlow Transform tft.beam.analyzer_cache
Module
tensorflow_transform.beam.analyzer_cache
Module that allows a pipeline to define and use cached analyzers.
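The general idea behind analyzer caching can be shown with a toy model in pure Python, without TFT or Beam. Each dataset's partial analyzer result is stored under its dataset key. When the pipeline reruns over a mix of old and new datasets, it only computes partial results for datasets that are not yet cached. In this sketch the partial result is a hypothetical (sum, count) accumulator for a mean. The names `compute_accumulator` and `cached_mean` are illustrative and are not part of this module.

```python
from typing import Dict, List, Tuple

# Hypothetical accumulator for a mean analyzer: (sum, count).
Accumulator = Tuple[float, int]


def compute_accumulator(values: List[float]) -> Accumulator:
    """Computes a partial (per-dataset) accumulator."""
    return (sum(values), len(values))


def cached_mean(datasets: Dict[str, List[float]],
                cache: Dict[str, Accumulator]) -> Tuple[float, int]:
    """Returns the global mean and how many datasets had to be recomputed.

    `cache` maps a dataset key to its accumulator and is updated in place,
    mimicking how cache written by one run is read by the next.
    """
    recomputed = 0
    total, count = 0.0, 0
    for key, values in datasets.items():
        if key not in cache:
            cache[key] = compute_accumulator(values)
            recomputed += 1
        partial_sum, partial_count = cache[key]
        total += partial_sum
        count += partial_count
    return total / count, recomputed


cache: Dict[str, Accumulator] = {}
print(cached_mean({'day1': [1.0, 2.0], 'day2': [3.0]}, cache))  # (2.0, 2)
# Only 'day3' is new, so only one accumulator is recomputed.
print(cached_mean({'day1': [1.0, 2.0], 'day2': [3.0], 'day3': [6.0]}, cache))  # (3.0, 1)
```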
Attributes
Classes
DatasetCache
Bases: TypedNamedTuple('DatasetCache', [('cache_dict', Mapping[str, PCollection[bytes]]), ('metadata', Optional[Union[PCollection[DatasetCacheMetadata], DatasetCacheMetadata]])])
Complete cache for a dataset as well as metadata.
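The sketch below makes the shape of DatasetCache concrete. It defines a stand-in with the same two fields using `typing.NamedTuple`. This is a model built on assumptions, not the TFT class: plain lists of bytes stand in for `PCollection[bytes]`, and the dictionary key is a hypothetical cache entry key.

```python
from typing import List, Mapping, NamedTuple, Optional


class DatasetCacheMetadataStandIn(NamedTuple):
    """Stand-in for DatasetCacheMetadata (single field: dataset_size)."""
    dataset_size: int


class DatasetCacheStandIn(NamedTuple):
    """Stand-in for DatasetCache; lists of bytes replace PCollection[bytes]."""
    cache_dict: Mapping[str, List[bytes]]
    metadata: Optional[DatasetCacheMetadataStandIn]


cache = DatasetCacheStandIn(
    cache_dict={'hypothetical_entry_key': [b'serialized-accumulator']},
    metadata=DatasetCacheMetadataStandIn(dataset_size=1000),
)
print(cache.metadata.dataset_size)  # 1000
print(list(cache.cache_dict))  # ['hypothetical_entry_key']
```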
DatasetCacheMetadata
Bases: TypedNamedTuple('DatasetCacheMetadata', [('dataset_size', int)])
Metadata about a cached dataset.
Functions
decode (classmethod)
decode(value: bytes) -> DatasetCacheMetadata
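`decode` turns serialized bytes back into a DatasetCacheMetadata. This page does not document the byte format. For illustration only, the sketch below assumes a JSON encoding of the fields. It pairs `decode` with a hypothetical `encode` method on a stand-in class so that the round trip can be checked.

```python
import json
from typing import NamedTuple


class DatasetCacheMetadataStandIn(NamedTuple):
    dataset_size: int

    def encode(self) -> bytes:
        # Assumed format: JSON object of the tuple's fields, UTF-8 encoded.
        return json.dumps(self._asdict(), sort_keys=True).encode('utf-8')

    @classmethod
    def decode(cls, value: bytes) -> 'DatasetCacheMetadataStandIn':
        # Inverse of encode() under the same JSON assumption.
        return cls(**json.loads(value.decode('utf-8')))


original = DatasetCacheMetadataStandIn(dataset_size=42)
encoded = original.encode()
print(encoded)  # b'{"dataset_size": 42}'
assert DatasetCacheMetadataStandIn.decode(encoded) == original
```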
DatasetKey
Bases: namedtuple('DatasetKey', ['key', 'is_cached'])
A key for a dataset used for analysis.
Functions
non_cacheable
non_cacheable() -> DatasetKey
Creates a non-cacheable dataset key, for which no cache will be produced.
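DatasetKey pairs a key with an `is_cached` flag. The sketch below uses a plain namedtuple with the same two fields, as a stand-in rather than the TFT class. It shows how that flag can split keys into two groups: those whose cache is read and written, and those that are always recomputed. The key strings and the helper `cacheable_keys` are hypothetical.

```python
from collections import namedtuple
from typing import Iterable, List

# Stand-in with the same fields as DatasetKey's namedtuple base.
DatasetKeyStandIn = namedtuple('DatasetKey', ['key', 'is_cached'])


def cacheable_keys(keys: Iterable[DatasetKeyStandIn]) -> List[str]:
    """Returns the keys for which cache should be read and written."""
    return [k.key for k in keys if k.is_cached]


keys = [
    DatasetKeyStandIn('span-0', True),
    DatasetKeyStandIn('span-1', True),
    # Analogous to a non-cacheable key: no cache is produced for it.
    DatasetKeyStandIn('scratch', False),
]
print(cacheable_keys(keys))  # ['span-0', 'span-1']
```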
ReadAnalysisCacheFromFS
ReadAnalysisCacheFromFS(
cache_base_dir: str,
dataset_keys: Iterable[DatasetKey],
cache_entry_keys: Optional[Iterable[bytes]] = None,
source: Optional[object] = None,
)
Bases: PTransform
Reads cache written to the file system by WriteAnalysisCacheToFS.
Init method.
PARAMETER | DESCRIPTION |
---|---|
cache_base_dir | A string, the path that the cache is stored in. TYPE: str |
dataset_keys | An iterable of … TYPE: Iterable[DatasetKey] |
cache_entry_keys | (Optional) An iterable of cache entry key strings. If provided, only cache entries that exist in … TYPE: Optional[Iterable[bytes]] DEFAULT: None |
source | (Optional) A PTransform class that takes a path argument in its constructor, and is used to read the cache. TYPE: Optional[object] DEFAULT: None |
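The parameters above can be modeled with the standard library to show how they interact. This sketch is not TFT's implementation. It assumes a hypothetical layout of `cache_base_dir/<dataset key>/<cache entry key>` with one file per entry, since this page does not document the real on-disk format. It also assumes that `cache_entry_keys`, when given, acts as an allow-list. The sketch reads only datasets whose key has `is_cached` set. It uses `str` file names where the real parameter is typed `Iterable[bytes]`. The function `read_cache_sketch` is hypothetical.

```python
import os
import tempfile
from collections import namedtuple
from typing import Dict, Iterable, Optional

DatasetKeyStandIn = namedtuple('DatasetKey', ['key', 'is_cached'])


def read_cache_sketch(
    cache_base_dir: str,
    dataset_keys: Iterable[DatasetKeyStandIn],
    cache_entry_keys: Optional[Iterable[str]] = None,
) -> Dict[str, Dict[str, bytes]]:
    """Reads {dataset key: {entry key: bytes}} from a hypothetical layout."""
    allowed = None if cache_entry_keys is None else set(cache_entry_keys)
    result = {}
    for dataset_key in dataset_keys:
        dataset_dir = os.path.join(cache_base_dir, dataset_key.key)
        if not dataset_key.is_cached or not os.path.isdir(dataset_dir):
            continue  # Nothing to read for this dataset.
        entries = {}
        for entry_key in sorted(os.listdir(dataset_dir)):
            if allowed is not None and entry_key not in allowed:
                continue  # Filtered out by cache_entry_keys.
            with open(os.path.join(dataset_dir, entry_key), 'rb') as f:
                entries[entry_key] = f.read()
        result[dataset_key.key] = entries
    return result


with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, 'span-0'))
    for name in ('entry_a', 'entry_b'):
        with open(os.path.join(base, 'span-0', name), 'wb') as f:
            f.write(name.encode())
    keys = [DatasetKeyStandIn('span-0', True)]
    print(read_cache_sketch(base, keys))
    print(read_cache_sketch(base, keys, cache_entry_keys=['entry_b']))
```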
Attributes
Functions
annotations
default_label
default_type_hints
display_data
Returns the display data associated with a pipeline component.
It should be reimplemented in pipeline components that wish to have static display data.
RETURNS | DESCRIPTION |
---|---|
Dict[str, Any] | A dictionary containing … The value might be an integer, float or string value; a … (e.g. short value, label, url); or a … that has more display data that should be picked up. For example: { 'key1': 'string_value', 'key2': 1234, 'key3': 3.14159265, 'key4': DisplayDataItem('apache.org', url='http://apache.org'), 'key5': subComponent } |
expand
expand(pipeline: Pipeline)
from_runner_api (classmethod)
get_resource_hints
get_type_hints
Gets and/or initializes type hints for this object.
If type hints have not been set, attempts to initialize them in this order:
- Using self.default_type_hints().
- Using self.__class__ type hints.
get_windowing
Returns the window function to be associated with the transform's output.
By default, most transforms just return the windowing function associated with the input PCollection (or with the first input, if there are several).
infer_output_type
register_urn (classmethod)
runner_api_requires_keyed_input
to_runner_api
to_runner_api_parameter
to_runner_api_pickled
type_check_inputs
type_check_inputs_or_outputs
type_check_outputs
with_input_types
Annotates the input type of a PTransform with a type hint.
PARAMETER | DESCRIPTION |
---|---|
input_type_hint | An instance of an allowed built-in type, a custom class, or an instance of a … |
RAISES | DESCRIPTION |
---|---|
TypeError | If input_type_hint is not a valid type hint. See … |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular … methods. |
with_output_types
Annotates the output type of a PTransform with a type hint.
PARAMETER | DESCRIPTION |
---|---|
type_hint | An instance of an allowed built-in type, a custom class, or a … |
RAISES | DESCRIPTION |
---|---|
TypeError | If type_hint is not a valid type hint. See … |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular … methods. |
with_resource_hints
Adds resource hints to the PTransform.
Resource hints allow users to express constraints on the environment where the transform should be executed. Interpretation of the resource hints is defined by Beam runners. Runners may ignore unsupported hints.
PARAMETER | DESCRIPTION |
---|---|
**kwargs | Key-value pairs describing hints and their values. |
RAISES | DESCRIPTION |
---|---|
ValueError | If provided hints are unknown to the SDK. See … |
RETURNS | DESCRIPTION |
---|---|
PTransform | A reference to the instance of this particular … |
WriteAnalysisCacheToFS
WriteAnalysisCacheToFS(
pipeline: Pipeline,
cache_base_dir: str,
dataset_keys: Optional[Iterable[DatasetKey]] = None,
sink: Optional[object] = None,
)
Bases: PTransform
Writes a cache object that can be read by ReadAnalysisCacheFromFS.
Given a cache collection, this transform writes it to the configured directory. If the directory already contains cache, the new cache is merged with the existing cache. NOTE: The merge is determined at Beam graph construction time, so any existing cache must already be present in the directory when this transform is constructed.
Init method.
PARAMETER | DESCRIPTION |
---|---|
pipeline | A Beam Pipeline. TYPE: Pipeline |
cache_base_dir | A string, the path that the cache should be stored in. TYPE: str |
dataset_keys | (Optional) An iterable of DatasetKey. TYPE: Optional[Iterable[DatasetKey]] DEFAULT: None |
sink | (Optional) A PTransform class that takes a path in its constructor, and is used to write the cache. If not provided, this uses a GZipped TFRecord sink. TYPE: Optional[object] DEFAULT: None |
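The NOTE in the WriteAnalysisCacheToFS description says that the merge is decided when the transform is constructed, not when the pipeline runs. The stdlib sketch below illustrates one consequence of that. It uses a hypothetical class and a hypothetical directory layout, not TFT's implementation. Existing entries are listed in `__init__`, so an entry that appears on disk after construction is not part of the merge plan. The rule that new entries take precedence over old entries with the same name is also an illustrative assumption.

```python
import os
import tempfile
from typing import Dict


class MergingCacheWriterSketch:
    """Hypothetical writer that snapshots existing cache at construction."""

    def __init__(self, cache_base_dir: str):
        # Construction-time snapshot of what already exists on disk.
        self.existing_entries = (
            set(os.listdir(cache_base_dir))
            if os.path.isdir(cache_base_dir) else set())

    def merge_plan(self, new_entries: Dict[str, bytes]) -> Dict[str, str]:
        """Maps each entry name to 'keep' (old) or 'write' (new)."""
        plan = {name: 'keep' for name in self.existing_entries}
        # Assumption: new entries overwrite old ones with the same name.
        plan.update({name: 'write' for name in new_entries})
        return plan


with tempfile.TemporaryDirectory() as base:
    open(os.path.join(base, 'old_entry'), 'wb').close()
    writer = MergingCacheWriterSketch(base)
    # Created after construction, so the snapshot does not include it.
    open(os.path.join(base, 'late_entry'), 'wb').close()
    print(sorted(writer.merge_plan({'new_entry': b'...'}).items()))
    # [('new_entry', 'write'), ('old_entry', 'keep')]
```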
Attributes
Functions
expand
The remaining members are inherited from the Apache Beam PTransform base class and are documented identically under ReadAnalysisCacheFromFS. They are annotations, default_label, default_type_hints, display_data, from_runner_api, get_resource_hints, get_type_hints, get_windowing, infer_output_type, register_urn, runner_api_requires_keyed_input, to_runner_api, to_runner_api_parameter, to_runner_api_pickled, type_check_inputs, type_check_inputs_or_outputs, type_check_outputs, with_input_types, with_output_types and with_resource_hints.
Functions
make_cache_entry_key
validate_dataset_keys
validate_dataset_keys(dataset_keys: Iterable[DatasetKey])
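This page gives only the signature of validate_dataset_keys. The sketch below shows a validator of this shape, and its rules are assumptions. It rejects anything that is not a DatasetKey. It also rejects keys that are not safe to use as a single path component under `cache_base_dir`. Both rules and the allowed character set are hypothetical, not TFT's documented behavior. The function `validate_dataset_keys_sketch` is likewise hypothetical.

```python
import re
from collections import namedtuple
from typing import Iterable

DatasetKeyStandIn = namedtuple('DatasetKey', ['key', 'is_cached'])

# Hypothetical rule: letters, digits, '.', '-' and '_' only.
_SAFE_KEY = re.compile(r'[A-Za-z0-9._-]+')


def validate_dataset_keys_sketch(
        dataset_keys: Iterable[DatasetKeyStandIn]) -> None:
    """Raises ValueError for keys that fail the hypothetical checks."""
    for dataset_key in dataset_keys:
        if not isinstance(dataset_key, DatasetKeyStandIn):
            raise ValueError(f'Expected a DatasetKey, got {dataset_key!r}')
        key = dataset_key.key
        # Non-strings, '.'/'..', and disallowed characters are all rejected.
        if (not isinstance(key, str) or key in ('.', '..')
                or not _SAFE_KEY.fullmatch(key)):
            raise ValueError(f'Unsafe dataset key: {key!r}')


validate_dataset_keys_sketch([DatasetKeyStandIn('span-0', True)])  # passes
try:
    validate_dataset_keys_sketch([DatasetKeyStandIn('../escape', True)])
except ValueError as e:
    print(e)  # Unsafe dataset key: '../escape'
```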