expression_impl
struct2tensor.expression_impl
Import all modules in expression_impl.
The modules in this file should be accessed like the following:
import struct2tensor as s2t
from struct2tensor import expression_impl
s2t.expression_impl.apply_schema
Modules
apply_schema
Apply a schema to an expression.
A TensorFlow Metadata schema represents more detailed information about the data: specifically, it provides domain information (e.g., not just integers, but integers between 0 and 10) and more detailed structural information (e.g., this field occurs in at least 70% of its parents, and when it occurs, it shows up 5 to 7 times).
Applying a schema attaches a TensorFlow Metadata schema to an expression: it aligns the features in the schema with the expression's children by name (possibly recursively).
After applying a schema to an expression, one can use promote, broadcast, et cetera, and the schema for new expressions will be inferred. If you write a custom expression, you can write code that determines the schema information of the result.
To get the schema back, call get_schema().
This does not filter out fields not in the schema.
my_expr = ...    # an Expression
my_schema = ...  # a schema_pb2.Schema
my_new_schema = my_expr.apply_schema(my_schema).get_schema()
# my_new_schema has semantically identical information on the fields as my_schema.
TODO(martinz): Add utilities to:
- Get the (non-deprecated) paths from a schema.
- Check if any paths in the schema are not in the expression.
- Check if any paths in the expression are not in the schema.
- Project the expression to paths in the schema.
Functions
apply_schema
apply_schema(
expr: Expression, schema: Schema
) -> Expression
Source code in struct2tensor/expression_impl/apply_schema.py
Modules
broadcast
Methods for broadcasting a path in a tree.
This provides methods for broadcasting a field anonymously (as used in promote_and_broadcast) or with an explicitly given name.
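Suppose you have an expr representing:
session: {
event: {}
event: {}
val: 10
val: 11
}
session: {
event: {}
event: {}
val: 20
}
Then:
broadcast.broadcast(expr, path.Path(["session", "val"]), "event", "nv")
becomes: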
session: {
event: {
nv: 10
nv: 11
}
event: {
nv: 10
nv: 11
}
val: 10
val: 11
}
session: {
event: {nv: 20}
event: {nv: 20}
val: 20
}
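To make the broadcast semantics concrete, here is a plain-Python sketch, not the struct2tensor implementation. It models each session as a dict of lists; `broadcast_sketch` is a hypothetical helper that copies every value of the origin field into each sibling child of the same parent, reproducing the example above.

```python
import copy
from typing import Any, Dict, List


def broadcast_sketch(parents: List[Dict[str, Any]], origin: str,
                     sibling: str, new_field: str) -> List[Dict[str, Any]]:
    """Copy every value of `origin` into each `sibling` child of the same parent."""
    result = copy.deepcopy(parents)  # leave the input untouched, like a new tree
    for parent in result:
        values = parent.get(origin, [])
        for child in parent.get(sibling, []):
            child[new_field] = list(values)
    return result


sessions = [
    {"event": [{}, {}], "val": [10, 11]},
    {"event": [{}, {}], "val": [20]},
]
print(broadcast_sketch(sessions, "val", "event", "nv"))
# [{'event': [{'nv': [10, 11]}, {'nv': [10, 11]}], 'val': [10, 11]},
#  {'event': [{'nv': [20]}, {'nv': [20]}], 'val': [20]}]
```

Each event receives all values of its own session's val, which is why the second session's events get only 20.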
Functions
broadcast
broadcast(
root: Expression,
origin: Path,
sibling_name: Step,
new_field_name: Step,
) -> Expression
broadcast_anonymous
broadcast_anonymous(
root: Expression, origin: Path, sibling: Step
) -> Tuple[Expression, Path]
Modules
depth_limit
Caps the depth of an expression.
Suppose you have an expression expr modeled as:
If expr_2 = depth_limit.limit_depth(expr, 2), you get:
Functions
limit_depth
limit_depth(
expr: Expression, depth_limit: int
) -> Expression
Limits the depth to nodes at most depth_limit steps from expr.
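As a rough model of what depth limiting keeps, the following plain-Python sketch represents an expression tree as nested dicts of child names. `limit_depth_sketch` is a hypothetical helper, not the library function; it keeps every node at most depth_limit steps below the root and drops everything deeper.

```python
from typing import Dict

Tree = Dict[str, "Tree"]  # each node maps child names to subtrees


def limit_depth_sketch(tree: Tree, depth_limit: int) -> Tree:
    """Keep only nodes that are at most depth_limit steps below the root."""
    if depth_limit <= 0:
        return {}
    return {name: limit_depth_sketch(child, depth_limit - 1)
            for name, child in tree.items()}


expr = {"A": {"D": {}, "B": {"C": {}}}}
print(limit_depth_sketch(expr, 2))  # {'A': {'D': {}, 'B': {}}}
```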
Modules
filter_expression
Create a new expression that is a filtered version of an original one.
There are two public methods in this module: filter_by_sibling and filter_by_child. As with most other operations, these create a new tree which has all the original paths of the original tree, but with a new subtree.
filter_by_sibling allows you to filter an expression by a boolean sibling field.
Beginning with the struct:
root =
  root0
    doc0
      bar: "a"
      keep_me: False
    keep_my_sib0: False
  root1
    doc1
      bar: "b"
      bar: "c"
      keep_me: True
    keep_my_sib1: True
    doc2
      bar: "d"
    keep_my_sib2: False
  root2 (empty)
# Note: keep_my_sib and doc must have the same shape (e.g., each root
# has the same number of keep_my_sib children as doc children).
root_2 = filter_expression.filter_by_sibling(
    root, path.create_path("doc"), "keep_my_sib", "new_doc")
End with the struct (suppressing original doc):
root_2 =
  root0
    keep_my_sib0: False
  root1
    new_doc0
      bar: "b"
      bar: "c"
      keep_me: True
    keep_my_sib1: True
    keep_my_sib2: False
  root2 (empty)
filter_by_child allows you to filter an expression by an optional boolean child field.
The following call will have the same effect as above:
root_2 = filter_expression.filter_by_child(
    root, path.create_path("doc"), "keep_me", "new_doc")
Functions
filter_by_child
filter_by_child(
expr: Expression,
p: Path,
child_field_name: Step,
new_field_name: Step,
) -> Expression
Filter an expression by an optional boolean child field.
If the child field is present and True, then keep that parent. Otherwise, drop the parent.
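The keep/drop rule can be modeled in plain Python. In this sketch (an illustration, not the library code), each root is a dict of lists, and `filter_by_child_sketch` is a hypothetical helper that adds a new field holding only the items whose boolean child is present and True; items with the child absent or False are dropped, while the original field is kept.

```python
from typing import Any, Dict, List


def filter_by_child_sketch(parents: List[Dict[str, Any]], field: str,
                           child_field: str,
                           new_field: str) -> List[Dict[str, Any]]:
    """Add new_field holding the `field` items whose child_field is present and True."""
    result = []
    for parent in parents:
        kept = [item for item in parent.get(field, [])
                if item.get(child_field) is True]  # absent or False -> dropped
        result.append({**parent, new_field: kept})
    return result


root = [
    {"doc": [{"bar": ["a"], "keep_me": False}]},
    {"doc": [{"bar": ["b", "c"], "keep_me": True}, {"bar": ["d"]}]},
    {},
]
filtered = filter_by_child_sketch(root, "doc", "keep_me", "new_doc")
print([r["new_doc"] for r in filtered])
# [[], [{'bar': ['b', 'c'], 'keep_me': True}], []]
```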
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
expr | Expression | the original expression |
p | Path | the path to filter. |
child_field_name | Step | the boolean child field to use to filter. |
new_field_name | Step | the new, filtered version of path. |

RETURNS | DESCRIPTION |
---|---|
Expression | The new root expression. |
Source code in struct2tensor/expression_impl/filter_expression.py
filter_by_sibling
filter_by_sibling(
expr: Expression,
p: Path,
sibling_field_name: Step,
new_field_name: Step,
) -> Expression
Filter an expression by its sibling.
This is similar to boolean_mask. The shape of the path being filtered and the sibling must be identical (e.g., each parent object must have an equal number of source and sibling children).
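The boolean-mask behavior and the shape requirement can be sketched in plain Python. `filter_by_sibling_sketch` is a hypothetical helper, not the struct2tensor implementation; it assumes each parent holds the source items and the sibling booleans as equal-length lists and raises if they differ.

```python
from typing import Any, Dict, List


def filter_by_sibling_sketch(parents: List[Dict[str, Any]], field: str,
                             sibling_field: str,
                             new_field: str) -> List[Dict[str, Any]]:
    """Keep the `field` items whose positionally matching sibling boolean is True."""
    result = []
    for parent in parents:
        items = parent.get(field, [])
        mask = parent.get(sibling_field, [])
        if len(items) != len(mask):
            raise ValueError(f"{field} and {sibling_field} must have the same shape")
        kept = [item for item, keep in zip(items, mask) if keep]
        result.append({**parent, new_field: kept})
    return result


root = [
    {"doc": [{"bar": ["a"], "keep_me": False}], "keep_my_sib": [False]},
    {"doc": [{"bar": ["b", "c"], "keep_me": True}, {"bar": ["d"]}],
     "keep_my_sib": [True, False]},
    {},
]
print([r["new_doc"] for r in filter_by_sibling_sketch(root, "doc", "keep_my_sib", "new_doc")])
# [[], [{'bar': ['b', 'c'], 'keep_me': True}], []]
```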
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
expr | Expression | the root expression. |
p | Path | a path to the source to be filtered. |
sibling_field_name | Step | the sibling to use as a mask. |
new_field_name | Step | a new sibling to create. |

RETURNS | DESCRIPTION |
---|---|
Expression | a new root. |
Source code in struct2tensor/expression_impl/filter_expression.py
Modules
index
get_positional_index and get_index_from_end methods.
The parent_index identifies the index of the parent of each element. These methods take the parent_index to determine the relationship with respect to other elements.
Given:
session: {
event: {
val: 111
}
event: {
val: 121
val: 122
}
}
session: {
event: {
val: 10
val: 7
}
event: {
val: 1
}
}
Calling get_positional_index on val with new_field_name "val_index" yields:
session: {
event: {
val: 111
val_index: 0
}
event: {
val: 121
val: 122
val_index: 0
val_index: 1
}
}
session: {
event: {
val: 10
val: 7
val_index: 0
val_index: 1
}
event: {
val: 1
val_index: 0
}
}
Calling get_index_from_end on val with new_field_name "neg_val_index" yields:
session: {
event: {
val: 111
neg_val_index: -1
}
event: {
val: 121
val: 122
neg_val_index: -2
neg_val_index: -1
}
}
session: {
event: {
val: 10
val: 7
neg_val_index: -2
neg_val_index: -1
}
event: {
val: 1
neg_val_index: -1
}
}
These methods are useful when you want to depend upon the index of a field. For example, if you want to filter examples based upon their index, or cogroup two fields by index, then first creating the index is useful.
Note that while the parent indices of these fields seem like overhead, they are just references to the parent indices of other fields, and therefore take little memory or CPU.
Functions
get_index_from_end
get_index_from_end(
t: Expression, source_path: Path, new_field_name: Step
) -> Tuple[Expression, Path]
Gets the number of steps from the end of the array.
Given an array ["a", "b", "c"], with indices [0, 1, 2], the result of this is [-3,-2,-1].
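The index from the end can be derived from parent_index alone: it is the element's position within its parent minus the number of children that parent has. This plain-Python sketch (a hypothetical helper, not the library code) assumes parent_index is sorted so that children of the same parent are contiguous.

```python
from collections import Counter
from typing import List


def index_from_end_sketch(parent_index: List[int]) -> List[int]:
    """Position within the parent minus the parent's child count."""
    counts = Counter(parent_index)  # number of children per parent
    result = []
    previous, position = None, 0
    for parent in parent_index:  # assumes children of a parent are contiguous
        position = position + 1 if parent == previous else 0
        result.append(position - counts[parent])
        previous = parent
    return result


print(index_from_end_sketch([0, 0, 0]))              # [-3, -2, -1]
print(index_from_end_sketch([0, 1, 1, 2, 3, 4, 4]))  # [-1, -2, -1, -1, -1, -2, -1]
```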
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
t | Expression | original expression |
source_path | Path | path in expression to get index of. |
new_field_name | Step | the name of the new field. |

RETURNS | DESCRIPTION |
---|---|
Tuple[Expression, Path] | The new expression and the new path as a pair. |
Source code in struct2tensor/expression_impl/index.py
get_positional_index
get_positional_index(
expr: Expression,
source_path: Path,
new_field_name: Step,
) -> Tuple[Expression, Path]
Gets the positional index.
Given a field with parent_index [0,1,1,2,3,4,4], this returns: parent_index [0,1,1,2,3,4,4] and value [0,0,1,0,0,0,1]
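The positional index in the example above can be reproduced with a short plain-Python sketch: the counter restarts at 0 whenever the parent changes. `positional_index_sketch` is a hypothetical helper, and it assumes parent_index is sorted so that siblings are contiguous.

```python
from typing import List


def positional_index_sketch(parent_index: List[int]) -> List[int]:
    """Index of each element among the children of its parent."""
    result = []
    previous, position = None, 0
    for parent in parent_index:  # assumes children of a parent are contiguous
        position = position + 1 if parent == previous else 0
        result.append(position)
        previous = parent
    return result


print(positional_index_sketch([0, 1, 1, 2, 3, 4, 4]))  # [0, 0, 1, 0, 0, 0, 1]
```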
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
expr | Expression | original expression |
source_path | Path | path in expression to get index of. |
new_field_name | Step | the name of the new field. |

RETURNS | DESCRIPTION |
---|---|
Tuple[Expression, Path] | The new expression and the new path as a pair. |
Source code in struct2tensor/expression_impl/index.py
Modules
map_prensor
Arbitrary operations from sparse and ragged tensors to a leaf field.
There are two public methods of note right now: map_sparse_tensor and map_ragged_tensor.
Assume expr is:
session: {
event: {
val_a: 10
val_b: 1
}
event: {
val_a: 20
val_b: 2
}
event: {
}
event: {
val_a: 40
}
event: {
val_b: 5
}
}
Either of the following alternatives will add val_a and val_b to create val_sum.
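map_sparse_tensor converts val_a and val_b to sparse tensors, and then adds them to produce val_sum.
import tensorflow as tf
from struct2tensor import path
from struct2tensor.expression_impl import map_prensor
new_root = map_prensor.map_sparse_tensor(
    expr,
    path.Path(["event"]),                          # root_path
    [path.Path(["val_a"]), path.Path(["val_b"])],  # paths, relative to root_path
    lambda x, y: x + y,                            # operation
    False,                                         # is_repeated
    tf.int32,                                      # dtype
    "val_sum")                                     # new_field_name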
map_ragged_tensor converts val_a and val_b to ragged tensors, and then adds them to produce val_sum.
new_root = map_prensor.map_ragged_tensor(
    expr,
    path.Path(["event"]),                          # root_path
    [path.Path(["val_a"]), path.Path(["val_b"])],  # paths, relative to root_path
    lambda x, y: x + y,                            # operation
    False,                                         # is_repeated
    tf.int32,                                      # dtype
    "val_sum")                                     # new_field_name
The result of either is:
session: {
event: {
val_a: 10
val_b: 1
val_sum: 11
}
event: {
val_a: 20
val_b: 2
val_sum: 22
}
event: {
}
event: {
val_a: 40
val_sum: 40
}
event: {
val_b: 5
val_sum: 5
}
}
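The result follows the usual rule for adding sparse vectors: a position present in only one operand passes through unchanged, and a position present in neither stays absent. The plain-Python sketch below models that rule with dicts mapping event index to value; `sparse_add` is a hypothetical helper for illustration, not the TensorFlow implementation.

```python
from typing import Dict


def sparse_add(a: Dict[int, int], b: Dict[int, int]) -> Dict[int, int]:
    """Add two 'sparse vectors' keyed by event index (union of present keys)."""
    result = dict(a)
    for index, value in b.items():
        result[index] = result.get(index, 0) + value
    return result


# Event indices 0..4 from the example; a missing key means the field is absent.
val_a = {0: 10, 1: 20, 3: 40}
val_b = {0: 1, 1: 2, 4: 5}
val_sum = sparse_add(val_a, val_b)
print(dict(sorted(val_sum.items())))  # {0: 11, 1: 22, 3: 40, 4: 5}
```

Event 2 has neither value, so it gets no val_sum, matching the empty event in the result.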
Functions
map_ragged_tensor
map_ragged_tensor(
root: Expression,
root_path: Path,
paths: Sequence[Path],
operation: Callable[..., RaggedTensor],
is_repeated: bool,
dtype: DType,
new_field_name: Step,
) -> Expression
Map a ragged tensor.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
root | Expression | the root of the expression. |
root_path | Path | the path relative to which the ragged tensors are calculated. |
paths | Sequence[Path] | the input paths relative to the root_path |
operation | Callable[..., RaggedTensor] | a method that takes the list of ragged tensors as input and returns a ragged tensor. |
is_repeated | bool | True if the result of operation is repeated. |
dtype | DType | dtype of the result of the operation. |
new_field_name | Step | root_path.get_child(new_field_name) is the path of the result. |

RETURNS | DESCRIPTION |
---|---|
Expression | A new root expression containing the old root expression plus the new path, root_path.get_child(new_field_name), with the result of the operation. |
Source code in struct2tensor/expression_impl/map_prensor.py
map_sparse_tensor
map_sparse_tensor(
root: Expression,
root_path: Path,
paths: Sequence[Path],
operation: Callable[..., SparseTensor],
is_repeated: bool,
dtype: DType,
new_field_name: Step,
) -> Expression
Maps a sparse tensor.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
root | Expression | the root of the expression. |
root_path | Path | the path relative to which the sparse tensors are calculated. |
paths | Sequence[Path] | the input paths relative to the root_path |
operation | Callable[..., SparseTensor] | a method that takes the list of sparse tensors as input and returns a sparse tensor. |
is_repeated | bool | True if the result of operation is repeated. |
dtype | DType | dtype of the result of the operation. |
new_field_name | Step | root_path.get_child(new_field_name) is the path of the result. |

RETURNS | DESCRIPTION |
---|---|
Expression | A new root expression containing the old root expression plus the new path, root_path.get_child(new_field_name), with the result of the operation. |
Source code in struct2tensor/expression_impl/map_prensor.py
Modules
map_prensor_to_prensor
Arbitrary operations from prensors to prensors in an expression.
This is useful if a single op generates an entire structure. In general, it is better to use the existing expressions framework or design a custom expression than use this op. So long as any of the output is required, all of the input is required.
For example, suppose you have an op, my_op, that takes a prensor of the form:
and produces a prensor of the form my_result_schema:
my_result_schema = create_schema(
    is_repeated=True,
    children={"foo2": {"is_repeated": True, "dtype": tf.int64},
              "bar2": {"is_repeated": False, "dtype": tf.int64}})
If you give it an expression original with the schema:
Result will have the schema:
Classes
Schema
Schema(
is_repeated: bool = True,
dtype: Optional[DType] = None,
schema_feature: Optional[Feature] = None,
children: Optional[Dict[Step, Schema]] = None,
)
Bases: object
A finite schema for a prensor.
Effectively, this stores everything for the prensor but the tensors themselves.
Notice that this is slightly different than schema_pb2.Schema, although similar in nature. At present, there is no clear way to extract is_repeated and dtype from schema_pb2.Schema.
See create_schema below for constructing a schema.
Note that for LeafNodeTensor, dtype is not None. Also, for ChildNodeTensor and RootNodeTensor, dtype is None. However, a ChildNodeTensor or RootNodeTensor could be childless.
Create a new Schema object.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
is_repeated | bool | is the root repeated? |
dtype | Optional[DType] | tf.dtype of the root if the root is a leaf, otherwise None. |
schema_feature | Optional[Feature] | schema_pb2.Feature of the root (no struct_domain necessary) |
children | Optional[Dict[Step, Schema]] | child schemas. |
Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py
Functions
create_schema
create_schema(
is_repeated: bool = True,
dtype: Optional[DType] = None,
schema_feature: Optional[Feature] = None,
children: Optional[Dict[Step, Any]] = None,
) -> Schema
Create a schema recursively.
Example
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
is_repeated | bool | whether the root is repeated. |
dtype | Optional[DType] | the dtype of a leaf (None for non-leaves). |
schema_feature | Optional[Feature] | the schema_pb2.Feature describing this expression. name and struct_domain need not be specified. |
children | Optional[Dict[Step, Any]] | the child schemas. Note that the value type of children is either a Schema or a dictionary of arguments to create_schema. |

RETURNS | DESCRIPTION |
---|---|
Schema | a new Schema represented by the inputs. |
Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py
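The recursion in create_schema, where each child value may be either a ready-made schema or a dict of arguments, can be sketched with stand-in types. `MiniSchema` and `create_mini_schema` are hypothetical names used only for illustration (dtypes are plain strings here instead of tf.DType), not the struct2tensor classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class MiniSchema:
    """Hypothetical stand-in for Schema: everything but the tensors."""
    is_repeated: bool = True
    dtype: Optional[str] = None  # a dtype name for leaves, None otherwise
    children: Dict[str, "MiniSchema"] = field(default_factory=dict)


def create_mini_schema(is_repeated: bool = True,
                       dtype: Optional[str] = None,
                       children: Optional[Dict[str, Any]] = None) -> MiniSchema:
    """Build a MiniSchema; each child is a MiniSchema or a dict of these arguments."""
    built = {}
    for name, child in (children or {}).items():
        built[name] = (child if isinstance(child, MiniSchema)
                       else create_mini_schema(**child))
    return MiniSchema(is_repeated=is_repeated, dtype=dtype, children=built)


schema = create_mini_schema(
    is_repeated=True,
    children={"foo2": {"is_repeated": True, "dtype": "int64"},
              "bar2": MiniSchema(is_repeated=False, dtype="int64")})
print(schema.children["foo2"])  # MiniSchema(is_repeated=True, dtype='int64', children={})
```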
map_prensor_to_prensor
map_prensor_to_prensor(
root_expr: Expression,
source: Path,
paths_needed: Sequence[Path],
prensor_op: Callable[[Prensor], Prensor],
output_schema: Schema,
) -> Expression
Maps an expression to a prensor, and merges that prensor.
For example, suppose you have an op, my_op, that takes a prensor of the form:
and produces a prensor of the form my_result_schema:
If you give it an expression original with the schema:
Result will have the schema:
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
root_expr | Expression | the root expression |
source | Path | the path where the prensor op is applied. |
paths_needed | Sequence[Path] | the paths needed for the op. |
prensor_op | Callable[[Prensor], Prensor] | the prensor op |
output_schema | Schema | the output schema of the op. |

RETURNS | DESCRIPTION |
---|---|
Expression | A new expression where the prensor is merged. |
Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py
Modules
map_values
Maps the values of various leaves of the same child to a single result.
All inputs must have the same shape (parent_index must be equal).
The output is given the same shape (output of function must be of equal length).
Note that the operations are on 1-D tensors (as opposed to scalars).
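The shape contract above can be modeled in plain Python by treating a leaf field as a (parent_index, values) pair of 1-D lists. `map_many_values_sketch` is a hypothetical helper, not the library function: it checks that all sources share one parent_index, applies the operation to whole 1-D value lists, and requires one output value per input element so the result can reuse the same parent_index.

```python
from typing import Callable, List, Sequence, Tuple

# A leaf field modeled as (parent_index, values), both 1-D.
Field = Tuple[List[int], List[float]]


def map_many_values_sketch(fields: Sequence[Field],
                           operation: Callable[..., List[float]]) -> Field:
    """Apply operation to the 1-D value lists of fields that share a parent_index."""
    parent_index = fields[0][0]
    if any(other != parent_index for other, _ in fields[1:]):
        raise ValueError("all source fields must have the same parent_index")
    result = operation(*[values for _, values in fields])
    if len(result) != len(parent_index):
        raise ValueError("operation must return one value per input element")
    return parent_index, result


a = ([0, 0, 1], [1.0, 2.0, 3.0])
b = ([0, 0, 1], [10.0, 20.0, 30.0])
print(map_many_values_sketch([a, b], lambda x, y: [p + q for p, q in zip(x, y)]))
# ([0, 0, 1], [11.0, 22.0, 33.0])
```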
Functions
map_many_values
map_many_values(
root: Expression,
parent_path: Path,
source_fields: Sequence[Step],
operation: Callable[..., Tensor],
dtype: DType,
new_field_name: Step,
) -> Tuple[Expression, Path]
Map multiple sibling fields into a new sibling.
All source fields must have the same shape, and the shape of the output must be the same as well.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
root | Expression | original root. |
parent_path | Path | parent path of all sources and the new field. |
source_fields | Sequence[Step] | source fields of the operation. Must have the same shape. |
operation | Callable[..., Tensor] | operation from source_fields to new field. |
dtype | DType | type of new field. |
new_field_name | Step | name of the new field. |

RETURNS | DESCRIPTION |
---|---|
Tuple[Expression, Path] | The new expression and the new path as a pair. |
Source code in struct2tensor/expression_impl/map_values.py
map_values
map_values(
root: Expression,
source_path: Path,
operation: Callable[[Tensor], Tensor],
dtype: DType,
new_field_name: Step,
) -> Expression
Map field into a new sibling.
The shape of the output must be the same as the input.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
root | Expression | original root. |
source_path | Path | source of the operation. |
operation | Callable[[Tensor], Tensor] | operation from the source field to the new field. |
dtype | DType | type of new field. |
new_field_name | Step | name of the new field. |

RETURNS | DESCRIPTION |
---|---|
Expression | The new expression. |
Source code in struct2tensor/expression_impl/map_values.py
map_values_anonymous
map_values_anonymous(
root: Expression,
source_path: Path,
operation: Callable[[Tensor], Tensor],
dtype: DType,
) -> Tuple[Expression, Path]
Map field into a new sibling.
The shape of the output must be the same as the input.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
root | Expression | original root. |
source_path | Path | source of the operation. |
operation | Callable[[Tensor], Tensor] | operation from the source field to the new field. |
dtype | DType | type of new field. |

RETURNS | DESCRIPTION |
---|---|
Tuple[Expression, Path] | The new expression and the new path as a pair. |
Source code in struct2tensor/expression_impl/map_values.py
Modules
parquet
Apache Parquet Dataset.
Example Usage
Classes
ParquetDataset
Bases: _RawParquetDataset
A dataset which reads columns from a parquet file and returns a prensor.
The prensor will have a PrensorTypeSpec, which is created based on value_paths.
Note
In TensorFlow v1, this dataset does not return a prensor. The output has the same format as _RawParquetDataset's output (a vector of tensors). The following is a workaround in v1:
Creates a ParquetDataset.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
filenames |  | A list containing the name(s) of the file(s) to be read. |
value_paths |  | A list of strings of the dotstring path(s) of each leaf path(s). |
batch_size | int | An int that determines how many messages are parsed into one prensor tree in an iteration. If there are fewer than batch_size remaining messages, then all remaining messages will be returned. |

RAISES | DESCRIPTION |
---|---|
ValueError | if the column does not exist in the parquet schema. |
Source code in struct2tensor/expression_impl/parquet.py
Functions
calculate_parquet_values
calculate_parquet_values(
expressions: List[Expression],
root_exp: _PlaceholderRootExpression,
filenames: List[str],
batch_size: int,
options: Optional[Options] = None,
)
Calculates expressions and returns a parquet dataset.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
expressions | List[Expression] | A list of expressions to calculate. |
root_exp | _PlaceholderRootExpression | The root placeholder expression to use as the feed dict. |
filenames | List[str] | A list of parquet files. |
batch_size | int | The number of messages to batch. |
options | Optional[Options] | calculate options. |

RETURNS | DESCRIPTION |
---|---|
 | A parquet dataset. |
Source code in struct2tensor/expression_impl/parquet.py
create_expression_from_parquet_file
Creates a placeholder expression from a parquet file.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
filenames |  | A list of parquet files. |

RETURNS | DESCRIPTION |
---|---|
_PlaceholderRootExpression | A PlaceholderRootExpression that should be used as the root of an expression graph. |
Source code in struct2tensor/expression_impl/parquet.py
Modules
parse_message_level_ex
Parses regular fields, extensions, any casts, and map protos.
This is intended for use within proto.py, not independently.
parse_message_level(...) in struct2tensor_ops provides a direct interface to parsing a protocol buffer message. In particular, extensions and regular fields can be directly extracted from the protobuf. However, prensors provide other syntactic sugar to parse protobufs, and parse_message_level_ex(...) handles these in addition to regular fields and extensions.
Specifically, consider google.protobuf.Any and proto maps:
package foo.bar;
import "google/protobuf/any.proto";
message MyMessage {
  google.protobuf.Any my_any = 1;
  map<string, Baz> my_map = 2;
}
message Baz {
  int32 my_int = 1;
  ...
}
Then for MyMessage, the path my_any.(foo.bar.Baz).my_int is an optional path. Also, my_map[x].my_int is an optional path.
Thus, we can run:
my_message_serialized_tensor = ...
my_message_parsed = parse_message_level_ex(
my_message_serialized_tensor,
MyMessage.DESCRIPTOR,
{"my_any", "my_map[x]"})
my_any_serialized = my_message_parsed["my_any"].value
my_any_parsed = parse_message_level_ex(
my_any_serialized,
Any.DESCRIPTOR,
{"(foo.bar.Baz)"})
At this point, my_message_parsed["my_map[x]"].value AND my_any_parsed["(foo.bar.Baz)"].value are serialized Baz tensors.
Attributes
Functions
get_full_name_from_any_step
get_full_name_from_any_step(
step: ProtoFieldName,
) -> Optional[ProtoFieldName]
Gets the full name of a protobuf from a google.protobuf.Any step.
An any step is of the form (foo.com/bar.Baz). In this case the result would be bar.Baz.
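The string handling described above can be sketched in plain Python. `full_name_from_any_step_sketch` is a hypothetical helper, not the library function; it recognizes only the `(type_url_prefix/full.Name)` form documented here and returns the text after the last slash, so the real function may accept or reject other forms differently.

```python
from typing import Optional


def full_name_from_any_step_sketch(step: str) -> Optional[str]:
    """Return 'bar.Baz' for '(foo.com/bar.Baz)'; None if step is not of that form."""
    if not (step.startswith("(") and step.endswith(")")):
        return None
    type_url = step[1:-1]  # e.g. 'foo.com/bar.Baz'
    if "/" not in type_url:
        return None
    return type_url.rsplit("/", 1)[1]  # text after the last '/'


print(full_name_from_any_step_sketch("(foo.com/bar.Baz)"))  # bar.Baz
print(full_name_from_any_step_sketch("my_int"))             # None
```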
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
step | ProtoFieldName | the string of a step in a path. |

RETURNS | DESCRIPTION |
---|---|
Optional[ProtoFieldName] | the full name of a protobuf if the step is an any step, or None otherwise. |
Source code in struct2tensor/expression_impl/parse_message_level_ex.py
parse_message_level_ex
parse_message_level_ex(
tensor_of_protos: Tensor,
desc: Descriptor,
field_names: Set[ProtoFieldName],
message_format: str = "binary",
backing_str_tensor: Optional[Tensor] = None,
honor_proto3_optional_semantics: bool = False,
) -> Mapping[StrStep, _ParsedField]
Parses regular fields, extensions, any casts, and map protos.
Source code in struct2tensor/expression_impl/parse_message_level_ex.py
Modules
placeholder
Placeholder expression.
A placeholder expression represents prensor nodes; however, a prensor is not needed until calculate is called. This allows the user to apply expression queries to a placeholder expression before having an actual prensor object. When calculate is called on a placeholder expression (or a descendant of a placeholder expression), the feed_dict must be passed in. calculate then binds the prensor to the appropriate placeholder expression.
Sample usage:
placeholder_exp = placeholder.create_expression_from_schema(schema)
new_exp = expression_queries(placeholder_exp, ...)  # any expression operations
result = calculate.calculate_values([new_exp],
feed_dict={placeholder_exp: pren})
# placeholder_exp requires a feed_dict to be passed in when calculating
Functions
create_expression_from_schema
create_expression_from_schema(
schema: Schema,
) -> _PlaceholderRootExpression
Creates a placeholder expression from a parquet schema.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
schema | Schema | The schema that describes the prensor tree that this placeholder represents. |

RETURNS | DESCRIPTION |
---|---|
_PlaceholderRootExpression | A PlaceholderRootExpression that should be used as the root of an expression graph. |
Source code in struct2tensor/expression_impl/placeholder.py
get_placeholder_paths_from_graph
Gets all placeholder paths from an expression graph.
This finds all leaf placeholder expressions in an expression graph, and gets the path of these expressions.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
graph |  | expression graph |

RETURNS | DESCRIPTION |
---|---|
List[Path] | a list of paths of placeholder expressions |
Source code in struct2tensor/expression_impl/placeholder.py
Modules
project
project selects a subtree of an expression.
project is often used right before calculating the value.
Example
expr = ...
new_expr = project.project(expr, [path.Path(["foo","bar"]),
path.Path(["x", "y"])])
[prensor_result] = calculate.calculate_prensors([new_expr])
prensor_result now has two paths, "foo.bar" and "x.y".
Functions
project
project(
expr: Expression, paths: Sequence[Path]
) -> Expression
Selects a subtree.
Paths not selected are removed. Paths that are selected are "known", such that if calculate_prensors is called, they will be in the result.
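One way to picture which paths survive a projection: a selected path stays known together with every ancestor needed to reach it, and everything else is removed. This plain-Python sketch is a hypothetical model using tuples as paths, not the struct2tensor implementation, and it assumes the selected paths are leaves.

```python
from typing import Iterable, Set, Tuple

Path = Tuple[str, ...]


def project_sketch(selected: Iterable[Path]) -> Set[Path]:
    """Return the selected paths together with every ancestor needed to reach them."""
    known = set()
    for path in selected:
        for depth in range(1, len(path) + 1):
            known.add(path[:depth])
    return known


print(sorted(project_sketch([("foo", "bar"), ("x", "y")])))
# [('foo',), ('foo', 'bar'), ('x',), ('x', 'y')]
```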
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
expr | Expression | the original expression. |
paths | Sequence[Path] | the paths to include. |

RETURNS | DESCRIPTION |
---|---|
Expression | A projected expression. |
Source code in struct2tensor/expression_impl/project.py
Modules
promote
Promote an expression to be a child of its grandparent.
Promote is part of the standard flattening of data, promote_and_broadcast, which takes structured data and flattens it. By directly accessing promote, one can perform simpler operations.
For example, suppose an expr represents:
session: {
event: {
val: 111
}
event: {
val: 121
val: 122
}
}
session: {
event: {
val: 10
val: 7
}
event: {
val: 1
}
}
promoting session.event.val to a new field nval, a child of session, produces:
session: {
event: {
val: 111
}
event: {
val: 121
val: 122
}
nval: 111
nval: 121
nval: 122
}
session: {
event: {
val: 10
val: 7
}
event: {
val: 1
}
nval: 10
nval: 7
nval: 1
}
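The effect of promotion on the example above can be modeled in plain Python: every grandchild value is copied, in order, into a new field on its grandparent. `promote_sketch` is a hypothetical helper for illustration, not the library's promote.

```python
import copy
from typing import Any, Dict, List


def promote_sketch(parents: List[Dict[str, Any]], child: str, grandchild: str,
                   new_field: str) -> List[Dict[str, Any]]:
    """Add new_field to each parent with all grandchild values, in order."""
    result = copy.deepcopy(parents)  # the original structure is left in place
    for parent in result:
        parent[new_field] = [value
                             for item in parent.get(child, [])
                             for value in item.get(grandchild, [])]
    return result


sessions = [
    {"event": [{"val": [111]}, {"val": [121, 122]}]},
    {"event": [{"val": [10, 7]}, {"val": [1]}]},
]
print([s["nval"] for s in promote_sketch(sessions, "event", "val", "nval")])
# [[111, 121, 122], [10, 7, 1]]
```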
Classes
PromoteChildExpression
PromoteChildExpression(
origin: Expression, origin_parent: Expression
)
Bases: Expression
The root of the promoted sub tree.
Initialize an expression.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
is_repeated | bool | if the expression is repeated. |
my_type | Optional[DType] | the DType of a field, or None for an internal node. |
schema_feature |  | the local schema (StructDomain information should not be present). |
validate_step_format | bool | If True, validates that steps do not have any characters that could be ambiguously understood as structure delimiters (e.g. "."). If False, such characters are allowed and the client is responsible for not relying on any auto-coercion of strings to paths. |
Source code in struct2tensor/expression_impl/promote.py
is_repeated (property)
is_repeated: bool
True iff the same parent value can have multiple children values.
apply
apply(
transform: Callable[[Expression], Expression]
) -> Expression
apply_schema
apply_schema(schema: Schema) -> Expression
broadcast
broadcast(
source_path: CoercableToPath,
sibling_field: Step,
new_field_name: Step,
) -> Expression
Broadcasts the existing field at source_path to the sibling_field.
Source code in struct2tensor/expression.py
calculate
calculate(
sources: Sequence[NodeTensor],
destinations: Sequence[Expression],
options: Options,
side_info: Optional[Prensor] = None,
) -> NodeTensor
Calculates the node tensor of the expression.
The node tensor must be a function of the properties of the expression and the node tensors of the expressions from get_source_expressions().
If is_leaf, then calculate must return a LeafNodeTensor. Otherwise, it must return a ChildNodeTensor or RootNodeTensor.
If calculate_is_identity is true, then this must return sources[0].
Sometimes, for operations such as parsing the proto, calculate will return additional information. For example, calculate() for the root of the proto expression also parses out the tensors required to calculate the tensors of the children. This is why destinations are required.
For a reference use, see calculate_value_slowly(...) below.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
sources | Sequence[NodeTensor] | The node tensors of the expressions in get_source_expressions(). |
destinations | Sequence[Expression] | The expressions that will use the output of this method. |
options | Options | Options for the calculation. |
side_info | Optional[Prensor] | An optional prensor that is used to bind to a placeholder expression. |

RETURNS | DESCRIPTION |
---|---|
NodeTensor | A NodeTensor representing the output of this expression. |
Source code in struct2tensor/expression_impl/promote.py
calculation_equal
calculation_equal(expr: Expression) -> bool
self.calculate is equal to another expression.calculate.
Given the same source node tensors, self.calculate(...) and expression.calculate(...) will have the same result.
Note that this does not check that the source expressions of the two expressions are the same. Therefore, two operations can have the same calculation, but not the same output, because their sources are different. For example, if a.calculation_is_identity() is True and b.calculation_is_identity() is True, then a.calculation_equal(b) is True. However, unless a and b have the same source, the expressions themselves are not equal.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
expr | Expression | The expression to compare to. |
cogroup_by_index
cogroup_by_index(
source_path: CoercableToPath,
left_name: Step,
right_name: Step,
new_field_name: Step,
) -> Expression
Creates a cogroup of left_name and right_name at new_field_name.
Source code in struct2tensor/expression.py
create_has_field
create_has_field(
source_path: CoercableToPath, new_field_name: Step
) -> Expression
Creates a field that is the presence of the source path.
create_proto_index
create_proto_index(field_name: Step) -> Expression
Creates a proto index field as a direct child of the current root.
The proto index maps each root element to the original batch index. For example: [0, 2] means the first element came from the first proto in the original input tensor and the second element came from the third proto. The created field is always "dense" -- it has the same valency as the current root.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
field_name | Step | the name of the field to be created. |

RETURNS | DESCRIPTION |
---|---|
Expression | An Expression object representing the result of the operation. |
Source code in struct2tensor/expression.py
create_size_field
create_size_field(
source_path: CoercableToPath, new_field_name: Step
) -> Expression
Creates a field that is the size of the source path.
get_child
get_child(field_name: Step) -> Optional[Expression]
Gets a named child.
Source code in struct2tensor/expression.py
get_descendant
get_descendant(p: Path) -> Optional[Expression]
Finds the descendant at the path.
Source code in struct2tensor/expression.py
get_descendant_or_error
get_descendant_or_error(p: Path) -> Expression
Finds the descendant at the path, raising an error if there is no such descendant.
Source code in struct2tensor/expression.py
get_known_children
get_known_children() -> Mapping[Step, Expression]
get_known_descendants
get_known_descendants() -> Mapping[Path, Expression]
Gets a mapping from known paths to subexpressions.
The difference between this and get_descendants in Prensor is that all paths in a Prensor are realized, thus all known. But an Expression's descendants might not all be known at the point this method is called, because an expression may have an infinite number of children.
RETURNS | DESCRIPTION |
---|---|
Mapping[Path, Expression] | A mapping from paths (relative to the root of the subexpression) to expressions. |
Source code in struct2tensor/expression.py
get_paths_with_schema
Extract only paths that contain schema information.
Source code in struct2tensor/expression.py
get_schema
Returns a schema for the entire tree.
PARAMETER | TYPE | DESCRIPTION |
---|---|---|
create_schema_features | bool | If True, schema features are added for all children and a schema entry is created if not available on the child. If False, features are left off of the returned schema if there is no schema_feature on the child. |
Source code in struct2tensor/expression.py
get_source_expressions
get_source_expressions() -> Sequence[Expression]
Gets the sources of this expression.
The node tensors of the source expressions must be sufficient to calculate the node tensor of this expression (see calculate and calculate_value_slowly).
RETURNS | DESCRIPTION |
---|---|
Sequence[Expression] | The sources of this expression. |
known_field_names
Returns known field names of the expression.
Known field names of a parsed proto correspond to the fields declared in the message. Examples of "unknown" fields are extensions and explicit casts in an any field. The only way to know if an unknown field "(foo.bar)" is present in an expression expr is to call (expr["(foo.bar)"] is not None).
Notice that simply accessing a field does not make it "known". However, setting a field (or setting a descendant of a field) will make it known.
project(...) returns an expression where the known field names are the only field names. In general, if you want to depend upon known_field_names (e.g., if you want to compile an expression), then the best approach is to project() the expression first.
RETURNS | DESCRIPTION |
---|---|
FrozenSet[Step] | An immutable set of field names. |
map_field_values
map_field_values(
source_path: CoercableToPath,
operator: Callable[[Tensor], Tensor],
dtype: DType,
new_field_name: Step,
) -> Expression
Map a primitive field to create a new primitive field.
Note
The dtype argument has been added since the v1 API.
PARAMETER | DESCRIPTION |
---|---|
source_path | the origin path. TYPE: CoercableToPath |
operator | an element-wise operator that takes a 1-dimensional vector. TYPE: Callable[[Tensor], Tensor] |
dtype | the type of the output. TYPE: DType |
new_field_name | the name of a new sibling of source_path. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | the resulting root expression. |
Source code in struct2tensor/expression.py
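As a rough mental model (not the struct2tensor implementation), a primitive field can be viewed as a values vector paired with a non-decreasing parent_indices vector, the same representation used by transform_fn further down this page. Under that assumption, map_field_values applies operator element-wise to the values and the new sibling field can reuse the parent indices unchanged. The helper map_field_values_model is hypothetical and uses plain Python lists instead of tensors.

```python
from typing import Callable, List, Tuple

# A leaf field modeled as (parent_indices, values): values[i] belongs to the
# parent message parent_indices[i]. This is a conceptual model, not the API.
Leaf = Tuple[List[int], List[float]]


def map_field_values_model(
    field: Leaf, operator: Callable[[List[float]], List[float]]
) -> Leaf:
    """Hypothetical model: apply an element-wise operator to a 1-D values vector."""
    parent_indices, values = field
    new_values = operator(values)
    # An element-wise operator preserves the number of values, so the new
    # sibling field can share the parent indices of the source field.
    if len(new_values) != len(values):
        raise ValueError("operator must be element-wise")
    return list(parent_indices), new_values


# Field "val" under two parents: parent 0 has [1, 4], parent 1 has [5].
val = ([0, 0, 1], [1.0, 4.0, 5.0])
doubled = map_field_values_model(val, lambda v: [2 * x for x in v])
print(doubled)  # ([0, 0, 1], [2.0, 8.0, 10.0])
```

Because only the values change, this kind of mapping cannot change how many values each parent has; reshaping requires map_sparse_tensors or map_ragged_tensors.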
map_ragged_tensors
¶map_ragged_tensors(
parent_path: CoercableToPath,
source_fields: Sequence[Step],
operator: Callable[..., SparseTensor],
is_repeated: bool,
dtype: DType,
new_field_name: Step,
) -> Expression
Maps a set of primitive fields of a message to a new field.
Unlike map_field_values, this operation allows you, to some degree, to reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensors.
PARAMETER | DESCRIPTION |
---|---|
parent_path | the parent of the input and output fields. TYPE: CoercableToPath |
source_fields | the nonempty list of names of the source fields. TYPE: Sequence[Step] |
operator | an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: Callable[..., SparseTensor] |
is_repeated | whether the output is repeated. TYPE: bool |
dtype | the dtype of the result. TYPE: DType |
new_field_name | the name of the resulting field. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | A new query. |
Source code in struct2tensor/expression.py
map_sparse_tensors
¶map_sparse_tensors(
parent_path: CoercableToPath,
source_fields: Sequence[Step],
operator: Callable[..., SparseTensor],
is_repeated: bool,
dtype: DType,
new_field_name: Step,
) -> Expression
Maps a set of primitive fields of a message to a new field.
Unlike map_field_values, this operation allows you, to some degree, to reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensors.
PARAMETER | DESCRIPTION |
---|---|
parent_path | the parent of the input and output fields. TYPE: CoercableToPath |
source_fields | the nonempty list of names of the source fields. TYPE: Sequence[Step] |
operator | an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: Callable[..., SparseTensor] |
is_repeated | whether the output is repeated. TYPE: bool |
dtype | the dtype of the result. TYPE: DType |
new_field_name | the name of the resulting field. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | A new query. |
Source code in struct2tensor/expression.py
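The reduce_sum case described above (a repeated field in, an optional field out, with is_repeated=False) can be sketched in plain Python. This is a hypothetical model that represents a sparse tensor as (indices, values, dense_shape) tuples instead of a TensorFlow SparseTensor; it only illustrates the shape contract: a 2-D input becomes a 1-D output whose first dimension is still the number of parents.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for sparse tensors (not tf.SparseTensor):
# a 2-D sparse tensor for a repeated field, indexed by (parent, position) ...
Sparse2D = Tuple[List[Tuple[int, int]], List[int], Tuple[int, int]]
# ... and a 1-D sparse tensor for an optional field, indexed by parent.
Sparse1D = Tuple[List[int], List[int], Tuple[int]]


def reduce_sum_last_dim(x: Sparse2D) -> Sparse1D:
    """Sums each row of a 2-D sparse tensor into a 1-D sparse tensor.

    The output keeps the first dimension (the number of parents), which is
    the constraint map_sparse_tensors places on its operator. Parents with
    no values stay absent in the output.
    """
    indices, values, (num_parents, _) = x
    sums: Dict[int, int] = {}
    for (row, _col), v in zip(indices, values):
        sums[row] = sums.get(row, 0) + v
    rows = sorted(sums)
    return rows, [sums[r] for r in rows], (num_parents,)


# Three parents: parent 0 has [4, 5], parent 1 has nothing, parent 2 has [7].
repeated = ([(0, 0), (0, 1), (2, 0)], [4, 5, 7], (3, 2))
print(reduce_sum_last_dim(repeated))  # ([0, 2], [9, 7], (3,))
```

In real use the operator would receive TensorFlow sparse tensors and would typically be built from TensorFlow ops; the list-based version only mirrors the bookkeeping.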
project
¶project(path_list: Sequence[CoercableToPath]) -> Expression
promote
¶promote(source_path: CoercableToPath, new_field_name: Step)
Promotes source_path to be a field new_field_name in its grandparent.
promote_and_broadcast
¶promote_and_broadcast(
path_dictionary: Mapping[Step, CoercableToPath],
dest_path_parent: CoercableToPath,
) -> Expression
Source code in struct2tensor/expression.py
reroot
¶reroot(new_root: CoercableToPath) -> Expression
schema_string
¶Returns a schema for the expression.
For example,
Note that unknown fields and subexpressions are not displayed.
PARAMETER | DESCRIPTION |
---|---|
limit | if present, limit the recursion. |
RETURNS | DESCRIPTION |
---|---|
str | A string describing (a part of) the schema. |
Source code in struct2tensor/expression.py
slice
¶slice(
source_path: CoercableToPath,
new_field_name: Step,
begin: Optional[IndexValue] = None,
end: Optional[IndexValue] = None,
) -> Expression
Creates a slice copy of source_path at new_field_name.
Note that a negative begin or end is interpreted relative to the size of the array; e.g., slice(..., begin=-1) gets the last element of every array.
PARAMETER | DESCRIPTION |
---|---|
source_path | the source of the slice. TYPE: CoercableToPath |
new_field_name | the new field that is generated. TYPE: Step |
begin | the beginning of the slice (inclusive). TYPE: Optional[IndexValue] DEFAULT: None |
end | the end of the slice (exclusive). TYPE: Optional[IndexValue] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
Expression | An Expression object representing the result of the operation. |
Source code in struct2tensor/expression.py
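A minimal sketch of the slicing semantics, assuming each parent's repeated children are sliced independently and that negative bounds behave like Python list slicing (relative to each array's own length, as the note above says). The helper slice_per_parent is hypothetical and works on a dict of lists rather than on an Expression.

```python
from typing import Dict, List, Optional


def slice_per_parent(
    children: Dict[int, List[str]],
    begin: Optional[int] = None,
    end: Optional[int] = None,
) -> Dict[int, List[str]]:
    """Hypothetical model: slice each parent's repeated children independently.

    Python slicing already treats a negative begin/end as relative to the
    length of each list, matching the behavior described for slice(...).
    """
    return {parent: values[begin:end] for parent, values in children.items()}


events = {0: ["a", "b", "c"], 1: ["d"], 2: []}
print(slice_per_parent(events, begin=-1))        # {0: ['c'], 1: ['d'], 2: []}
print(slice_per_parent(events, begin=0, end=2))  # {0: ['a', 'b'], 1: ['d'], 2: []}
```

Parents with no children simply produce an empty slice, so the new field is absent for them rather than raising an error.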
truncate
¶truncate(
source_path: CoercableToPath,
limit: Union[int, Tensor],
new_field_name: Step,
) -> Expression
Creates a truncated copy of source_path at new_field_name.
Source code in struct2tensor/expression.py
PromoteExpression
¶
PromoteExpression(
origin: Expression, origin_parent: Expression
)
Bases: Leaf
A promoted leaf.
Initialize a Leaf.
Note that a leaf must have a specified type.
PARAMETER | DESCRIPTION |
---|---|
is_repeated | if the expression is repeated. TYPE: bool |
my_type | the DType of the field. TYPE: DType |
schema_feature | schema information about the field. |
Source code in struct2tensor/expression_impl/promote.py
is_repeated
property
¶is_repeated: bool
True iff the same parent value can have multiple children values.
apply
¶apply(
transform: Callable[[Expression], Expression]
) -> Expression
apply_schema
¶apply_schema(schema: Schema) -> Expression
broadcast
¶broadcast(
source_path: CoercableToPath,
sibling_field: Step,
new_field_name: Step,
) -> Expression
Broadcasts the existing field at source_path to the sibling_field.
Source code in struct2tensor/expression.py
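To make broadcasting concrete, here is a hypothetical pure-Python model (not struct2tensor code). It assumes the source is an optional field keyed by parent index and that the new field is created under each element of sibling_field, which is how the promote_and_broadcast example later on this page places "nage" under every "event".

```python
from typing import Dict, List, Tuple


def broadcast_model(
    source: Dict[int, int], sibling_parent_indices: List[int]
) -> Tuple[List[int], List[int]]:
    """Hypothetical model of broadcast.

    source maps a parent index to the value of an optional field at that
    parent. Element i of the sibling field has parent
    sibling_parent_indices[i] and receives a copy of that parent's source
    value; elements whose parent lacks the source get no value.
    Returns (parent_indices, values) of the new field, whose parents are
    the sibling field's elements.
    """
    new_parents: List[int] = []
    new_values: List[int] = []
    for i, parent in enumerate(sibling_parent_indices):
        if parent in source:
            new_parents.append(i)
            new_values.append(source[parent])
    return new_parents, new_values


# Session 0 has age 25 and two events; session 1 has no age and one event.
print(broadcast_model({0: 25}, [0, 0, 1]))  # ([0, 1], [25, 25])
```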
calculate
¶calculate(
sources: Sequence[NodeTensor],
destinations: Sequence[Expression],
options: Options,
side_info: Optional[Prensor] = None,
) -> NodeTensor
Calculates the node tensor of the expression.
The node tensor must be a function of the properties of the expression and the node tensors of the expressions from get_source_expressions().
If is_leaf, then calculate must return a LeafNodeTensor. Otherwise, it must return a ChildNodeTensor or RootNodeTensor.
If calculate_is_identity is true, then this must return sources[0].
Sometimes, for operations such as parsing the proto, calculate will return additional information. For example, calculate() for the root of the proto expression also parses out the tensors required to calculate the tensors of the children. This is why destinations are required.
For a reference use, see calculate_value_slowly(...) below.
PARAMETER | DESCRIPTION |
---|---|
sources | The node tensors of the expressions in get_source_expressions(). TYPE: Sequence[NodeTensor] |
destinations | The expressions that will use the output of this method. TYPE: Sequence[Expression] |
options | Options for the calculation. TYPE: Options |
side_info | An optional prensor that is used to bind to a placeholder expression. TYPE: Optional[Prensor] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
NodeTensor | A NodeTensor representing the output of this expression. |
Source code in struct2tensor/expression_impl/promote.py
calculation_equal
¶calculation_equal(expr: Expression) -> bool
Returns True if self.calculate is equivalent to expr.calculate.
Given the same source node tensors, self.calculate(...) and expr.calculate(...) will have the same result.
Note that this does not check that the source expressions of the two expressions are the same. Therefore, two operations can have the same calculation, but not the same output, because their sources are different. For example, if a.calculation_is_identity() is True and b.calculation_is_identity() is True, then a.calculation_equal(b) is True. However, unless a and b have the same source, the expressions themselves are not equal.
PARAMETER | DESCRIPTION |
---|---|
expr | The expression to compare to. TYPE: Expression |
cogroup_by_index
¶cogroup_by_index(
source_path: CoercableToPath,
left_name: Step,
right_name: Step,
new_field_name: Step,
) -> Expression
Creates a cogroup of left_name and right_name at new_field_name.
Source code in struct2tensor/expression.py
create_has_field
¶create_has_field(
source_path: CoercableToPath, new_field_name: Step
) -> Expression
Creates a field that is the presence of the source path.
create_proto_index
¶create_proto_index(field_name: Step) -> Expression
Creates a proto index field as a direct child of the current root.
The proto index maps each root element to the original batch index. For example, [0, 2] means the first element came from the first proto in the original input tensor and the second element came from the third proto. The created field is always "dense": it has the same valency as the current root.
PARAMETER | DESCRIPTION |
---|---|
field_name | the name of the field to be created. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | An Expression object representing the result of the operation. |
Source code in struct2tensor/expression.py
create_size_field
¶create_size_field(
source_path: CoercableToPath, new_field_name: Step
) -> Expression
Creates a field that is the size of the source path.
get_child
¶get_child(field_name: Step) -> Optional[Expression]
Gets a named child.
Source code in struct2tensor/expression.py
get_descendant
¶get_descendant(p: Path) -> Optional[Expression]
Finds the descendant at the path.
Source code in struct2tensor/expression.py
get_descendant_or_error
¶get_descendant_or_error(p: Path) -> Expression
Finds the descendant at the path.
Source code in struct2tensor/expression.py
get_known_children
¶get_known_children() -> Mapping[Step, Expression]
get_known_descendants
¶get_known_descendants() -> Mapping[Path, Expression]
Gets a mapping from known paths to subexpressions.
The difference between this and get_descendants in Prensor is that all paths in a Prensor are realized, thus all known. But an Expression's descendants might not all be known at the point this method is called, because an expression may have an infinite number of children.
RETURNS | DESCRIPTION |
---|---|
Mapping[Path, Expression] | A mapping from paths (relative to the root of the subexpression) to expressions. |
Source code in struct2tensor/expression.py
get_paths_with_schema
¶Extract only paths that contain schema information.
Source code in struct2tensor/expression.py
get_schema
¶Returns a schema for the entire tree.
PARAMETER | DESCRIPTION |
---|---|
create_schema_features |
If True, schema features are added for all children and a schema entry is created if not available on the child. If False, features are left off of the returned schema if there is no schema_feature on the child.
DEFAULT:
|
Source code in struct2tensor/expression.py
get_source_expressions
¶get_source_expressions() -> Sequence[Expression]
Gets the sources of this expression.
The node tensors of the source expressions must be sufficient to calculate the node tensor of this expression (see calculate and calculate_value_slowly).
RETURNS | DESCRIPTION |
---|---|
Sequence[Expression] | The sources of this expression. |
known_field_names
¶Returns known field names of the expression.
TODO(martinz): implement set_field and project. Known field names of a parsed proto correspond to the fields declared in the message. Examples of "unknown" fields are extensions and explicit casts in an Any field. The only way to know whether an unknown field "(foo.bar)" is present in an expression expr is to check (expr["(foo.bar)"] is not None).
Note that simply accessing a field does not make it "known". However, setting a field (or setting a descendant of a field) does make it known.
project(...) returns an expression whose known field names are its only field names. In general, if you want to depend on known_field_names (e.g., to compile an expression), the best approach is to project() the expression first.
RETURNS | DESCRIPTION |
---|---|
FrozenSet[Step] | An immutable set of field names. |
map_field_values
¶map_field_values(
source_path: CoercableToPath,
operator: Callable[[Tensor], Tensor],
dtype: DType,
new_field_name: Step,
) -> Expression
Map a primitive field to create a new primitive field.
Note
The dtype argument has been added since the v1 API.
PARAMETER | DESCRIPTION |
---|---|
source_path | the origin path. TYPE: CoercableToPath |
operator | an element-wise operator that takes a 1-dimensional vector. TYPE: Callable[[Tensor], Tensor] |
dtype | the type of the output. TYPE: DType |
new_field_name | the name of a new sibling of source_path. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | the resulting root expression. |
Source code in struct2tensor/expression.py
map_ragged_tensors
¶map_ragged_tensors(
parent_path: CoercableToPath,
source_fields: Sequence[Step],
operator: Callable[..., SparseTensor],
is_repeated: bool,
dtype: DType,
new_field_name: Step,
) -> Expression
Maps a set of primitive fields of a message to a new field.
Unlike map_field_values, this operation allows you, to some degree, to reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensors.
PARAMETER | DESCRIPTION |
---|---|
parent_path | the parent of the input and output fields. TYPE: CoercableToPath |
source_fields | the nonempty list of names of the source fields. TYPE: Sequence[Step] |
operator | an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: Callable[..., SparseTensor] |
is_repeated | whether the output is repeated. TYPE: bool |
dtype | the dtype of the result. TYPE: DType |
new_field_name | the name of the resulting field. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | A new query. |
Source code in struct2tensor/expression.py
map_sparse_tensors
¶map_sparse_tensors(
parent_path: CoercableToPath,
source_fields: Sequence[Step],
operator: Callable[..., SparseTensor],
is_repeated: bool,
dtype: DType,
new_field_name: Step,
) -> Expression
Maps a set of primitive fields of a message to a new field.
Unlike map_field_values, this operation allows you, to some degree, to reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensors.
PARAMETER | DESCRIPTION |
---|---|
parent_path | the parent of the input and output fields. TYPE: CoercableToPath |
source_fields | the nonempty list of names of the source fields. TYPE: Sequence[Step] |
operator | an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: Callable[..., SparseTensor] |
is_repeated | whether the output is repeated. TYPE: bool |
dtype | the dtype of the result. TYPE: DType |
new_field_name | the name of the resulting field. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | A new query. |
Source code in struct2tensor/expression.py
project
¶project(path_list: Sequence[CoercableToPath]) -> Expression
promote
¶promote(source_path: CoercableToPath, new_field_name: Step)
Promotes source_path to be a field new_field_name in its grandparent.
promote_and_broadcast
¶promote_and_broadcast(
path_dictionary: Mapping[Step, CoercableToPath],
dest_path_parent: CoercableToPath,
) -> Expression
Source code in struct2tensor/expression.py
reroot
¶reroot(new_root: CoercableToPath) -> Expression
schema_string
¶Returns a schema for the expression.
For example,
Note that unknown fields and subexpressions are not displayed.
PARAMETER | DESCRIPTION |
---|---|
limit | if present, limit the recursion. |
RETURNS | DESCRIPTION |
---|---|
str | A string describing (a part of) the schema. |
Source code in struct2tensor/expression.py
slice
¶slice(
source_path: CoercableToPath,
new_field_name: Step,
begin: Optional[IndexValue] = None,
end: Optional[IndexValue] = None,
) -> Expression
Creates a slice copy of source_path at new_field_name.
Note that a negative begin or end is interpreted relative to the size of the array; e.g., slice(..., begin=-1) gets the last element of every array.
PARAMETER | DESCRIPTION |
---|---|
source_path | the source of the slice. TYPE: CoercableToPath |
new_field_name | the new field that is generated. TYPE: Step |
begin | the beginning of the slice (inclusive). TYPE: Optional[IndexValue] DEFAULT: None |
end | the end of the slice (exclusive). TYPE: Optional[IndexValue] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
Expression | An Expression object representing the result of the operation. |
Source code in struct2tensor/expression.py
truncate
¶truncate(
source_path: CoercableToPath,
limit: Union[int, Tensor],
new_field_name: Step,
) -> Expression
Creates a truncated copy of source_path at new_field_name.
Source code in struct2tensor/expression.py
Functions¶
promote
¶
promote(
root: Expression, p: Path, new_field_name: Step
) -> Expression
Promote a path to be a child of its grandparent, and give it a name.
Source code in struct2tensor/expression_impl/promote.py
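One way to picture promote, assuming the (parent_indices, values) representation used elsewhere on this page: every value is re-parented onto its grandparent by composing two parent-index vectors, and the values themselves are untouched. The helper promote_model is hypothetical and is not the struct2tensor implementation.

```python
from typing import List, Tuple


def promote_model(
    grandparent_of_parent: List[int], parent_indices: List[int], values: List[int]
) -> Tuple[List[int], List[int]]:
    """Hypothetical model of promote on (parent_indices, values) vectors.

    grandparent_of_parent[j] is the index of the grandparent of parent j.
    The new parent index of each value is the composition of the two index
    vectors; the values are unchanged.
    """
    new_parent_indices = [grandparent_of_parent[p] for p in parent_indices]
    return new_parent_indices, list(values)


# Two sessions: events 0 and 1 belong to session 0, event 2 to session 1.
event_to_session = [0, 0, 1]
# val values: event 0 -> [1], event 1 -> [4, 5], event 2 -> [7].
print(promote_model(event_to_session, [0, 1, 1, 2], [1, 4, 5, 7]))
# ([0, 0, 0, 1], [1, 4, 5, 7])
```

Since both input index vectors are non-decreasing, the composed vector is non-decreasing too. Session 0 now holds three val values, which is why a promoted field is generally repeated.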
promote_anonymous
¶
promote_anonymous(
root: Expression, p: Path
) -> Tuple[Expression, Path]
Promote a path to be a new anonymous child of its grandparent.
Source code in struct2tensor/expression_impl/promote.py
Modules¶
promote_and_broadcast
¶
Promote and broadcast a set of nodes.
For example, suppose an expr represents:
+
|
+-session* (stars indicate repeated)
|
+-event*
| |
| +-val*-int64
|
+-user_info? (question mark indicates optional)
|
+-age? int64
session: {
event: {
val: 1
}
event: {
val: 4
val: 5
}
user_info: {
age: 25
}
}
session: {
event: {
val: 7
}
event: {
val: 8
val: 9
}
user_info: {
age: 20
}
}
promote_and_broadcast.promote_and_broadcast(
    expr, {"nage": path.Path(["session", "user_info", "age"])},
    path.Path(["session", "event"]))
creates:
+
|
+-session* (stars indicate repeated)
|
+-event*
| |
| +-val*-int64
| |
| +-nage*-int64
|
+-user_info? (question mark indicates optional)
|
+-age? int64
session: {
event: {
nage: 25
val: 1
}
event: {
nage: 25
val: 4
val: 5
}
user_info: {
age: 25
}
}
session: {
event: {
nage: 20
val: 7
}
event: {
nage: 20
val: 8
val: 9
}
user_info: {
age: 20
}
}
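The transformation in this example can be reproduced with a hypothetical nested-dict model (plain Python, not struct2tensor). It assumes each session is a dict with an "event" list and an optional "user_info" message: age is first promoted to the session level and then copied into every event as "nage", while the existing fields are kept.

```python
import copy
from typing import Any, Dict, List


def promote_and_broadcast_model(sessions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical nested-dict model of the example above.

    For each session, the optional user_info.age is promoted to the session
    and then broadcast to every event as the new field "nage". The input is
    not modified.
    """
    result = copy.deepcopy(sessions)
    for session in result:
        # Promote: the age now belongs to the session itself.
        age = session.get("user_info", {}).get("age")
        for event in session.get("event", []):
            if age is not None:
                # Broadcast: every event of the session gets a copy.
                event["nage"] = age
    return result


data = [
    {"event": [{"val": [1]}, {"val": [4, 5]}], "user_info": {"age": 25}},
    {"event": [{"val": [7]}, {"val": [8, 9]}], "user_info": {"age": 20}},
]
out = promote_and_broadcast_model(data)
print([[e["nage"] for e in s["event"]] for s in out])  # [[25, 25], [20, 20]]
```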
Functions¶
promote_and_broadcast
¶
promote_and_broadcast(
root: Expression,
path_dictionary: Mapping[Step, Path],
dest_path_parent: Path,
) -> Expression
Promote and broadcast a set of paths to a particular location.
PARAMETER | DESCRIPTION |
---|---|
root | the original expression. TYPE: Expression |
path_dictionary | a map from destination field names to origin paths. TYPE: Mapping[Step, Path] |
dest_path_parent | the path of the parent under which the new fields are created. TYPE: Path |
RETURNS | DESCRIPTION |
---|---|
Expression | A new expression, where all the origin paths are promoted and broadcast until they are children of dest_path_parent. |
Source code in struct2tensor/expression_impl/promote_and_broadcast.py
promote_and_broadcast_anonymous
¶
promote_and_broadcast_anonymous(
root: Expression, origin: Path, new_parent: Path
) -> Tuple[Expression, Path]
Promotes then broadcasts the origin until its parent is new_parent.
Source code in struct2tensor/expression_impl/promote_and_broadcast.py
Modules¶
proto
¶
Expressions to parse a proto.
These expressions return values with more information than standard node values. Specifically, each node calculates additional tensors that are used as inputs for its children.
Attributes¶
ProtoExpression
module-attribute
¶
ProtoExpression = Union[
_ProtoRootExpression,
_ProtoChildExpression,
_ProtoLeafExpression,
]
Functions¶
create_expression_from_file_descriptor_set
¶
create_expression_from_file_descriptor_set(
tensor_of_protos: Tensor,
proto_name: ProtoFullName,
file_descriptor_set: FileDescriptorSet,
message_format: str = "binary",
) -> Expression
Create an expression from a 1D tensor of serialized protos.
PARAMETER | DESCRIPTION |
---|---|
tensor_of_protos | 1D tensor of serialized protos. TYPE: Tensor |
proto_name | fully qualified name (e.g. "some.package.SomeProto") of the proto in file_descriptor_set. TYPE: ProtoFullName |
file_descriptor_set | The FileDescriptorSet proto containing the definition of proto_name. TYPE: FileDescriptorSet |
message_format | the format of the serialized protos: one of 'text' or 'binary'. TYPE: str DEFAULT: "binary" |
RETURNS | DESCRIPTION |
---|---|
Expression | An expression. |
Source code in struct2tensor/expression_impl/proto.py
create_expression_from_proto
¶
create_expression_from_proto(
tensor_of_protos: Tensor,
desc: Descriptor,
message_format: str = "binary",
) -> Expression
Create an expression from a 1D tensor of serialized protos.
PARAMETER | DESCRIPTION |
---|---|
tensor_of_protos | 1D tensor of serialized protos. TYPE: Tensor |
desc | the descriptor of the protos in tensor_of_protos. TYPE: Descriptor |
message_format | the format of the serialized protos: one of 'text' or 'binary'. TYPE: str DEFAULT: "binary" |
RETURNS | DESCRIPTION |
---|---|
Expression | An expression. |
Source code in struct2tensor/expression_impl/proto.py
create_transformed_field
¶
create_transformed_field(
expr: Expression,
source_path: CoercableToPath,
dest_field: StrStep,
transform_fn: TransformFn,
) -> Expression
Create an expression that transforms serialized proto tensors.
The transform_fn argument should take the form:
def transform_fn(parent_indices, values):
  ...
  return (transformed_parent_indices, transformed_values)
Given:
- parent_indices: an int64 vector of non-decreasing parent message indices.
- values: a string vector of serialized protos having the same shape as parent_indices.
transform_fn must return new parent indices and serialized values encoding the same proto message as the passed-in values. These two vectors must have the same size as each other, but that size need not match the size of the input arguments.
Note
If CalculateOptions.use_string_view (set at calculate time, so this Expression cannot know it beforehand) is True, the values passed to transform_fn are string views pointing all the way back to the original input tensor (of serialized root protos). transform_fn must preserve such views and must not create new values that are not string views into the root protos, such as self-owned strings. This is because downstream decoding ops still produce string views referring into their input (which are themselves string views into the root protos), and they hold a reference only to the original root proto tensor, keeping it alive. So a tensor newly created by transform_fn may be destroyed after the decoding op runs.
In short, you can apply element-wise transforms to values, but you cannot mutate the contents of elements in values or create new elements.
To lift this restriction, a decoding op would have to hold a reference to the input tensors of all its upstream decoding ops.
PARAMETER | DESCRIPTION |
---|---|
expr | a source expression containing source_path. TYPE: Expression |
source_path | the path to the field to transform. TYPE: CoercableToPath |
dest_field | the name of the newly created field. This field will be a sibling of the field identified by source_path. TYPE: StrStep |
transform_fn | a callable that accepts parent_indices and serialized proto values and returns possibly modified parent_indices and values. Note that when CalculateOptions.use_string_view is set, transform_fn should not have any stateful, side-effecting uses of the serialized proto inputs. Doing so could cause segfaults, as the lifetime of the backing string tensor is not guaranteed when the side-effecting operations are run. TYPE: TransformFn |
RETURNS | DESCRIPTION |
---|---|
Expression | An expression. |
RAISES | DESCRIPTION |
---|---|
ValueError | if the source path is not a proto message field. |
Source code in struct2tensor/expression_impl/proto.py
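The transform_fn contract can be illustrated with a hypothetical function that keeps only the first serialized message under each parent. It is modeled with Python lists; a real transform_fn receives TensorFlow tensors and would need TensorFlow ops. The function only selects existing elements and never builds new serialized strings, which is the kind of transform the use_string_view note above permits. Both returned vectors have the same length, which differs from the input length.

```python
from typing import List, Tuple


def keep_first_per_parent(
    parent_indices: List[int], values: List[bytes]
) -> Tuple[List[int], List[bytes]]:
    """Hypothetical transform_fn-shaped function, modeled with lists.

    Selects the first value of each parent without creating new strings.
    """
    new_parents: List[int] = []
    new_values: List[bytes] = []
    for parent, value in zip(parent_indices, values):
        # parent_indices is non-decreasing, so a change marks a new parent.
        if not new_parents or new_parents[-1] != parent:
            new_parents.append(parent)
            new_values.append(value)
    return new_parents, new_values


print(keep_first_per_parent([0, 0, 2, 2, 2], [b"a", b"b", b"c", b"d", b"e"]))
# ([0, 2], [b'a', b'c'])
```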
is_proto_expression
¶
is_proto_expression(expr: Expression) -> bool
Returns true if an expression is a ProtoExpression.
Modules¶
reroot
¶
Reroot to a subtree, maintaining an input proto index.
reroot is similar to get_descendant_or_error. However, this method allows you to call create_proto_index(...) later on, which gives you a reference to the original proto.
Functions¶
create_proto_index_field
¶
create_proto_index_field(
root: Expression, new_field_name: Step
) -> Expression
reroot
¶
reroot(root: Expression, source_path: Path) -> Expression
Reroot to a new path, maintaining an input proto index.
Similar to root.get_descendant_or_error(source_path); however, this method retains the ability to map back to the original index.
PARAMETER | DESCRIPTION |
---|---|
root | the original root. TYPE: Expression |
source_path | the path to the new root. TYPE: Path |
RETURNS | DESCRIPTION |
---|---|
Expression | the new root. |
Source code in struct2tensor/expression_impl/reroot.py
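Conceptually, the proto index retained by reroot can be obtained by composing parent-index vectors along source_path. The sketch below is a hypothetical pure-Python model (the helper proto_index_after_reroot is not part of struct2tensor): for each node at the new root level it returns the index of the serialized proto it came from, in the same style as the [0, 2] example given for create_proto_index.

```python
from typing import List


def proto_index_after_reroot(parent_chain: List[List[int]]) -> List[int]:
    """Hypothetical model of the proto index after rerooting.

    parent_chain[0] holds the parent indices (batch positions) of the first
    step of source_path; parent_chain[k] holds the parent indices of step
    k + 1 relative to step k. Returns the original proto index of every
    node at the last step.
    """
    index = list(parent_chain[0])
    for parents in parent_chain[1:]:
        index = [index[p] for p in parents]
    return index


# Batch of 3 protos: proto 0 has one session, proto 1 none, proto 2 two.
session_parents = [0, 2, 2]
# Session 0 has two events, session 1 none, session 2 one.
event_parents = [0, 0, 2]
print(proto_index_after_reroot([session_parents]))                 # [0, 2, 2]
print(proto_index_after_reroot([session_parents, event_parents]))  # [0, 0, 2]
```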
Modules¶
size
¶
Functions for creating new size or has expressions.
Given a field "foo.bar":
- creating a size field produces a new expression root that has an optional field "foo.bar_size", which is always present and contains the number of bar in a particular foo.
- creating a has field produces a new expression root that has an optional field "foo.bar_has", which is always present and is true if there are one or more bar in foo.
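The values of the two fields described above (the ones the Expression methods create_size_field and create_has_field produce) can be sketched in plain Python. This hypothetical model takes the parent indices of "bar" and the number of "foo" messages and returns one size and one has value per foo, reflecting that both fields are always present.

```python
from typing import List, Tuple


def size_and_has(num_parents: int, parent_indices: List[int]) -> Tuple[List[int], List[bool]]:
    """Hypothetical model of the size and has fields for foo.bar.

    For each foo (parent), size counts its bar children, and has is True
    when that count is at least one. Both lists have one entry per foo.
    """
    sizes = [0] * num_parents
    for p in parent_indices:
        sizes[p] += 1
    return sizes, [s > 0 for s in sizes]


# Three foo messages: foo 0 has two bar, foo 1 has none, foo 2 has one.
print(size_and_has(3, [0, 0, 2]))  # ([2, 0, 1], [True, False, True])
```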
Classes¶
SizeExpression
¶
SizeExpression(
origin: Expression, origin_parent: Expression
)
Bases: Leaf
Size of the given expression.
SizeExpression is intended to be a sibling of origin. origin_parent should be the parent of origin.
Initialize a Leaf.
Note that a leaf must have a specified type.
PARAMETER | DESCRIPTION |
---|---|
is_repeated | if the expression is repeated. TYPE: bool |
my_type | the DType of the field. TYPE: DType |
schema_feature | schema information about the field. |
Source code in struct2tensor/expression_impl/size.py
is_repeated
property
¶is_repeated: bool
True iff the same parent value can have multiple children values.
apply
¶apply(
transform: Callable[[Expression], Expression]
) -> Expression
apply_schema
¶apply_schema(schema: Schema) -> Expression
broadcast
¶broadcast(
source_path: CoercableToPath,
sibling_field: Step,
new_field_name: Step,
) -> Expression
Broadcasts the existing field at source_path to the sibling_field.
Source code in struct2tensor/expression.py
calculate
¶calculate(
sources: Sequence[NodeTensor],
destinations: Sequence[Expression],
options: Options,
side_info: Optional[Prensor] = None,
) -> NodeTensor
Calculates the node tensor of the expression.
The node tensor must be a function of the properties of the expression and the node tensors of the expressions from get_source_expressions().
If is_leaf, then calculate must return a LeafNodeTensor. Otherwise, it must return a ChildNodeTensor or RootNodeTensor.
If calculate_is_identity is true, then this must return sources[0].
Sometimes, for operations such as parsing the proto, calculate will return additional information. For example, calculate() for the root of the proto expression also parses out the tensors required to calculate the tensors of the children. This is why destinations are required.
For a reference use, see calculate_value_slowly(...) below.
PARAMETER | DESCRIPTION |
---|---|
sources | The node tensors of the expressions in get_source_expressions(). TYPE: Sequence[NodeTensor] |
destinations | The expressions that will use the output of this method. TYPE: Sequence[Expression] |
options | Options for the calculation. TYPE: Options |
side_info | An optional prensor that is used to bind to a placeholder expression. TYPE: Optional[Prensor] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
NodeTensor | A NodeTensor representing the output of this expression. |
Source code in struct2tensor/expression_impl/size.py
calculation_equal
¶calculation_equal(expr: Expression) -> bool
Returns True if self.calculate is equivalent to expr.calculate.
Given the same source node tensors, self.calculate(...) and expr.calculate(...) will have the same result.
Note that this does not check that the source expressions of the two expressions are the same. Therefore, two operations can have the same calculation, but not the same output, because their sources are different. For example, if a.calculation_is_identity() is True and b.calculation_is_identity() is True, then a.calculation_equal(b) is True. However, unless a and b have the same source, the expressions themselves are not equal.
PARAMETER | DESCRIPTION |
---|---|
expr | The expression to compare to. TYPE: Expression |
cogroup_by_index
¶cogroup_by_index(
source_path: CoercableToPath,
left_name: Step,
right_name: Step,
new_field_name: Step,
) -> Expression
Creates a cogroup of left_name and right_name at new_field_name.
Source code in struct2tensor/expression.py
create_has_field
¶create_has_field(
source_path: CoercableToPath, new_field_name: Step
) -> Expression
Creates a field that is the presence of the source path.
create_proto_index
¶create_proto_index(field_name: Step) -> Expression
Creates a proto index field as a direct child of the current root.
The proto index maps each root element to the original batch index. For example, [0, 2] means the first element came from the first proto in the original input tensor and the second element came from the third proto. The created field is always "dense": it has the same valency as the current root.
PARAMETER | DESCRIPTION |
---|---|
field_name | the name of the field to be created. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | An Expression object representing the result of the operation. |
Source code in struct2tensor/expression.py
create_size_field
¶create_size_field(
source_path: CoercableToPath, new_field_name: Step
) -> Expression
Creates a field that is the size of the source path.
get_child
¶get_child(field_name: Step) -> Optional[Expression]
Gets a named child.
Source code in struct2tensor/expression.py
get_descendant
¶get_descendant(p: Path) -> Optional[Expression]
Finds the descendant at the path.
Source code in struct2tensor/expression.py
get_descendant_or_error
¶get_descendant_or_error(p: Path) -> Expression
Finds the descendant at the path.
Source code in struct2tensor/expression.py
get_known_children
¶get_known_children() -> Mapping[Step, Expression]
get_known_descendants
¶get_known_descendants() -> Mapping[Path, Expression]
Gets a mapping from known paths to subexpressions.
The difference between this and get_descendants in Prensor is that all paths in a Prensor are realized, thus all known. But an Expression's descendants might not all be known at the point this method is called, because an expression may have an infinite number of children.
RETURNS | DESCRIPTION |
---|---|
Mapping[Path, Expression] | A mapping from paths (relative to the root of the subexpression) to expressions. |
Source code in struct2tensor/expression.py
get_paths_with_schema
¶Extract only paths that contain schema information.
Source code in struct2tensor/expression.py
get_schema
¶Returns a schema for the entire tree.
PARAMETER | DESCRIPTION |
---|---|
create_schema_features |
If True, schema features are added for all children and a schema entry is created if not available on the child. If False, features are left off of the returned schema if there is no schema_feature on the child.
DEFAULT:
|
Source code in struct2tensor/expression.py
get_source_expressions
¶get_source_expressions() -> Sequence[Expression]
Gets the sources of this expression.
The node tensors of the source expressions must be sufficient to calculate the node tensor of this expression (see calculate and calculate_value_slowly).
RETURNS | DESCRIPTION |
---|---|
Sequence[Expression] | The sources of this expression. |
known_field_names
Returns the known field names of the expression.
TODO(martinz): implement set_field and project. Known field names of a parsed proto correspond to the fields declared in the message. Examples of "unknown" fields are extensions and explicit casts in an Any field. The only way to know whether an unknown field "(foo.bar)" is present in an expression expr is to check (expr["(foo.bar)"] is not None).
Note that simply accessing a field does not make it "known". However, setting a field (or setting a descendant of a field) does make it known.
project(...) returns an expression whose only field names are the known field names. In general, if you want to depend upon known_field_names (e.g., to compile an expression), the best approach is to project() the expression first.
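As a rough illustration of what projection does to the set of field names, the sketch below projects a plain nested-dict record (standing in for a parsed proto) onto dotted paths. `project_record` is a hypothetical helper, not the struct2tensor method; it only mirrors the idea that, after projection, the requested fields are the only ones left.

```python
from typing import Any, Dict, List, Sequence


def project_record(record: Dict[str, Any], paths: Sequence[str]) -> Dict[str, Any]:
    """Hypothetical model: keeps only the fields on the given dotted paths."""
    # Group path suffixes by their first step; an empty suffix means
    # "keep this whole field".
    groups: Dict[str, List[str]] = {}
    for dotted in paths:
        head, _, rest = dotted.partition(".")
        groups.setdefault(head, []).append(rest)
    result: Dict[str, Any] = {}
    for head, rests in groups.items():
        if head not in record:
            continue  # absent fields stay absent
        value = record[head]
        if "" in rests:
            result[head] = value
        elif isinstance(value, list):  # repeated submessage
            result[head] = [project_record(v, rests) for v in value]
        else:  # optional submessage
            result[head] = project_record(value, rests)
    return result


record = {"event": [{"val": 1, "debug": "x"}, {"val": 2}],
          "user_id": 7, "extension": "y"}
print(project_record(record, ["event.val", "user_id"]))
# {'event': [{'val': 1}, {'val': 2}], 'user_id': 7}
```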
RETURNS | DESCRIPTION |
---|---|
FrozenSet[Step] | An immutable set of field names. |
map_field_values
map_field_values(
source_path: CoercableToPath,
operator: Callable[[Tensor], Tensor],
dtype: DType,
new_field_name: Step,
) -> Expression
Map a primitive field to create a new primitive field.
Note: the dtype argument is new relative to the v1 API.
PARAMETER | DESCRIPTION |
---|---|
source_path | the origin path. TYPE: CoercableToPath |
operator | an element-wise operator that takes a 1-dimensional vector. TYPE: Callable[[Tensor], Tensor] |
dtype | the type of the output. TYPE: DType |
new_field_name | the name of a new sibling of source_path. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | the resulting root expression. |
Source code in struct2tensor/expression.py
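A minimal pure-Python model of map_field_values, assuming each parent message is a dict whose field holds a list of values. As in the description, the operator sees one 1-dimensional vector of all values at once; the sketch then writes the results back as a new sibling field, so each parent gets exactly as many mapped values as it had source values. `map_field_values_model` and the data layout are hypothetical; the real method works on tensors and returns a new root Expression.

```python
from typing import Any, Callable, Dict, List


def map_field_values_model(parents: List[Dict[str, Any]], source: str,
                           operator: Callable[[List[Any]], List[Any]],
                           new_field_name: str) -> List[Dict[str, Any]]:
    """Hypothetical model: adds new_field_name as a sibling of source."""
    # The operator receives every value of the field as one flat vector.
    flat = [v for p in parents for v in p.get(source, [])]
    mapped = operator(flat)
    if len(mapped) != len(flat):
        raise ValueError("operator must be element-wise")
    # Scatter results back so each parent keeps its own element count.
    result, offset = [], 0
    for p in parents:
        n = len(p.get(source, []))
        new_parent = dict(p)
        if n:
            new_parent[new_field_name] = mapped[offset:offset + n]
        offset += n
        result.append(new_parent)
    return result


events = [{"val": [1, 2]}, {"val": []}, {"val": [3]}]
print(map_field_values_model(events, "val", lambda xs: [x * 10 for x in xs], "val_x10"))
# [{'val': [1, 2], 'val_x10': [10, 20]}, {'val': []}, {'val': [3], 'val_x10': [30]}]
```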
map_ragged_tensors
map_ragged_tensors(
parent_path: CoercableToPath,
source_fields: Sequence[Step],
operator: Callable[..., SparseTensor],
is_repeated: bool,
dtype: DType,
new_field_name: Step,
) -> Expression
Maps a set of primitive fields of a message to a new field.
Unlike map_field_values, this operation allows you to reshape the field to some degree. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is True, or a 1D sparse tensor if is_repeated is False. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensors.
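The reduce_sum case from the paragraph above (repeated field in, optional field out) can be sketched in plain Python by modeling each field as a list of rows, one row per parent message. `map_rows` and `reduce_sum_last_dim` are hypothetical helpers and rows of lists are only a stand-in for the real tensor types; the two checks mirror the stated constraints: the first dimension must match the inputs, and a non-repeated output holds at most one value per parent.

```python
from typing import Callable, List, Sequence

# One row of values per parent message (a stand-in for the real tensors).
Rows = List[List[float]]


def map_rows(fields: Sequence[Rows], operator: Callable[..., Rows],
             is_repeated: bool) -> Rows:
    """Hypothetical model: applies operator and checks the shape constraints."""
    out = operator(*fields)
    # The first dimension (number of parents) must match every input.
    if any(len(out) != len(field) for field in fields):
        raise ValueError("first dimension must match the input fields")
    # An optional (non-repeated) result holds at most one value per parent.
    if not is_repeated and any(len(row) > 1 for row in out):
        raise ValueError("optional output has more than one value in a row")
    return out


def reduce_sum_last_dim(values: Rows) -> Rows:
    """Repeated -> optional: one sum per parent (0.0 for an empty row)."""
    return [[sum(row, 0.0)] for row in values]


scores = [[1.0, 2.0], [], [4.0]]
print(map_rows([scores], reduce_sum_last_dim, is_repeated=False))
# [[3.0], [0.0], [4.0]]
```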
PARAMETER | DESCRIPTION |
---|---|
parent_path | the parent of the input and output fields. TYPE: CoercableToPath |
source_fields | the nonempty list of names of the source fields. TYPE: Sequence[Step] |
operator | an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: Callable[..., SparseTensor] |
is_repeated | whether the output is repeated. TYPE: bool |
dtype | the dtype of the result. TYPE: DType |
new_field_name | the name of the resulting field. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | A new query. |
Source code in struct2tensor/expression.py
map_sparse_tensors
map_sparse_tensors(
parent_path: CoercableToPath,
source_fields: Sequence[Step],
operator: Callable[..., SparseTensor],
is_repeated: bool,
dtype: DType,
new_field_name: Step,
) -> Expression
Maps a set of primitive fields of a message to a new field.
Unlike map_field_values, this operation allows you to reshape the field to some degree. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is True, or a 1D sparse tensor if is_repeated is False. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensors.
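The other case named above, combining two optional fields into one repeated field, is easiest to see in coordinate (COO) form, which is how sparse tensors store data. The sketch below uses a hypothetical `Sparse` named tuple as a minimal stand-in for a sparse tensor and a hypothetical `combine_optionals` operator: each 1-D input has one slot per parent, and the 2-D output keeps that same first dimension, as the constraint requires.

```python
from typing import List, NamedTuple, Tuple


class Sparse(NamedTuple):
    """Minimal stand-in for a sparse tensor in COO layout."""
    indices: List[Tuple[int, ...]]
    values: List[float]
    dense_shape: Tuple[int, ...]


def combine_optionals(a: Sparse, b: Sparse) -> Sparse:
    """Two optional (1-D) fields -> one repeated (2-D) field, a's value first."""
    if a.dense_shape != b.dense_shape or len(a.dense_shape) != 1:
        raise ValueError("inputs must be 1-D with the same number of parents")
    num_parents = a.dense_shape[0]
    rows: List[List[float]] = [[] for _ in range(num_parents)]
    for field in (a, b):
        for (parent,), value in zip(field.indices, field.values):
            rows[parent].append(value)
    # Row-major indices: (parent, position within that parent's list).
    indices = [(p, j) for p, row in enumerate(rows) for j in range(len(row))]
    values = [v for row in rows for v in row]
    width = max((len(r) for r in rows), default=0)
    # The first dimension equals the inputs' first dimension.
    return Sparse(indices, values, (num_parents, width))


a = Sparse([(0,), (2,)], [1.0, 3.0], (3,))     # present in parents 0 and 2
b = Sparse([(0,), (1,)], [10.0, 20.0], (3,))   # present in parents 0 and 1
print(combine_optionals(a, b).values)  # [1.0, 10.0, 20.0, 3.0]
```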
PARAMETER | DESCRIPTION |
---|---|
parent_path | the parent of the input and output fields. TYPE: CoercableToPath |
source_fields | the nonempty list of names of the source fields. TYPE: Sequence[Step] |
operator | an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: Callable[..., SparseTensor] |
is_repeated | whether the output is repeated. TYPE: bool |
dtype | the dtype of the result. TYPE: DType |
new_field_name | the name of the resulting field. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | A new query. |
Source code in struct2tensor/expression.py
project
project(path_list: Sequence[CoercableToPath]) -> Expression
promote
promote(source_path: CoercableToPath, new_field_name: Step)
Promotes source_path to be a field new_field_name in its grandparent.
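A minimal plain-Python model of promotion, assuming records are nested dicts and lists: the values of `event.val` in all of a session's events are gathered, in order, into one repeated field on the session (the grandparent of `val`). `promote_model` and the field names are hypothetical; the real method operates on paths of an Expression.

```python
from typing import Any, Dict, List


def promote_model(grandparents: List[Dict[str, Any]], parent_field: str,
                  leaf_field: str, new_field_name: str) -> List[Dict[str, Any]]:
    """Hypothetical model of promoting parent_field.leaf_field one level up."""
    result = []
    for gp in grandparents:
        # Concatenate the leaf values of every child message, in order.
        promoted = [value
                    for child in gp.get(parent_field, [])
                    for value in child.get(leaf_field, [])]
        new_gp = dict(gp)
        if promoted:  # an empty repeated field is simply absent
            new_gp[new_field_name] = promoted
        result.append(new_gp)
    return result


sessions = [{"event": [{"val": [1]}, {"val": [2, 3]}]}, {"event": []}]
print(promote_model(sessions, "event", "val", "all_vals")[0]["all_vals"])  # [1, 2, 3]
```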
promote_and_broadcast
promote_and_broadcast(
path_dictionary: Mapping[Step, CoercableToPath],
dest_path_parent: CoercableToPath,
) -> Expression
Source code in struct2tensor/expression.py
reroot
reroot(new_root: CoercableToPath) -> Expression
schema_string
Returns a schema for the expression.
For example,
Note that unknown fields and subexpressions are not displayed.
PARAMETER | DESCRIPTION |
---|---|
limit | if present, limit the recursion. |
RETURNS | DESCRIPTION |
---|---|
str | A string describing (a part of) the schema. |
Source code in struct2tensor/expression.py
slice
slice(
source_path: CoercableToPath,
new_field_name: Step,
begin: Optional[IndexValue] = None,
end: Optional[IndexValue] = None,
) -> Expression
Creates a sliced copy of source_path at new_field_name.
Note that if begin or end is negative, it is counted relative to the size of each array, e.g., slice(..., begin=-1) gets the last element of every array.
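The negative-index rule can be pictured with ordinary Python slicing applied separately to each array, which is what the note describes. `slice_each` is a hypothetical helper over plain lists (one list per parent), not the struct2tensor API.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def slice_each(arrays: List[List[T]], begin: Optional[int] = None,
               end: Optional[int] = None) -> List[List[T]]:
    # Negative indices are resolved against each array's own length,
    # exactly as Python list slicing does.
    return [arr[begin:end] for arr in arrays]


print(slice_each([["a", "b", "c"], ["d"], []], begin=-1))
# [['c'], ['d'], []]
```

Each array is sliced independently, so arrays of different lengths lose different numbers of leading elements.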
PARAMETER | DESCRIPTION |
---|---|
source_path | the source of the slice. TYPE: CoercableToPath |
new_field_name | the new field that is generated. TYPE: Step |
begin | the beginning of the slice (inclusive). TYPE: Optional[IndexValue] DEFAULT: None |
end | the end of the slice (exclusive). TYPE: Optional[IndexValue] DEFAULT: None |
RETURNS | DESCRIPTION |
---|---|
Expression | An Expression object representing the result of the operation. |
Source code in struct2tensor/expression.py
truncate
truncate(
source_path: CoercableToPath,
limit: Union[int, Tensor],
new_field_name: Step,
) -> Expression
Creates a truncated copy of source_path at new_field_name.
Source code in struct2tensor/expression.py
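Truncation keeps at most `limit` elements of every array. A plain-Python picture of that behavior, assuming a non-negative integer limit (the Tensor form of limit is not modeled), using the hypothetical helper `truncate_each`:

```python
from typing import List, TypeVar

T = TypeVar("T")


def truncate_each(arrays: List[List[T]], limit: int) -> List[List[T]]:
    # Keep the first `limit` elements of every array; shorter arrays are unchanged.
    return [arr[:limit] for arr in arrays]


print(truncate_each([[1, 2, 3, 4], [5], []], limit=2))  # [[1, 2], [5], []]
```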
Functions
has
has(
root: Expression,
source_path: Path,
new_field_name: Step,
) -> Expression
Get the "has" of a field (whether the field is present) as a new sibling field.
PARAMETER | DESCRIPTION |
---|---|
root | the original expression. TYPE: Expression |
source_path | the source path to measure. Cannot be root. TYPE: Path |
new_field_name | the name of the sibling field. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | The new expression. |
Source code in struct2tensor/expression_impl/size.py
size
size(
root: Expression,
source_path: Path,
new_field_name: Step,
) -> Expression
Get the size of a field as a new sibling field.
PARAMETER | DESCRIPTION |
---|---|
root | the original expression. TYPE: Expression |
source_path | the source path to measure. Cannot be root. TYPE: Path |
new_field_name | the name of the sibling field. TYPE: Step |
RETURNS | DESCRIPTION |
---|---|
Expression | The new expression. |
Source code in struct2tensor/expression_impl/size.py
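A plain-Python sketch of what size and has add to each parent message, assuming records are dicts, a repeated field is a list, an optional field is a single value, and `has` is true exactly when the size is nonzero (an assumption based on the two descriptions, not on the source). `size_and_has` is a hypothetical helper; the real functions each add one sibling field to an Expression.

```python
from typing import Any, Dict, List


def size_and_has(parents: List[Dict[str, Any]], field: str,
                 size_name: str, has_name: str) -> List[Dict[str, Any]]:
    """Hypothetical model: adds size and has of field as sibling fields."""
    result = []
    for p in parents:
        value = p.get(field)
        if value is None:
            n = 0  # field absent
        elif isinstance(value, list):
            n = len(value)  # repeated field: number of elements
        else:
            n = 1  # optional field that is set
        result.append({**p, size_name: n, has_name: n > 0})
    return result


parents = [{"tag": ["a", "b"]}, {}, {"tag": []}, {"tag": "x"}]
out = size_and_has(parents, "tag", "tag_size", "has_tag")
print([p["tag_size"] for p in out])  # [2, 0, 0, 1]
print([p["has_tag"] for p in out])   # [True, False, False, True]
```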
size_anonymous
size_anonymous(
root: Expression, source_path: Path
) -> Tuple[Expression, Path]
Calculate the size of a field, and store it as an anonymous sibling.
PARAMETER | DESCRIPTION |
---|---|
root | the original expression. TYPE: Expression |
source_path | the source path to measure. Cannot be root. TYPE: Path |
RETURNS | DESCRIPTION |
---|---|
Tuple[Expression, Path] | The new expression and the new field as a pair. |
Source code in struct2tensor/expression_impl/size.py
Modules
slice_expression
Implementation of slice.
The slice operation is meant to replicate the slicing of a list in Python.
Slicing a list in Python is done by specifying a beginning and an end. The resulting list consists of all elements from the beginning (inclusive) up to the end (exclusive).
For example:
>>> x = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> print(x[2:5])  # elements from index 2 (inclusive) to index 5 (exclusive)
['c', 'd', 'e']
>>> print(x[2:])  # elements from index 2 to the end
['c', 'd', 'e', 'f', 'g']
>>> print(x[:4])  # elements from the beginning to index 4 (exclusive)
['a', 'b', 'c', 'd']
>>> print(x[-3:-1])  # from three before the end to one before the end (exclusive)
['e', 'f']
>>> print(x[-3:6])  # from three before the end to index 6 (exclusive)
['e', 'f']
TODO(martinz): there is a third argument to slice, which allows one to step over the elements (e.g., x[2:6:2] == ['c', 'e'], which gives every other element). This is not implemented.
A prensor can be considered to be interleaved lists and dictionaries. E.g.:
my_expression = [{
    "foo": [
        {"bar": [
            {"baz": ["a", "b", "c", "d"]},
            {"baz": ["d", "e", "f"]},
        ]},
        {"bar": [
            {"baz": ["g", "h", "i"]},
            {"baz": ["j", "k", "l"]},
            {"baz": ["m"]},
        ]},
    ]
}]
result_1 = slice_expression.slice_expression(
    my_expression, "foo.bar", "new_bar", begin=1, end=3)
# result_1 represents:
result_1 = [{
    "foo": [
        {"bar": [
             {"baz": ["a", "b", "c", "d"]},
             {"baz": ["d", "e", "f"]},
         ],
         "new_bar": [
             {"baz": ["d", "e", "f"]},
         ]},
        {"bar": [
             {"baz": ["g", "h", "i"]},
             {"baz": ["j", "k", "l"]},
             {"baz": ["m"]},
         ],
         "new_bar": [
             {"baz": ["j", "k", "l"]},
             {"baz": ["m"]},
         ]},
    ]
}]
result_2 = slice_expression.slice_expression(
    my_expression, "foo.bar.baz", "new_baz", begin=1, end=3)
# result_2 represents:
result_2 = [{
    "foo": [
        {"bar": [
            {"baz": ["a", "b", "c", "d"], "new_baz": ["b", "c"]},
            {"baz": ["d", "e", "f"], "new_baz": ["e", "f"]},
        ]},
        {"bar": [
            {"baz": ["g", "h", "i"], "new_baz": ["h", "i"]},
            {"baz": ["j", "k", "l"], "new_baz": ["k", "l"]},
            {"baz": ["m"]},  # ["m"][1:3] is empty, so there is no new_baz
        ]},
    ]
}]
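The two results above can be reproduced with a short plain-Python model that walks the nested lists and dicts and adds a sliced copy of the field at the end of the path as a sibling. `slice_at` is a hypothetical helper operating on the dict view of a prensor, not the struct2tensor function; like the examples, it omits the new field when the slice is empty.

```python
import copy
from typing import Any, Dict, List, Optional


def slice_at(root: List[Dict[str, Any]], path: List[str], new_field_name: str,
             begin: Optional[int], end: Optional[int]) -> List[Dict[str, Any]]:
    """Hypothetical model: adds a sliced copy of the field at path as a sibling."""
    result = copy.deepcopy(root)  # leave the input untouched

    def visit(messages: List[Dict[str, Any]], steps: List[str]) -> None:
        head, rest = steps[0], steps[1:]
        for message in messages:
            if rest:
                visit(message.get(head, []), rest)
            else:
                sliced = message.get(head, [])[begin:end]
                if sliced:  # an empty repeated field is simply absent
                    message[new_field_name] = sliced

    visit(result, path)
    return result


my_expression = [{"foo": [
    {"bar": [{"baz": ["a", "b", "c", "d"]}, {"baz": ["d", "e", "f"]}]},
    {"bar": [{"baz": ["g", "h", "i"]}, {"baz": ["j", "k", "l"]}, {"baz": ["m"]}]},
]}]
result_2 = slice_at(my_expression, ["foo", "bar", "baz"], "new_baz", 1, 3)
print(result_2[0]["foo"][1]["bar"][1])  # {'baz': ['j', 'k', 'l'], 'new_baz': ['k', 'l']}
```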
Functions
slice_expression
slice_expression(
expr: Expression,
p: Path,
new_field_name: Step,
begin: Optional[IndexValue],
end: Optional[IndexValue],
) -> Expression
Creates a new subtree with a sliced expression.
This follows the pattern of Python's slice() method. See the module-level comments for examples.
PARAMETER | DESCRIPTION |
---|---|
expr | the original root expression. TYPE: Expression |
p | the path to the source to be sliced. TYPE: Path |
new_field_name | the name of the new subtree. TYPE: Step |
begin | the beginning index. TYPE: Optional[IndexValue] |
end | the end index. TYPE: Optional[IndexValue] |
RETURNS | DESCRIPTION |
---|---|
Expression | A new root expression. |