expression_impl

struct2tensor.expression_impl

Import all modules in expression_impl.

The modules in this file should be accessed like the following:

import struct2tensor as s2t
from struct2tensor import expression_impl

s2t.expression_impl.apply_schema

Modules

apply_schema

Apply a schema to an expression.

A tensorflow metadata schema (TODO(martinz): link) represents more detailed information about the data: specifically, it presents domain information (e.g., not just integers, but integers between 0 and 10), and more detailed structural information (e.g., this field occurs in at least 70% of its parents, and when it occurs, it shows up 5 to 7 times).

Applying a schema attaches a tensorflow metadata schema to an expression: namely, it aligns the features in the schema with the expression's children by name (possibly recursively).

After applying a schema to an expression, one can use promote, broadcast, et cetera, and the schema for new expressions will be inferred. If you write a custom expression, you can write code that determines the schema information of the result.

To get the schema back, call get_schema().

This does not filter out fields not in the schema.

my_expr = ...
my_schema = ...  # schema here
my_new_schema = my_expr.apply_schema(my_schema).get_schema()
# my_new_schema has semantically identical information on the fields as my_schema.
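
The name-based alignment described above can be pictured without struct2tensor. The following is a pure-Python analogy, not the library's implementation: `attach_schema`, the nested-dict "expression", and the nested-dict "schema" are hypothetical stand-ins chosen only for illustration. It shows features being matched to children by name, recursively, and fields that are absent from the schema being kept rather than filtered out.

```python
from typing import Any, Dict

# Hypothetical stand-ins: an "expression" is a nested dict of child names,
# and a "schema" maps a field name to {"info": ..., "children": {...}}.


def attach_schema(tree: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Pair each child of `tree` with the schema entry of the same name."""
    annotated = {}
    for name, subtree in tree.items():
        feature = schema.get(name)  # None when the schema lacks this field
        child_schema = feature.get("children", {}) if feature is not None else {}
        annotated[name] = {
            "info": feature.get("info") if feature is not None else None,
            # Recurse so that nested fields are aligned by name as well.
            "children": attach_schema(subtree, child_schema),
        }
    return annotated


tree = {"session": {"event": {}, "val": {}}}
schema = {
    "session": {
        "info": "required",
        "children": {"val": {"info": "int in [0, 10]"}},
    }
}
annotated = attach_schema(tree, schema)
```

In this sketch `event` has no schema entry, so it stays in the result with `info` set to `None`.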

TODO(martinz): Add utilities to:

  1. Get the (non-deprecated) paths from a schema.
  2. Check if any paths in the schema are not in the expression.
  3. Check if any paths in the expression are not in the schema.
  4. Project the expression to paths in the schema.
Functions
apply_schema
apply_schema(
    expr: Expression, schema: Schema
) -> Expression
Source code in struct2tensor/expression_impl/apply_schema.py
def apply_schema(expr: expression.Expression,
                 schema: schema_pb2.Schema) -> expression.Expression:
  schema_copy = schema_pb2.Schema()
  schema_copy.CopyFrom(schema)
  for x in schema_copy.feature:
    _normalize_feature(x, schema_copy)
  return _SchemaExpression(expr, schema_copy.feature, None)
Modules

broadcast

Methods for broadcasting a path in a tree.

This provides methods for broadcasting a field anonymously (as used in promote_and_broadcast), or with an explicitly given name.

Suppose you have an expr representing:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
     |
     +-val*-int64
session: {
  event: {}
  event: {}
  val: 10
  val: 11
}
session: {
  event: {}
  event: {}
  val: 20
}

Then:

broadcast.broadcast(expr, path.Path(["session","val"]), "event", "nv")

becomes:

+
|
+---session*   (stars indicate repeated)
       |
       +-event*
       |   |
       |   +---nv*-int64
       |
       +-val*-int64
session: {
  event: {
    nv: 10
    nv: 11
  }
  event: {
    nv: 10
    nv: 11
  }
  val: 10
  val: 11
}
session: {
  event: {nv: 20}
  event: {nv: 20}
  val: 20
}
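
The effect of the broadcast above can be imitated on plain Python data. This is a minimal sketch under stated assumptions, not how struct2tensor works internally: `broadcast_into_sibling` is a hypothetical helper, and sessions are modeled as dicts whose repeated fields are lists.

```python
import copy
from typing import Any, Dict, List


def broadcast_into_sibling(parents: List[Dict[str, Any]], origin: str,
                           sibling: str, new_field: str) -> List[Dict[str, Any]]:
    """Copy every `origin` value of a parent into each of its `sibling` children."""
    result = copy.deepcopy(parents)  # leave the input untouched
    for parent in result:
        values = parent.get(origin, [])
        for child in parent.get(sibling, []):
            child[new_field] = list(values)
    return result


sessions = [
    {"event": [{}, {}], "val": [10, 11]},
    {"event": [{}, {}], "val": [20]},
]
broadcasted = broadcast_into_sibling(sessions, "val", "event", "nv")
```

Each event in the first session receives both values, and each event in the second session receives the single value 20, matching the structure shown above.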
Functions
broadcast
broadcast(
    root: Expression,
    origin: Path,
    sibling_name: Step,
    new_field_name: Step,
) -> Expression
Source code in struct2tensor/expression_impl/broadcast.py
def broadcast(root: expression.Expression, origin: path.Path,
              sibling_name: path.Step,
              new_field_name: path.Step) -> expression.Expression:
  return _broadcast_impl(root, origin, sibling_name, new_field_name)[0]
broadcast_anonymous
broadcast_anonymous(
    root: Expression, origin: Path, sibling: Step
) -> Tuple[Expression, Path]
Source code in struct2tensor/expression_impl/broadcast.py
def broadcast_anonymous(
    root: expression.Expression, origin: path.Path,
    sibling: path.Step) -> Tuple[expression.Expression, path.Path]:
  return _broadcast_impl(root, origin, sibling, path.get_anonymous_field())
Modules

depth_limit

Caps the depth of an expression.

Suppose you have an expression expr modeled as:

  *
   \
    A
   / \
  D   B
       \
        C

If expr_2 = depth_limit.limit_depth(expr, 2), you get:

  *
   \
    A
   / \
  D   B
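
The pruning shown above can be sketched on a nested dict. This is an illustration only: `limit_tree_depth` is a hypothetical helper, and the tree is modeled as a dict mapping child names to subtrees rather than as a struct2tensor expression.

```python
from typing import Any, Dict


def limit_tree_depth(tree: Dict[str, Any], depth: int) -> Dict[str, Any]:
    """Keep only the nodes that are at most `depth` steps below the root."""
    if depth <= 0:
        return {}  # everything below this point is dropped
    return {name: limit_tree_depth(child, depth - 1)
            for name, child in tree.items()}


# The children of the root "*" in the diagram above.
tree = {"A": {"D": {}, "B": {"C": {}}}}
limited = limit_tree_depth(tree, 2)
```

With a depth of 2, A (one step) and D and B (two steps) survive, while C (three steps) is removed.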
Functions
limit_depth
limit_depth(
    expr: Expression, depth_limit: int
) -> Expression

Limit the depth to nodes at most depth_limit steps from expr.

Source code in struct2tensor/expression_impl/depth_limit.py
def limit_depth(expr: expression.Expression,
                depth_limit: int) -> expression.Expression:
  """Limit the depth to nodes k steps from expr."""
  return _DepthLimitExpression(expr, depth_limit)
Modules

filter_expression

Create a new expression that is a filtered version of an original one.

There are two public methods in this module: filter_by_sibling and filter_by_child. As with most other operations, these create a new tree which has all the original paths of the original tree, but with a new subtree.

filter_by_sibling allows you to filter an expression by a boolean sibling field.

Beginning with the struct:

root =
         -----*----------------------------------------------------
        /                       \                                  \
     root0                    root1-----------------------      root2 (empty)
      /   \                   /    \               \      \
      |  keep_my_sib0:False  |  keep_my_sib1:True   | keep_my_sib2:False
    doc0-----               doc1---------------    doc2--------
     |       \                \           \    \               \
    bar:"a"  keep_me:False    bar:"b" bar:"c" keep_me:True      bar:"d"

# Note: keep_my_sib and doc must have the same shape (e.g., each root
# has the same number of keep_my_sib children as doc children).
root_2 = filter_expression.filter_by_sibling(
    root, path.create_path("doc"), "keep_my_sib", "new_doc")

End with the struct (suppressing original doc):
         -----*----------------------------------------------------
        /                       \                                  \
    root0                    root1------------------        root2 (empty)
        \                   /    \                  \
        keep_my_sib0:False  |  keep_my_sib1:True   keep_my_sib2:False
                           new_doc0-----------
                             \           \    \
                             bar:"b" bar:"c" keep_me:True

filter_by_child allows you to filter an expression by an optional boolean child field.

The following call will have the same effect as above:

root_2 = filter_expression.filter_by_child(
    root, path.create_path("doc"), "keep_me", "new_doc")
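
The two filters can be compared on plain Python data. The sketch below is an analogy, not the library's implementation: `filter_by_sibling_sketch` and `filter_by_child_sketch` are hypothetical helpers, and each root is modeled as a dict whose repeated fields are lists.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def filter_by_sibling_sketch(parents: List[Record], field: str,
                             mask_field: str, new_field: str) -> List[Record]:
    """Keep the i-th `field` child when the i-th `mask_field` value is True."""
    out = []
    for parent in parents:
        children = parent.get(field, [])
        mask = parent.get(mask_field, [])
        if len(children) != len(mask):
            raise ValueError("field and mask must have the same shape")
        new_parent = dict(parent)
        new_parent[new_field] = [c for c, keep in zip(children, mask) if keep]
        out.append(new_parent)
    return out


def filter_by_child_sketch(parents: List[Record], field: str,
                           child_field: str, new_field: str) -> List[Record]:
    """Keep a `field` child only if its `child_field` is present and True."""
    out = []
    for parent in parents:
        new_parent = dict(parent)
        new_parent[new_field] = [
            c for c in parent.get(field, []) if c.get(child_field) is True
        ]
        out.append(new_parent)
    return out


roots = [
    {"doc": [{"bar": ["a"], "keep_me": False}], "keep_my_sib": [False]},
    {"doc": [{"bar": ["b", "c"], "keep_me": True}, {"bar": ["d"]}],
     "keep_my_sib": [True, False]},
    {},
]
by_sibling = filter_by_sibling_sketch(roots, "doc", "keep_my_sib", "new_doc")
by_child = filter_by_child_sketch(roots, "doc", "keep_me", "new_doc")
```

On this data both sketches keep only the doc holding "b" and "c". The doc holding "d" has no keep_me child, so the child-based filter drops it.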
Functions
filter_by_child
filter_by_child(
    expr: Expression,
    p: Path,
    child_field_name: Step,
    new_field_name: Step,
) -> Expression

Filter an expression by an optional boolean child field.

If the child field is present and True, then keep that parent. Otherwise, drop the parent.

PARAMETER DESCRIPTION
expr

the original expression

TYPE: Expression

p

the path to filter.

TYPE: Path

child_field_name

the boolean child field to use to filter.

TYPE: Step

new_field_name

the new, filtered version of path.

TYPE: Step

RETURNS DESCRIPTION
Expression

The new root expression.

Source code in struct2tensor/expression_impl/filter_expression.py
def filter_by_child(expr: expression.Expression, p: path.Path,
                    child_field_name: path.Step,
                    new_field_name: path.Step) -> expression.Expression:
  """Filter an expression by an optional boolean child field.

  If the child field is present and True, then keep that parent.
  Otherwise, drop the parent.

  Args:
    expr: the original expression
    p: the path to filter.
    child_field_name: the boolean child field to use to filter.
    new_field_name: the new, filtered version of path.

  Returns:
    The new root expression.
  """
  origin = expr.get_descendant_or_error(p)
  child = origin.get_child_or_error(child_field_name)
  new_expr = _FilterByChildExpression(origin, child)
  new_path = p.get_parent().get_child(new_field_name)

  return expression_add.add_paths(expr, {new_path: new_expr})
filter_by_sibling
filter_by_sibling(
    expr: Expression,
    p: Path,
    sibling_field_name: Step,
    new_field_name: Step,
) -> Expression

Filter an expression by its sibling.

This is similar to boolean_mask. The shape of the path being filtered and the sibling must be identical (e.g., each parent object must have an equal number of source and sibling children).

PARAMETER DESCRIPTION
expr

the root expression.

TYPE: Expression

p

a path to the source to be filtered.

TYPE: Path

sibling_field_name

the sibling to use as a mask.

TYPE: Step

new_field_name

a new sibling to create.

TYPE: Step

RETURNS DESCRIPTION
Expression

a new root.

Source code in struct2tensor/expression_impl/filter_expression.py
def filter_by_sibling(expr: expression.Expression, p: path.Path,
                      sibling_field_name: path.Step,
                      new_field_name: path.Step) -> expression.Expression:
  """Filter an expression by its sibling.


  This is similar to boolean_mask. The shape of the path being filtered and
  the sibling must be identical (e.g., each parent object must have an
  equal number of source and sibling children).

  Args:
    expr: the root expression.
    p: a path to the source to be filtered.
    sibling_field_name: the sibling to use as a mask.
    new_field_name: a new sibling to create.

  Returns:
    a new root.
  """
  origin = expr.get_descendant_or_error(p)
  parent_path = p.get_parent()
  sibling = expr.get_descendant_or_error(
      parent_path.get_child(sibling_field_name))
  new_expr = _FilterBySiblingExpression(origin, sibling)
  new_path = parent_path.get_child(new_field_name)
  return expression_add.add_paths(expr, {new_path: new_expr})
Modules

index

get_positional_index and get_index_from_end methods.

The parent_index identifies the index of the parent of each element. These methods take the parent_index to determine the relationship with respect to other elements.

Given:

session: {
  event: {
    val: 111
  }
  event: {
    val: 121
    val: 122
  }
}

session: {
  event: {
    val: 10
    val: 7
  }
  event: {
    val: 1
  }
}
get_positional_index(expr, path.Path(["event","val"]), "val_index")

yields:

session: {
  event: {
    val: 111
    val_index: 0
  }
  event: {
    val: 121
    val: 122
    val_index: 0
    val_index: 1
  }
}

session: {
  event: {
    val: 10
    val: 7
    val_index: 0
    val_index: 1
  }
  event: {
    val: 1
    val_index: 0
  }
}

get_index_from_end(expr, path.Path(["event","val"]), "neg_val_index")
yields:
session: {
  event: {
    val: 111
    neg_val_index: -1
  }
  event: {
    val: 121
    val: 122
    neg_val_index: -2
    neg_val_index: -1
  }
}

session: {
  event: {
    val: 10
    val: 7
    neg_val_index: -2
    neg_val_index: -1
  }
  event: {
    val: 1
    neg_val_index: -1
  }
}

These methods are useful when you want to depend upon the index of a field. For example, if you want to filter examples based upon their index, or cogroup two fields by index, then first creating the index is useful.

Note that while the parent indices of these fields seem like overhead, they are just references to the parent indices of other fields, and therefore take little memory or CPU.

Functions
get_index_from_end
get_index_from_end(
    t: Expression, source_path: Path, new_field_name: Step
) -> Tuple[Expression, Path]

Gets the number of steps from the end of the array.

Given an array ["a", "b", "c"], with indices [0, 1, 2], the result of this is [-3,-2,-1].
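
As a rough illustration of the result described above, the index from the end can be computed from a flat parent_index list in plain Python. This is a sketch, not the library's implementation: `index_from_end_sketch` is a hypothetical helper, and it assumes that elements of the same parent are adjacent in parent_index.

```python
from itertools import groupby
from typing import List


def index_from_end_sketch(parent_index: List[int]) -> List[int]:
    """For each element, return its position counted back from the end of its parent."""
    result: List[int] = []
    for _, group in groupby(parent_index):
        size = len(list(group))
        # A parent with `size` children yields -size, ..., -2, -1.
        result.extend(range(-size, 0))
    return result


single_parent = index_from_end_sketch([0, 0, 0])  # ["a", "b", "c"] under one parent
two_events = index_from_end_sketch([0, 1, 1])     # one value, then two values
```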

PARAMETER DESCRIPTION
t

original expression

TYPE: Expression

source_path

path in expression to get index of.

TYPE: Path

new_field_name

the name of the new field.

TYPE: Step

RETURNS DESCRIPTION
Tuple[Expression, Path]

The new expression and the new path as a pair.

Source code in struct2tensor/expression_impl/index.py
def get_index_from_end(t: expression.Expression, source_path: path.Path,
                       new_field_name: path.Step
                      ) -> Tuple[expression.Expression, path.Path]:
  """Gets the number of steps from the end of the array.

  Given an array ["a", "b", "c"], with indices [0, 1, 2], the result of this
  is [-3,-2,-1].

  Args:
    t: original expression
    source_path: path in expression to get index of.
    new_field_name: the name of the new field.

  Returns:
    The new expression and the new path as a pair.
  """
  new_path = source_path.get_parent().get_child(new_field_name)
  work_expr, positional_index_path = get_positional_index(
      t, source_path, path.get_anonymous_field())
  work_expr, size_path = size.size_anonymous(work_expr, source_path)
  work_expr = expression_add.add_paths(
      work_expr, {
          new_path:
              _PositionalIndexFromEndExpression(
                  work_expr.get_descendant_or_error(positional_index_path),
                  work_expr.get_descendant_or_error(size_path))
      })
  # Removing the intermediate anonymous nodes.
  result = expression_add.add_to(t, {new_path: work_expr})
  return result, new_path
get_positional_index
get_positional_index(
    expr: Expression,
    source_path: Path,
    new_field_name: Step,
) -> Tuple[Expression, Path]

Gets the positional index.

Given a field with parent_index [0,1,1,2,3,4,4], this returns: parent_index [0,1,1,2,3,4,4] and value [0,0,1,0,0,0,1]
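
The mapping from parent_index to positional index stated above can be reproduced in a few lines of plain Python. This is a sketch for illustration, not the library's implementation: `positional_index_sketch` is a hypothetical helper, and it assumes that elements of the same parent are adjacent in parent_index.

```python
from typing import List, Optional


def positional_index_sketch(parent_index: List[int]) -> List[int]:
    """Number each element by its position among the children of its parent."""
    result: List[int] = []
    previous_parent: Optional[int] = None
    position = 0
    for parent in parent_index:
        # Restart the count whenever a new parent begins.
        position = position + 1 if parent == previous_parent else 0
        result.append(position)
        previous_parent = parent
    return result


values = positional_index_sketch([0, 1, 1, 2, 3, 4, 4])
```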

PARAMETER DESCRIPTION
expr

original expression

TYPE: Expression

source_path

path in expression to get index of.

TYPE: Path

new_field_name

the name of the new field.

TYPE: Step

RETURNS DESCRIPTION
Tuple[Expression, Path]

The new expression and the new path as a pair.

Source code in struct2tensor/expression_impl/index.py
def get_positional_index(expr: expression.Expression, source_path: path.Path,
                         new_field_name: path.Step
                        ) -> Tuple[expression.Expression, path.Path]:
  """Gets the positional index.

  Given a field with parent_index [0,1,1,2,3,4,4], this returns:
  parent_index [0,1,1,2,3,4,4] and value [0,0,1,0,0,0,1]

  Args:
    expr: original expression
    source_path: path in expression to get index of.
    new_field_name: the name of the new field.

  Returns:
    The new expression and the new path as a pair.
  """
  new_path = source_path.get_parent().get_child(new_field_name)
  return expression_add.add_paths(
      expr, {
          new_path:
              _PositionalIndexExpression(
                  expr.get_descendant_or_error(source_path))
      }), new_path
Modules

map_prensor

Arbitrary operations from sparse and ragged tensors to a leaf field.

There are two public methods of note right now: map_sparse_tensor and map_ragged_tensor.

Assume expr is:

session: {
  event: {
    val_a: 10
    val_b: 1
  }
  event: {
    val_a: 20
    val_b: 2
  }
  event: {
  }
  event: {
    val_a: 40
  }
  event: {
    val_b: 5
  }
}

Either of the following alternatives will add val_a and val_b to create val_sum.

map_sparse_tensor converts val_a and val_b to sparse tensors, and then adds them to produce val_sum.

new_root = map_prensor.map_sparse_tensor(
    expr,
    path.Path(["event"]),
    [path.Path(["val_a"]), path.Path(["val_b"])],
    lambda x,y: x + y,
    False,
    tf.int32,
    "val_sum")

map_ragged_tensor converts val_a and val_b to ragged tensors, and then adds them to produce val_sum.

new_root = map_prensor.map_ragged_tensor(
    expr,
    path.Path(["event"]),
    [path.Path(["val_a"]), path.Path(["val_b"])],
    lambda x,y: x + y,
    False,
    tf.int32,
    "val_sum")

The result of either is:

session: {
  event: {
    val_a: 10
    val_b: 1
    val_sum: 11
  }
  event: {
    val_a: 20
    val_b: 2
    val_sum: 22
  }
  event: {
  }
  event: {
    val_a: 40
    val_sum: 40
  }
  event: {
    val_b: 5
    val_sum: 5
  }
}
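
The result above, where an event with only one of the two inputs still gets a val_sum and an empty event gets none, resembles adding two sparse vectors. The sketch below imitates that in plain Python without TensorFlow. `to_sparse` and `sparse_add` are hypothetical helpers, and a sparse vector is modeled as a dict from event position to value.

```python
from typing import Any, Dict, List


def to_sparse(events: List[Dict[str, Any]], field: str) -> Dict[int, int]:
    """Collect the events that have `field`, keyed by event position."""
    return {i: event[field] for i, event in enumerate(events) if field in event}


def sparse_add(x: Dict[int, int], y: Dict[int, int]) -> Dict[int, int]:
    """Add two sparse vectors; a missing entry counts as zero on that side."""
    return {i: x.get(i, 0) + y.get(i, 0) for i in sorted(set(x) | set(y))}


events = [
    {"val_a": 10, "val_b": 1},
    {"val_a": 20, "val_b": 2},
    {},
    {"val_a": 40},
    {"val_b": 5},
]
val_sum = sparse_add(to_sparse(events, "val_a"), to_sparse(events, "val_b"))
```

The third event has neither input, so it has no entry in `val_sum`, just as it has no val_sum field in the result above.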
Functions
map_ragged_tensor
map_ragged_tensor(
    root: Expression,
    root_path: Path,
    paths: Sequence[Path],
    operation: Callable[..., RaggedTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map a ragged tensor.

PARAMETER DESCRIPTION
root

the root of the expression.

TYPE: Expression

root_path

the path relative to which the ragged tensors are calculated.

TYPE: Path

paths

the input paths relative to the root_path

TYPE: Sequence[Path]

operation

a method that takes the list of ragged tensors as input and returns a ragged tensor.

TYPE: Callable[..., RaggedTensor]

is_repeated

true if the result of operation is repeated.

TYPE: bool

dtype

dtype of the result of the operation.

TYPE: DType

new_field_name

root_path.get_child(new_field_name) is the path of the result.

TYPE: Step

RETURNS DESCRIPTION
Expression

A new root expression containing the old root expression plus the new path, root_path.get_child(new_field_name), with the result of the operation.

Source code in struct2tensor/expression_impl/map_prensor.py
def map_ragged_tensor(root: expression.Expression, root_path: path.Path,
                      paths: Sequence[path.Path],
                      operation: Callable[..., tf.RaggedTensor],
                      is_repeated: bool, dtype: tf.DType,
                      new_field_name: path.Step) -> expression.Expression:
  """Map a ragged tensor.

  Args:
    root: the root of the expression.
    root_path: the path relative to which the ragged tensors are calculated.
    paths: the input paths relative to the root_path
    operation: a method that takes the list of ragged tensors as input and
      returns a ragged tensor.
    is_repeated: true if the result of operation is repeated.
    dtype: dtype of the result of the operation.
    new_field_name: root_path.get_child(new_field_name) is the path of the
      result.

  Returns:
    A new root expression containing the old root expression plus the new path,
      root_path.get_child(new_field_name), with the result of the operation.
  """
  return _map_ragged_tensor_impl(root, root_path, paths, operation, is_repeated,
                                 dtype, new_field_name)[0]
map_sparse_tensor
map_sparse_tensor(
    root: Expression,
    root_path: Path,
    paths: Sequence[Path],
    operation: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a sparse tensor.

PARAMETER DESCRIPTION
root

the root of the expression.

TYPE: Expression

root_path

the path relative to which the sparse tensors are calculated.

TYPE: Path

paths

the input paths relative to the root_path

TYPE: Sequence[Path]

operation

a method that takes the list of sparse tensors as input and returns a sparse tensor.

TYPE: Callable[..., SparseTensor]

is_repeated

true if the result of operation is repeated.

TYPE: bool

dtype

dtype of the result of the operation.

TYPE: DType

new_field_name

root_path.get_child(new_field_name) is the path of the result.

TYPE: Step

RETURNS DESCRIPTION
Expression

A new root expression containing the old root expression plus the new path, root_path.get_child(new_field_name), with the result of the operation.

Source code in struct2tensor/expression_impl/map_prensor.py
def map_sparse_tensor(root: expression.Expression, root_path: path.Path,
                      paths: Sequence[path.Path],
                      operation: Callable[..., tf.SparseTensor],
                      is_repeated: bool, dtype: tf.DType,
                      new_field_name: path.Step) -> expression.Expression:
  """Maps a sparse tensor.

  Args:
    root: the root of the expression.
    root_path: the path relative to which the sparse tensors are calculated.
    paths: the input paths relative to the root_path
    operation: a method that takes the list of sparse tensors as input and
      returns a sparse tensor.
    is_repeated: true if the result of operation is repeated.
    dtype: dtype of the result of the operation.
    new_field_name: root_path.get_child(new_field_name) is the path of the
      result.

  Returns:
    A new root expression containing the old root expression plus the new path,
      root_path.get_child(new_field_name), with the result of the operation.
  """

  return _map_sparse_tensor_impl(root, root_path, paths, operation, is_repeated,
                                 dtype, new_field_name)[0]
Modules

map_prensor_to_prensor

Arbitrary operations from prensors to prensors in an expression.

This is useful if a single op generates an entire structure. In general, it is better to use the existing expressions framework or design a custom expression than to use this op. So long as any of the output is required, all of the input is required.

For example, suppose you have an op my_op, that takes a prensor of the form:

  event
   / \
 foo   bar

and produces a prensor of the form my_result_schema:

   event
    / \
 foo2 bar2
my_result_schema = create_schema(
    is_repeated=True,
    children={"foo2": {"is_repeated": True, "dtype": tf.int64},
              "bar2": {"is_repeated": False, "dtype": tf.int64}})

If you give it an expression original with the schema:

 session
    |
  event
  /  \
foo   bar
result = map_prensor_to_prensor(
  original,
  path.Path(["session","event"]),
  [path.Path(["foo"]), path.Path(["bar"])],  # paths_needed, relative to the source path
  my_op,
  my_result_schema)

Result will have the schema:

 session
    |
  event--------
  /  \    \    \
foo   bar foo2 bar2
Classes
Schema
Schema(
    is_repeated: bool = True,
    dtype: Optional[DType] = None,
    schema_feature: Optional[Feature] = None,
    children: Optional[Dict[Step, Schema]] = None,
)

Bases: object

A finite schema for a prensor.

Effectively, this stores everything for the prensor but the tensors themselves.

Notice that this is slightly different than schema_pb2.Schema, although similar in nature. At present, there is no clear way to extract is_repeated and dtype from schema_pb2.Schema.

See create_schema below for constructing a schema.

Note that for LeafNodeTensor, dtype is not None. Also, for ChildNodeTensor and RootNodeTensor, dtype is None. However, a ChildNodeTensor or RootNodeTensor could be childless.

Create a new Schema object.

PARAMETER DESCRIPTION
is_repeated

is the root repeated?

TYPE: bool DEFAULT: True

dtype

tf.dtype of the root if the root is a leaf, otherwise None.

TYPE: Optional[DType] DEFAULT: None

schema_feature

schema_pb2.Feature of the root (no struct_domain necessary)

TYPE: Optional[Feature] DEFAULT: None

children

child schemas.

TYPE: Optional[Dict[Step, Schema]] DEFAULT: None

Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py
def __init__(self,
             is_repeated: bool = True,
             dtype: Optional[tf.DType] = None,
             schema_feature: Optional[schema_pb2.Feature] = None,
             children: Optional[Dict[path.Step, "Schema"]] = None):
  """Create a new Schema object.

  Args:
    is_repeated: is the root repeated?
    dtype: tf.dtype of the root if the root is a leaf, otherwise None.
    schema_feature: schema_pb2.Feature of the root (no struct_domain
      necessary)
    children: child schemas.
  """
  self._is_repeated = is_repeated
  self._type = dtype
  self._schema_feature = schema_feature
  self._children = children if children is not None else {}
  # Cannot have a type and children.
  assert (self._type is None or not self._children)
Attributes
is_repeated property
is_repeated: bool
schema_feature property
schema_feature: Optional[Feature]
type property
type: Optional[DType]
Functions
get_child
get_child(key: Step)
Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py
def get_child(self, key: path.Step):
  return self._children[key]
known_field_names
known_field_names() -> FrozenSet[Step]
Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py
def known_field_names(self) -> FrozenSet[path.Step]:
  return frozenset(self._children.keys())
Functions
create_schema
create_schema(
    is_repeated: bool = True,
    dtype: Optional[DType] = None,
    schema_feature: Optional[Feature] = None,
    children: Optional[Dict[Step, Any]] = None,
) -> Schema

Create a schema recursively.

Example

my_result_schema = create_schema(
  is_repeated=True,
  children={"foo2": {"is_repeated": True, "dtype": tf.int64},
            "bar2": {"is_repeated": False, "dtype": tf.int64}})
PARAMETER DESCRIPTION
is_repeated

whether the root is repeated.

TYPE: bool DEFAULT: True

dtype

the dtype of a leaf (None for non-leaves).

TYPE: Optional[DType] DEFAULT: None

schema_feature

the schema_pb2.Feature describing this expression. name and struct_domain need not be specified.

TYPE: Optional[Feature] DEFAULT: None

children

the child schemas. Note that the value type of children is either a Schema or a dictionary of arguments to create_schema.

TYPE: Optional[Dict[Step, Any]] DEFAULT: None

RETURNS DESCRIPTION
Schema

a new Schema represented by the inputs.
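
The recursive handling of children, where each value is either an existing schema or a dictionary of keyword arguments, can be sketched in plain Python. This is an analogy rather than the library's code: `SketchSchema` and `create_sketch_schema` are hypothetical names, and the string "int64" stands in for a TensorFlow dtype.

```python
from typing import Any, Dict, Optional


class SketchSchema:
    """Hypothetical stand-in for a schema node."""

    def __init__(self, is_repeated: bool = True, dtype: Optional[str] = None,
                 children: Optional[Dict[str, "SketchSchema"]] = None):
        self.is_repeated = is_repeated
        self.dtype = dtype
        self.children = children if children is not None else {}
        # A node is either a leaf with a dtype or a parent with children.
        assert self.dtype is None or not self.children


def create_sketch_schema(is_repeated: bool = True, dtype: Optional[str] = None,
                         children: Optional[Dict[str, Any]] = None) -> SketchSchema:
    """Build a schema, expanding dict-valued children recursively."""
    child_schemas = {}
    for name, spec in (children or {}).items():
        if isinstance(spec, SketchSchema):
            child_schemas[name] = spec  # already a schema, use as is
        else:
            child_schemas[name] = create_sketch_schema(**spec)
    return SketchSchema(is_repeated, dtype, child_schemas)


sketch = create_sketch_schema(
    is_repeated=True,
    children={"foo2": {"is_repeated": True, "dtype": "int64"},
              "bar2": SketchSchema(is_repeated=False, dtype="int64")})
```

Because the dictionaries are unpacked as keyword arguments, their keys must be strings such as "is_repeated" and "dtype".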

Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py
def create_schema(is_repeated: bool = True,
                  dtype: Optional[tf.DType] = None,
                  schema_feature: Optional[schema_pb2.Feature] = None,
                  children: Optional[Dict[path.Step, Any]] = None) -> Schema:
  """Create a schema recursively.

  !!! Example
      ```python
      my_result_schema = create_schema(
        is_repeated=True,
        children={"foo2":{is_repeated=True, dtype=tf.int64},
                  "bar2":{is_repeated=False, dtype=tf.int64}})
      ```

  Args:
    is_repeated: whether the root is repeated.
    dtype: the dtype of a leaf (None for non-leaves).
    schema_feature: the schema_pb2.Feature describing this expression. name and
      struct_domain need not be specified.
    children: the child schemas. Note that the value type of children is either
      a Schema or a dictionary of arguments to create_schema.

  Returns:
    a new Schema represented by the inputs.
  """
  children_dict = children or {}
  child_schemas = {
      k: _create_schema_helper(v) for k, v in children_dict.items()
  }
  return Schema(
      is_repeated=is_repeated,
      dtype=dtype,
      schema_feature=schema_feature,
      children=child_schemas)
map_prensor_to_prensor
map_prensor_to_prensor(
    root_expr: Expression,
    source: Path,
    paths_needed: Sequence[Path],
    prensor_op: Callable[[Prensor], Prensor],
    output_schema: Schema,
) -> Expression

Maps an expression to a prensor, and merges that prensor.

For example, suppose you have an op my_op, that takes a prensor of the form:

  event
  /  \
foo  bar

and produces a prensor of the form my_result_schema:

  event
  /  \
foo2 bar2

If you give it an expression original with the schema:

 session
    |
  event
  /  \
foo   bar
result = map_prensor_to_prensor(
  original,
  path.Path(["session","event"]),
  [path.Path(["foo"]), path.Path(["bar"])],  # paths_needed, relative to the source path
  my_op,
  my_result_schema)

Result will have the schema:

 session
    |
  event--------
  /  \    \    \
foo   bar foo2 bar2
PARAMETER DESCRIPTION
root_expr

the root expression

TYPE: Expression

source

the path where the prensor op is applied.

TYPE: Path

paths_needed

the paths needed for the op.

TYPE: Sequence[Path]

prensor_op

the prensor op

TYPE: Callable[[Prensor], Prensor]

output_schema

the output schema of the op.

TYPE: Schema

RETURNS DESCRIPTION
Expression

A new expression where the prensor is merged.

Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py
def map_prensor_to_prensor(
    root_expr: expression.Expression, source: path.Path,
    paths_needed: Sequence[path.Path],
    prensor_op: Callable[[prensor.Prensor], prensor.Prensor],
    output_schema: Schema) -> expression.Expression:
  r"""Maps an expression to a prensor, and merges that prensor.

  For example, suppose you have an op my_op, that takes a prensor of the form:

  ```
    event
    /  \
  foo  bar
  ```

  and produces a prensor of the form my_result_schema:

  ```
    event
    /  \
  foo2 bar2
  ```

  If you give it an expression original with the schema:

  ```
   session
      |
    event
    /  \
  foo   bar
  ```
  ```python
  result = map_prensor_to_prensor(
    original,
    path.Path(["session","event"]),
    my_op,
    my_output_schema)
  ```

  Result will have the schema:

  ```
   session
      |
    event--------
    /  \    \    \
  foo   bar foo2 bar2
  ```

  Args:
    root_expr: the root expression
    source: the path where the prensor op is applied.
    paths_needed: the paths needed for the op.
    prensor_op: the prensor op
    output_schema: the output schema of the op.

  Returns:
    A new expression where the prensor is merged.
  """
  original_child = root_expr.get_descendant_or_error(source).project(
      paths_needed)
  prensor_child = _PrensorOpExpression(original_child, prensor_op,
                                       output_schema)
  paths_map = {
      source.get_child(k): prensor_child.get_child_or_error(k)
      for k in prensor_child.known_field_names()
  }
  result = expression_add.add_paths(root_expr, paths_map)
  return result
Modules

map_values

Maps the values of various leaves of the same child to a single result.

All inputs must have the same shape (parent_index must be equal).

The output is given the same shape (output of function must be of equal length).

Note that the operations are on 1-D tensors (as opposed to scalars).
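
The shape constraints above can be made concrete with a plain-Python sketch. It is an illustration under stated assumptions, not the library's implementation: `map_many_values_sketch` is a hypothetical helper, and a leaf field is modeled as a dict with a `parent_index` list and a flat `values` list.

```python
from typing import Any, Callable, Dict, List

Leaf = Dict[str, List[Any]]


def map_many_values_sketch(sources: List[Leaf],
                           operation: Callable[..., List[Any]]) -> Leaf:
    """Apply `operation` to the flat value lists of fields with equal shape."""
    parent_index = sources[0]["parent_index"]
    if any(s["parent_index"] != parent_index for s in sources):
        raise ValueError("all inputs must have the same parent_index")
    # The operation sees whole 1-D lists, not individual scalars.
    values = operation(*[s["values"] for s in sources])
    if len(values) != len(parent_index):
        raise ValueError("the output must have the same length as the inputs")
    return {"parent_index": list(parent_index), "values": values}


a = {"parent_index": [0, 0, 1], "values": [1, 2, 3]}
b = {"parent_index": [0, 0, 1], "values": [10, 20, 30]}
total = map_many_values_sketch(
    [a, b], lambda x, y: [i + j for i, j in zip(x, y)])
```

The new field reuses the parent_index of its inputs, which is why the output has to keep the same length.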

Functions
map_many_values
map_many_values(
    root: Expression,
    parent_path: Path,
    source_fields: Sequence[Step],
    operation: Callable[..., Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Tuple[Expression, Path]

Map multiple sibling fields into a new sibling.

All source fields must have the same shape, and the shape of the output must be the same as well.

PARAMETER DESCRIPTION
root

original root.

TYPE: Expression

parent_path

parent path of all sources and the new field.

TYPE: Path

source_fields

source fields of the operation. Must have the same shape.

TYPE: Sequence[Step]

operation

operation from source_fields to new field.

TYPE: Callable[..., Tensor]

dtype

type of new field.

TYPE: DType

new_field_name

name of the new field.

TYPE: Step

RETURNS DESCRIPTION
Tuple[Expression, Path]

The new expression and the new path as a pair.

Source code in struct2tensor/expression_impl/map_values.py
def map_many_values(
    root: expression.Expression, parent_path: path.Path,
    source_fields: Sequence[path.Step], operation: Callable[..., tf.Tensor],
    dtype: tf.DType,
    new_field_name: path.Step) -> Tuple[expression.Expression, path.Path]:
  """Map multiple sibling fields into a new sibling.

  All source fields must have the same shape, and the shape of the output
  must be the same as well.

  Args:
    root: original root.
    parent_path: parent path of all sources and the new field.
    source_fields: source fields of the operation. Must have the same shape.
    operation: operation from source_fields to new field.
    dtype: type of new field.
    new_field_name: name of the new field.

  Returns:
    The new expression and the new path as a pair.
  """
  new_path = parent_path.get_child(new_field_name)
  return expression_add.add_paths(
      root, {
          new_path:
              _MapValuesExpression([
                  root.get_descendant_or_error(parent_path.get_child(f))
                  for f in source_fields
              ], operation, dtype)
      }), new_path
map_values
map_values(
    root: Expression,
    source_path: Path,
    operation: Callable[[Tensor], Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map field into a new sibling.

The shape of the output must be the same as the input.

PARAMETER DESCRIPTION
root

original root.

TYPE: Expression

source_path

source of the operation.

TYPE: Path

operation

operation from the source field to the new field.

TYPE: Callable[[Tensor], Tensor]

dtype

type of new field.

TYPE: DType

new_field_name

name of the new field.

TYPE: Step

RETURNS DESCRIPTION
Expression

The new expression.

Source code in struct2tensor/expression_impl/map_values.py
def map_values(root: expression.Expression, source_path: path.Path,
               operation: Callable[[tf.Tensor], tf.Tensor], dtype: tf.DType,
               new_field_name: path.Step) -> expression.Expression:
  """Map field into a new sibling.

  The shape of the output must be the same as the input.

  Args:
    root: original root.
    source_path: source of the operation.
    operation: operation from the field at source_path to the new field.
    dtype: type of new field.
    new_field_name: name of the new field.

  Returns:
    The new expression.
  """
  if not source_path:
    raise ValueError('Cannot map the root.')
  return map_many_values(root, source_path.get_parent(),
                         [source_path.field_list[-1]], operation, dtype,
                         new_field_name)[0]
map_values_anonymous
map_values_anonymous(
    root: Expression,
    source_path: Path,
    operation: Callable[[Tensor], Tensor],
    dtype: DType,
) -> Tuple[Expression, Path]

Map field into a new sibling.

The shape of the output must be the same as the input.

PARAMETER DESCRIPTION
root

original root.

TYPE: Expression

source_path

source of the operation.

TYPE: Path

operation

operation from the field at source_path to the new field.

TYPE: Callable[[Tensor], Tensor]

dtype

type of new field.

TYPE: DType

RETURNS DESCRIPTION
Tuple[Expression, Path]

The new expression and the new path as a pair.

Source code in struct2tensor/expression_impl/map_values.py
def map_values_anonymous(
    root: expression.Expression, source_path: path.Path,
    operation: Callable[[tf.Tensor], tf.Tensor],
    dtype: tf.DType) -> Tuple[expression.Expression, path.Path]:
  """Map field into a new sibling.

  The shape of the output must be the same as the input.

  Args:
    root: original root.
    source_path: source of the operation.
    operation: operation from the field at source_path to the new field.
    dtype: type of new field.

  Returns:
    The new expression and the new path as a pair.
  """
  if not source_path:
    raise ValueError('Cannot map the root.')
  return map_many_values(root, source_path.get_parent(),
                         [source_path.field_list[-1]], operation, dtype,
                         path.get_anonymous_field())
Modules

parquet

Apache Parquet Dataset.

Example Usage

exp = create_expression_from_parquet_file(filenames)
docid_project_exp = project.project(exp, [path.Path(["DocId"])])
pqds = parquet_dataset.calculate_parquet_values([docid_project_exp], exp,
                                                filenames, batch_size)

for prensors in pqds:
  doc_id_prensor = prensors[0]
Classes
ParquetDataset
ParquetDataset(
    filenames: List[str],
    value_paths: List[str],
    batch_size: int,
)

Bases: _RawParquetDataset

A dataset which reads columns from a parquet file and returns a prensor.

The prensor will have a PrensorTypeSpec, which is created based on value_paths.

Note

In tensorflow v1 this dataset will not return a prensor. The output will be the same format as _RawParquetDataset's output (a vector of tensors). The following is a workaround in v1:

pq_ds = ParquetDataset(...)
type_spec = pq_ds.element_spec
tensors = pq_ds.make_one_shot_iterator().get_next()
prensor = type_spec.from_components(tensors)
session.run(prensor)

Creates a ParquetDataset.

PARAMETER DESCRIPTION
filenames

A list containing the name(s) of the file(s) to be read.

TYPE: List[str]

value_paths

A list of dot-separated path strings, one for each leaf path to read.

TYPE: List[str]

batch_size

An int that determines how many messages are parsed into one prensor tree in an iteration. If there are fewer than batch_size remaining messages, then all remaining messages will be returned.

TYPE: int

RAISES DESCRIPTION
ValueError

if the column does not exist in the parquet schema.
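The batch_size behavior described above (a final, shorter batch when fewer than batch_size messages remain) can be sketched in plain Python. This is only an illustration of the documented semantics, not the dataset's implementation; `batch_messages` is a hypothetical helper.

```python
from typing import Iterator, List, Sequence


def batch_messages(messages: Sequence[str],
                   batch_size: int) -> Iterator[List[str]]:
  """Yields consecutive groups of at most `batch_size` messages."""
  for start in range(0, len(messages), batch_size):
    # The last slice may be shorter than batch_size; it is still returned.
    yield list(messages[start:start + batch_size])


batches = list(batch_messages(["m0", "m1", "m2", "m3", "m4"], batch_size=2))
```

With five messages and a batch size of two, the sketch yields two full batches followed by one batch containing the single remaining message.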

Source code in struct2tensor/expression_impl/parquet.py
def __init__(self, filenames: List[str], value_paths: List[str],
             batch_size: int):
  """Creates a ParquetDataset.

  Args:
    filenames: A list containing the name(s) of the file(s) to be read.
    value_paths: A list of strings of the dotstring path(s) of each leaf
      path(s).
    batch_size: An int that determines how many messages are parsed into one
      prensor tree in an iteration. If there are fewer than batch_size
      remaining messages, then all remaining messages will be returned.

  Raises:
    ValueError: if the column does not exist in the parquet schema.
  """
  self._filenames = filenames
  self._value_paths = value_paths
  self._batch_size = batch_size

  for filename in filenames:
    self._validate_file(filename, value_paths)

  self._value_dtypes = self._get_column_dtypes(filenames[0], value_paths)

  self._parent_index_paths = []
  self._path_index = []

  self.element_structure = self._create_prensor_spec()
  self._create_parent_index_paths_and_index_from_type_spec(
      self.element_structure, 0, 0)

  super(ParquetDataset,
        self).__init__(filenames, self._value_paths, self._value_dtypes,
                       self._parent_index_paths, self._path_index, batch_size)
Attributes
element_spec property
element_spec
element_structure instance-attribute
element_structure = _create_prensor_spec()
output_classes property
output_classes
output_shapes property
output_shapes
output_types property
output_types
Functions
calculate_parquet_values
calculate_parquet_values(
    expressions: List[Expression],
    root_exp: _PlaceholderRootExpression,
    filenames: List[str],
    batch_size: int,
    options: Optional[Options] = None,
)

Calculates expressions and returns a parquet dataset.

PARAMETER DESCRIPTION
expressions

A list of expressions to calculate.

TYPE: List[Expression]

root_exp

The root placeholder expression to use as the feed dict.

TYPE: _PlaceholderRootExpression

filenames

A list of parquet files.

TYPE: List[str]

batch_size

The number of messages to batch.

TYPE: int

options

calculate options.

TYPE: Optional[Options] DEFAULT: None

RETURNS DESCRIPTION

A parquet dataset.

Source code in struct2tensor/expression_impl/parquet.py
def calculate_parquet_values(
    expressions: List[expression.Expression],
    root_exp: placeholder._PlaceholderRootExpression,  # pylint: disable=protected-access
    filenames: List[str],
    batch_size: int,
    options: Optional[calculate_options.Options] = None):
  """Calculates expressions and returns a parquet dataset.

  Args:
    expressions: A list of expressions to calculate.
    root_exp: The root placeholder expression to use as the feed dict.
    filenames: A list of parquet files.
    batch_size: The number of messages to batch.
    options: calculate options.

  Returns:
    A parquet dataset.
  """
  pqds = _ParquetDatasetWithExpression(expressions, root_exp, filenames,
                                       batch_size, options)
  return pqds.map(pqds._calculate_prensor)  # pylint: disable=protected-access
create_expression_from_parquet_file
create_expression_from_parquet_file(
    filenames: List[str],
) -> _PlaceholderRootExpression

Creates a placeholder expression from a parquet file.

PARAMETER DESCRIPTION
filenames

A list of parquet files.

TYPE: List[str]

RETURNS DESCRIPTION
_PlaceholderRootExpression

A PlaceholderRootExpression that should be used as the root of an expression graph.

Source code in struct2tensor/expression_impl/parquet.py
def create_expression_from_parquet_file(
    filenames: List[str]) -> placeholder._PlaceholderRootExpression:  # pylint: disable=protected-access
  """Creates a placeholder expression from a parquet file.

  Args:
    filenames: A list of parquet files.

  Returns:
    A PlaceholderRootExpression that should be used as the root of an expression
      graph.
  """

  metadata = pq.ParquetFile(filenames[0]).metadata
  parquet_schema = metadata.schema
  arrow_schema = parquet_schema.to_arrow_schema()

  root_schema = mpp.create_schema(
      is_repeated=True,
      children=_create_children_from_arrow_fields(
          [arrow_schema.field_by_name(name) for name in arrow_schema.names]))

  # pylint: disable=protected-access
  return placeholder._PlaceholderRootExpression(root_schema)
Modules

parse_message_level_ex

Parses regular fields, extensions, any casts, and map protos.

This is intended for use within proto.py, not independently.

parse_message_level(...) in struct2tensor_ops provides a direct interface to parsing a protocol buffer message. In particular, extensions and regular fields can be directly extracted from the protobuf. However, prensors provide other syntactic sugar to parse protobufs, and parse_message_level_ex(...) handles these in addition to regular fields and extensions.

Specifically, consider google.protobuf.Any and proto maps:

package foo.bar;

message MyMessage {
  Any my_any = 1;
  map<string, Baz> my_map = 2;
}
message Baz {
  int32 my_int = 1;
  ...
}

Then for MyMessage, the path my_any.(foo.bar.Baz).my_int is an optional path. Also, my_map[x].my_int is an optional path.

  MyMessage--------------
     \  my_any?          \ my_map[x]
      *                   *
       \  (foo.bar.Baz)?   \  my_int?
        *                   *
         \  my_int?
          *

Thus, we can run:

my_message_serialized_tensor = ...

my_message_parsed = parse_message_level_ex(
    my_message_serialized_tensor,
    MyMessage.DESCRIPTOR,
    {"my_any", "my_map[x]"})

my_any_serialized = my_message_parsed["my_any"].value

my_any_parsed = parse_message_level_ex(
    my_any_serialized,
    Any.DESCRIPTOR,
    {"(foo.bar.Baz)"})

At this point, both my_message_parsed["my_map[x]"].value and my_any_parsed["(foo.bar.Baz)"].value are serialized Baz tensors.

Attributes
ProtoFieldName module-attribute
ProtoFieldName = str
ProtoFullName module-attribute
ProtoFullName = str
StrStep module-attribute
StrStep = str
Functions
get_full_name_from_any_step
get_full_name_from_any_step(
    step: ProtoFieldName,
) -> Optional[ProtoFieldName]

Gets the full name of a protobuf from a google.protobuf.Any step.

An any step is of the form (foo.com/bar.Baz). In this case the result would be bar.Baz.

PARAMETER DESCRIPTION
step

the string of a step in a path.

TYPE: ProtoFieldName

RETURNS DESCRIPTION
Optional[ProtoFieldName]

the full name of a protobuf if the step is an any step, or None otherwise.

Source code in struct2tensor/expression_impl/parse_message_level_ex.py
def get_full_name_from_any_step(
    step: ProtoFieldName) -> Optional[ProtoFieldName]:
  """Gets the full name of a protobuf from a google.protobuf.Any step.

  An any step is of the form (foo.com/bar.Baz). In this case the result would
  be bar.Baz.

  Args:
    step: the string of a step in a path.

  Returns:
    the full name of a protobuf if the step is an any step, or None otherwise.
  """
  if not step:
    return None
  if step[0] != "(":
    return None
  if step[-1] != ")":
    return None
  step_without_parens = step[1:-1]
  return step_without_parens.split("/")[-1]
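The behavior of get_full_name_from_any_step on a few inputs can be checked with a standalone copy of the logic shown above. The function body below condenses the listed source; the sample steps other than (foo.com/bar.Baz) are illustrative inputs chosen for this sketch.

```python
from typing import Optional


def get_full_name_from_any_step(step: str) -> Optional[str]:
  """Standalone copy of the logic in the source listing above."""
  if not step:
    return None
  if step[0] != "(" or step[-1] != ")":
    return None
  # Drop the parentheses, then keep only what follows the last "/".
  return step[1:-1].split("/")[-1]


with_host = get_full_name_from_any_step("(foo.com/bar.Baz)")
without_host = get_full_name_from_any_step("(foo.bar.Baz)")
regular_field = get_full_name_from_any_step("my_any")
empty_step = get_full_name_from_any_step("")
```

A step without a "/" is returned with only the parentheses removed, and any step that is not wrapped in parentheses yields None.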
is_any_descriptor
is_any_descriptor(desc: Descriptor) -> bool

Returns true if it is an Any descriptor.

Source code in struct2tensor/expression_impl/parse_message_level_ex.py
def is_any_descriptor(desc: descriptor.Descriptor) -> bool:
  """Returns true if it is an Any descriptor."""
  return desc.full_name == "google.protobuf.Any"
parse_message_level_ex
parse_message_level_ex(
    tensor_of_protos: Tensor,
    desc: Descriptor,
    field_names: Set[ProtoFieldName],
    message_format: str = "binary",
    backing_str_tensor: Optional[Tensor] = None,
    honor_proto3_optional_semantics: bool = False,
) -> Mapping[StrStep, _ParsedField]

Parses regular fields, extensions, any casts, and map protos.

Source code in struct2tensor/expression_impl/parse_message_level_ex.py
def parse_message_level_ex(
    tensor_of_protos: tf.Tensor,
    desc: descriptor.Descriptor,
    field_names: Set[ProtoFieldName],
    message_format: str = "binary",
    backing_str_tensor: Optional[tf.Tensor] = None,
    honor_proto3_optional_semantics: bool = False
) -> Mapping[StrStep, struct2tensor_ops._ParsedField]:
  """Parses regular fields, extensions, any casts, and map protos."""
  raw_field_names = _get_field_names_to_parse(desc, field_names)
  regular_fields = list(
      struct2tensor_ops.parse_message_level(
          tensor_of_protos,
          desc,
          raw_field_names,
          message_format=message_format,
          backing_str_tensor=backing_str_tensor,
          honor_proto3_optional_semantics=honor_proto3_optional_semantics))
  regular_field_map = {x.field_name: x for x in regular_fields}

  any_fields = _get_any_parsed_fields(desc, regular_field_map, field_names)
  map_fields = _get_map_parsed_fields(desc, regular_field_map, field_names,
                                      backing_str_tensor)
  result = regular_field_map
  result.update(any_fields)
  result.update(map_fields)
  return result
Modules

placeholder

Placeholder expression.

A placeholder expression represents prensor nodes; however, a prensor is not needed until calculate is called. This allows the user to apply expression queries to a placeholder expression before having an actual prensor object. When calculate is called on a placeholder expression (or a descendant of a placeholder expression), a feed_dict must be passed in. Calculate then binds each prensor to the appropriate placeholder expression.

Sample usage:

placeholder_exp = placeholder.create_expression_from_schema(schema)
new_exp = expression_queries(placeholder_exp, ..)
result = calculate.calculate_values([new_exp],
                                    feed_dict={placeholder_exp: pren})
# placeholder_exp requires a feed_dict to be passed in when calculating
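The deferred-binding idea in the sample above can be modeled in a few lines of plain Python. This is a conceptual sketch only: `Placeholder`, `Mapped`, and this `calculate_values` are hypothetical stand-ins, and plain lists take the place of prensors.

```python
class Placeholder:
  """Hypothetical stand-in for a placeholder root expression."""


class Mapped:
  """Hypothetical expression that transforms the value of its source."""

  def __init__(self, source, fn):
    self.source = source
    self.fn = fn


def calculate_values(expr, feed_dict):
  """Evaluates `expr`, binding placeholders from `feed_dict` at call time."""
  if isinstance(expr, Placeholder):
    if expr not in feed_dict:
      raise ValueError("A placeholder requires an entry in feed_dict.")
    return feed_dict[expr]
  return expr.fn(calculate_values(expr.source, feed_dict))


placeholder_exp = Placeholder()
# The query is built before any data exists.
new_exp = Mapped(placeholder_exp, lambda values: [v * 2 for v in values])
# Data is supplied only when the result is calculated.
result = calculate_values(new_exp, {placeholder_exp: [1, 2, 3]})
```

The point of the sketch is the ordering: the expression graph is constructed first, and the concrete data is attached only at calculation time.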
Functions
create_expression_from_schema
create_expression_from_schema(
    schema: Schema,
) -> _PlaceholderRootExpression

Creates a placeholder expression from a parquet schema.

PARAMETER DESCRIPTION
schema

The schema that describes the prensor tree that this placeholder represents.

TYPE: Schema

RETURNS DESCRIPTION
_PlaceholderRootExpression

A PlaceholderRootExpression that should be used as the root of an expression graph.

Source code in struct2tensor/expression_impl/placeholder.py
def create_expression_from_schema(
    schema: mpp.Schema) -> "_PlaceholderRootExpression":
  """Creates a placeholder expression from a parquet schema.

  Args:
    schema: The schema that describes the prensor tree that this placeholder
      represents.

  Returns:
    A PlaceholderRootExpression that should be used as the root of an expression
      graph.
  """

  return _PlaceholderRootExpression(schema)
get_placeholder_paths_from_graph
get_placeholder_paths_from_graph(
    graph: ExpressionGraph,
) -> List[Path]

Gets all placeholder paths from an expression graph.

This finds all leaf placeholder expressions in an expression graph, and gets the path of these expressions.

PARAMETER DESCRIPTION
graph

expression graph

TYPE: ExpressionGraph

RETURNS DESCRIPTION
List[Path]

a list of paths of placeholder expressions

Source code in struct2tensor/expression_impl/placeholder.py
def get_placeholder_paths_from_graph(
    graph: calculate.ExpressionGraph) -> List[path.Path]:
  """Gets all placeholder paths from an expression graph.

  This finds all leaf placeholder expressions in an expression graph, and gets
  the path of these expressions.

  Args:
    graph: expression graph

  Returns:
    a list of paths of placeholder expressions
  """
  expressions = [
      x for x in graph.get_expressions_needed()
      if (_is_placeholder_expression(x) and x.is_leaf)
  ]
  expressions = typing.cast(List[_PlaceholderExpression], expressions)
  return [e.get_path() for e in expressions]
Modules

project

project selects a subtree of an expression.

project is often used right before calculating the value.

Example

expr = ...
new_expr = project.project(expr, [path.Path(["foo","bar"]),
                                  path.Path(["x", "y"])])
[prensor_result] = calculate.calculate_prensors([new_expr])

prensor_result now has two paths, "foo.bar" and "x.y".

Functions
project
project(
    expr: Expression, paths: Sequence[Path]
) -> Expression

Select a subtree.

Paths not selected are removed. Paths that are selected are "known", such that if calculate_prensors is called, they will be in the result.

PARAMETER DESCRIPTION
expr

the original expression.

TYPE: Expression

paths

the paths to include.

TYPE: Sequence[Path]

RETURNS DESCRIPTION
Expression

A projected expression.

Source code in struct2tensor/expression_impl/project.py
def project(expr: expression.Expression,
            paths: Sequence[path.Path]) -> expression.Expression:
  """select a subtree.

  Paths not selected are removed.
  Paths that are selected are "known", such that if calculate_prensors is
  called, they will be in the result.

  Args:
    expr: the original expression.
    paths: the paths to include.

  Returns:
    A projected expression.
  """
  missing_paths = [p for p in paths if expr.get_descendant(p) is None]
  if missing_paths:
    raise ValueError("{} Path(s) missing in project: {}".format(
        len(missing_paths), ", ".join([str(x) for x in missing_paths])))
  return _ProjectExpression(expr, paths)
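The path check and subtree selection can be sketched on nested dictionaries. This is a minimal model, assuming an expression tree can be represented as a dict of dicts and a path as a tuple of steps; `Tree`, `get_descendant`, and `project_sketch` are hypothetical names, and the real _ProjectExpression is not reproduced here.

```python
from typing import Dict, Optional, Sequence, Tuple

Tree = Dict[str, "Tree"]  # Hypothetical model: field name -> subtree.
Path = Tuple[str, ...]    # Hypothetical model of a path.


def get_descendant(tree: Tree, p: Path) -> Optional[Tree]:
  """Walks the tree one step at a time, returning None if a step is absent."""
  node = tree
  for step in p:
    if step not in node:
      return None
    node = node[step]
  return node


def project_sketch(tree: Tree, paths: Sequence[Path]) -> Tree:
  """Keeps only the requested paths, rejecting paths that do not exist."""
  missing = [p for p in paths if get_descendant(tree, p) is None]
  if missing:
    raise ValueError("{} Path(s) missing in project: {}".format(
        len(missing), ", ".join(".".join(p) for p in missing)))
  result: Tree = {}
  for p in paths:
    node = result
    for step in p:
      node = node.setdefault(step, {})
  return result


tree = {"foo": {"bar": {}, "baz": {}}, "x": {"y": {}}, "z": {}}
projected = project_sketch(tree, [("foo", "bar"), ("x", "y")])
```

In the sketch, "foo.baz" and "z" are dropped because they were not selected, while a request for a path that does not exist raises ValueError before anything is built.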
Modules

promote

Promote an expression to be a child of its grandparent.

Promote is part of the standard flattening of data, promote_and_broadcast, which takes structured data and flattens it. By directly accessing promote, one can perform simpler operations.

For example, suppose an expr represents:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
         |
         +-val*-int64
session: {
  event: {
    val: 111
  }
  event: {
    val: 121
    val: 122
  }
}

session: {
  event: {
    val: 10
    val: 7
  }
  event: {
    val: 1
  }
}
promote.promote(expr, path.Path(["session", "event", "val"]), "nval")

produces:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
     |    |
     |    +-val*-int64
     |
     +-nval*-int64
session: {
  event: {
    val: 111
  }
  event: {
    val: 121
    val: 122
  }
  nval: 111
  nval: 121
  nval: 122
}

session: {
  event: {
    val: 10
    val: 7
  }
  event: {
    val: 1
  }
  nval: 10
  nval: 7
  nval: 1
}
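The example above can be traced with parent-index lists. This is a sketch in plain Python, assuming each repeated field stores, for every element, the index of its parent element; the list-based `gather` helper is a stand-in for the tf.gather call that appears in the PromoteChildExpression.calculate source in this section.

```python
# Parent indices for the example data: two sessions, four events, six vals.
event_parent_index = [0, 0, 1, 1]      # event -> session
val_parent_index = [0, 1, 1, 2, 2, 3]  # val -> event
val_values = [111, 121, 122, 10, 7, 1]


def gather(params, indices):
  """List-based stand-in for gathering params at the given indices."""
  return [params[i] for i in indices]


# Promoting val re-parents each value from its event to that event's session.
nval_parent_index = gather(event_parent_index, val_parent_index)

# Group the promoted values by session to compare with the output above.
nval_by_session = {}
for session, value in zip(nval_parent_index, val_values):
  nval_by_session.setdefault(session, []).append(value)
```

The values themselves are unchanged; only the parent index is rewritten so that each value points at its grandparent.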
Classes
PromoteChildExpression
PromoteChildExpression(
    origin: Expression, origin_parent: Expression
)

Bases: Expression

The root of the promoted subtree.

Initialize an expression.

PARAMETER DESCRIPTION
is_repeated

if the expression is repeated.

TYPE: bool

my_type

the DType of a field, or None for an internal node.

TYPE: Optional[DType]

schema_feature

the local schema (StructDomain information should not be present).

TYPE: Optional[Feature] DEFAULT: None

validate_step_format

If True, validates that steps do not have any characters that could be ambiguously understood as structure delimiters (e.g. "."). If False, such characters are allowed and the client is responsible to ensure to not rely on any auto-coercion of strings to paths.

TYPE: bool DEFAULT: True

Source code in struct2tensor/expression_impl/promote.py
def __init__(self, origin: expression.Expression,
             origin_parent: expression.Expression):

  super().__init__(
      origin.is_repeated or origin_parent.is_repeated,
      origin.type,
      schema_feature=_get_promote_schema_feature(
          origin.schema_feature, origin_parent.schema_feature
      ),
      validate_step_format=origin.validate_step_format,
  )
  self._origin = origin
  self._origin_parent = origin_parent
  if self._origin_parent.type is not None:
    raise ValueError("origin_parent cannot be a field")
Attributes
is_leaf property
is_leaf: bool

True iff the node tensor is a LeafNodeTensor.

is_repeated property
is_repeated: bool

True iff the same parent value can have multiple children values.

schema_feature property
schema_feature: Optional[Feature]

Return the schema of the field.

type property
type: Optional[DType]

dtype of the expression, or None if not a leaf expression.

validate_step_format property
validate_step_format: bool
Functions
apply
apply(
    transform: Callable[[Expression], Expression]
) -> Expression
Source code in struct2tensor/expression.py
def apply(self,
          transform: Callable[["Expression"], "Expression"]) -> "Expression":
  return transform(self)
apply_schema
apply_schema(schema: Schema) -> Expression
Source code in struct2tensor/expression.py
def apply_schema(self, schema: schema_pb2.Schema) -> "Expression":
  return apply_schema.apply_schema(self, schema)
broadcast
broadcast(
    source_path: CoercableToPath,
    sibling_field: Step,
    new_field_name: Step,
) -> Expression

Broadcasts the existing field at source_path to the sibling_field.

Source code in struct2tensor/expression.py
def broadcast(self, source_path: CoercableToPath, sibling_field: path.Step,
              new_field_name: path.Step) -> "Expression":
  """Broadcasts the existing field at source_path to the sibling_field."""
  return broadcast.broadcast(self, path.create_path(source_path),
                             sibling_field, new_field_name)
calculate
calculate(
    sources: Sequence[NodeTensor],
    destinations: Sequence[Expression],
    options: Options,
    side_info: Optional[Prensor] = None,
) -> NodeTensor

Calculates the node tensor of the expression.

The node tensor must be a function of the properties of the expression and the node tensors of the expressions from get_source_expressions().

If is_leaf, then calculate must return a LeafNodeTensor. Otherwise, it must return a ChildNodeTensor or RootNodeTensor.

If calculation_is_identity() is true, then this must return the first source node tensor.

Sometimes, for operations such as parsing the proto, calculate will return additional information. For example, calculate() for the root of the proto expression also parses out the tensors required to calculate the tensors of the children. This is why destinations are required.

For a reference use, see calculate_value_slowly(...) below.

PARAMETER DESCRIPTION
sources

The node tensors of the expressions in get_source_expressions().

TYPE: Sequence[NodeTensor]

destinations

The expressions that will use the output of this method.

TYPE: Sequence[Expression]

options

Options for the calculation.

TYPE: Options

side_info

An optional prensor that is used to bind to a placeholder expression.

TYPE: Optional[Prensor] DEFAULT: None

RETURNS DESCRIPTION
NodeTensor

A NodeTensor representing the output of this expression.

Source code in struct2tensor/expression_impl/promote.py
def calculate(
    self,
    sources: Sequence[prensor.NodeTensor],
    destinations: Sequence[expression.Expression],
    options: calculate_options.Options,
    side_info: Optional[prensor.Prensor] = None) -> prensor.NodeTensor:
  [origin_value, origin_parent_value] = sources
  if not isinstance(origin_value, prensor.ChildNodeTensor):
    raise ValueError("origin_value must be a child")
  if not isinstance(origin_parent_value, prensor.ChildNodeTensor):
    raise ValueError("origin_parent_value must be a child node")
  new_parent_index = tf.gather(origin_parent_value.parent_index,
                               origin_value.parent_index)
  return prensor.ChildNodeTensor(new_parent_index, self.is_repeated)
calculation_equal
calculation_equal(expr: Expression) -> bool

self.calculate is equal to another expression.calculate.

Given the same source node tensors, self.calculate(...) and expression.calculate(...) will have the same result.

Note that this does not check that the source expressions of the two expressions are the same. Therefore, two operations can have the same calculation, but not the same output, because their sources are different. For example, if a.calculation_is_identity() is True and b.calculation_is_identity() is True, then a.calculation_equal(b) is True. However, unless a and b have the same source, the expressions themselves are not equal.

PARAMETER DESCRIPTION
expr

The expression to compare to.

TYPE: Expression

Source code in struct2tensor/expression_impl/promote.py
def calculation_equal(self, expr: expression.Expression) -> bool:
  return isinstance(expr, PromoteChildExpression)
calculation_is_identity
calculation_is_identity() -> bool

True iff self.calculate is the identity.

There is exactly one source, and the output of self.calculate(...) is the node tensor of this source.

Source code in struct2tensor/expression_impl/promote.py
def calculation_is_identity(self) -> bool:
  return False
cogroup_by_index
cogroup_by_index(
    source_path: CoercableToPath,
    left_name: Step,
    right_name: Step,
    new_field_name: Step,
) -> Expression

Creates a cogroup of left_name and right_name at new_field_name.

Source code in struct2tensor/expression.py
def cogroup_by_index(self, source_path: CoercableToPath, left_name: path.Step,
                     right_name: path.Step,
                     new_field_name: path.Step) -> "Expression":
  """Creates a cogroup of left_name and right_name at new_field_name."""
  raise NotImplementedError("cogroup_by_index is not implemented")
create_has_field
create_has_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the presence of the source path.

Source code in struct2tensor/expression.py
def create_has_field(self, source_path: CoercableToPath,
                     new_field_name: path.Step) -> "Expression":
  """Creates a field that is the presence of the source path."""
  return size.has(self, path.create_path(source_path), new_field_name)
create_proto_index
create_proto_index(field_name: Step) -> Expression

Creates a proto index field as a direct child of the current root.

The proto index maps each root element to the original batch index. For example: [0, 2] means the first element came from the first proto in the original input tensor and the second element came from the third proto. The created field is always "dense" -- it has the same valency as the current root.

PARAMETER DESCRIPTION
field_name

the name of the field to be created.

TYPE: Step

RETURNS DESCRIPTION
Expression

An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py
def create_proto_index(self, field_name: path.Step) -> "Expression":
  """Creates a proto index field as a direct child of the current root.

  The proto index maps each root element to the original batch index.
  For example: [0, 2] means the first element came from the first proto
  in the original input tensor and the second element came from the third
  proto. The created field is always "dense" -- it has the same valency as
  the current root.

  Args:
    field_name: the name of the field to be created.

  Returns:
    An Expression object representing the result of the operation.
  """

  return reroot.create_proto_index_field(self, field_name)
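The [0, 2] example above can be reproduced with a small sketch. It assumes, purely for illustration, that we know how many elements of the current root came from each proto in the original batch; `proto_index_sketch` and its input are hypothetical and do not reflect how reroot computes the field.

```python
from typing import List


def proto_index_sketch(elements_per_proto: List[int]) -> List[int]:
  """Maps each current root element to its position in the original batch."""
  index = []
  for batch_position, count in enumerate(elements_per_proto):
    # Every element contributed by this proto records the proto's position.
    index.extend([batch_position] * count)
  return index


# Three protos in the batch; the second contributes no root elements.
proto_index = proto_index_sketch([1, 0, 1])
```

The result has one entry per element of the current root, which matches the description of the field as "dense".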
create_size_field
create_size_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the size of the source path.

Source code in struct2tensor/expression.py
def create_size_field(self, source_path: CoercableToPath,
                      new_field_name: path.Step) -> "Expression":
  """Creates a field that is the size of the source path."""
  return size.size(self, path.create_path(source_path), new_field_name)
get_child
get_child(field_name: Step) -> Optional[Expression]

Gets a named child.

Source code in struct2tensor/expression.py
def get_child(self, field_name: path.Step) -> Optional["Expression"]:
  """Gets a named child."""
  if field_name in self._child_cache:
    return self._child_cache[field_name]
  result = self._get_child_impl(field_name)
  self._child_cache[field_name] = result
  return result
get_child_or_error
get_child_or_error(field_name: Step) -> Expression

Gets a named child, raising a KeyError if there is no such field.

Source code in struct2tensor/expression.py
def get_child_or_error(self, field_name: path.Step) -> "Expression":
  """Gets a named child."""
  result = self.get_child(field_name)
  if result is None:
    raise KeyError("No such field: {}".format(field_name))
  return result
get_descendant
get_descendant(p: Path) -> Optional[Expression]

Finds the descendant at the path.

Source code in struct2tensor/expression.py
def get_descendant(self, p: path.Path) -> Optional["Expression"]:
  """Finds the descendant at the path."""
  result = self
  for field_name in p.field_list:
    result = result.get_child(field_name)
    if result is None:
      return None
  return result
get_descendant_or_error
get_descendant_or_error(p: Path) -> Expression

Finds the descendant at the path, raising a ValueError if the path is missing.

Source code in struct2tensor/expression.py
def get_descendant_or_error(self, p: path.Path) -> "Expression":
  """Finds the descendant at the path."""
  result = self.get_descendant(p)
  if result is None:
    raise ValueError("Missing path: {} in {}".format(
        str(p), self.schema_string(limit=20)))
  return result
get_known_children
get_known_children() -> Mapping[Step, Expression]
Source code in struct2tensor/expression.py
def get_known_children(self) -> Mapping[path.Step, "Expression"]:
  known_field_names = self.known_field_names()
  result = {}
  for name in known_field_names:
    result[name] = self.get_child_or_error(name)
  return result
get_known_descendants
get_known_descendants() -> Mapping[Path, Expression]

Gets a mapping from known paths to subexpressions.

The difference between this and get_descendants in Prensor is that all paths in a Prensor are realized, thus all known. But an Expression's descendants might not all be known at the point this method is called, because an expression may have an infinite number of children.

RETURNS DESCRIPTION
Mapping[Path, Expression]

A mapping from paths (relative to the root of the subexpression) to expressions.

Source code in struct2tensor/expression.py
def get_known_descendants(self) -> Mapping[path.Path, "Expression"]:
  # Rename get_known_descendants
  """Gets a mapping from known paths to subexpressions.

  The difference between this and get_descendants in Prensor is that
  all paths in a Prensor are realized, thus all known. But an Expression's
  descendants might not all be known at the point this method is called,
  because an expression may have an infinite number of children.

  Returns:
    A mapping from paths (relative to the root of the subexpression) to
      expressions.
  """
  known_subexpressions = {
      k: v.get_known_descendants()
      for k, v in self.get_known_children().items()
  }
  result = {}
  for field_name, subexpression in known_subexpressions.items():
    subexpression_path = path.Path(
        [field_name], validate_step_format=self.validate_step_format
    )
    for p, expr in subexpression.items():
      result[subexpression_path.concat(p)] = expr
  result[path.Path([], validate_step_format=self.validate_step_format)] = self
  return result
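The recursion in get_known_descendants can be followed on a toy tree. The sketch below assumes a node is just a container of named children and a path is a tuple of step names; `Node` is a hypothetical class, not the library's Expression.

```python
class Node:
  """Hypothetical tree node holding named children."""

  def __init__(self, children=None):
    self.children = dict(children or {})

  def get_known_descendants(self):
    """Returns a mapping from relative paths (tuples) to nodes."""
    result = {}
    for name, child in self.children.items():
      # Prefix every path found under the child with the child's name.
      for p, node in child.get_known_descendants().items():
        result[(name,) + p] = node
    # The empty path maps to this node itself.
    result[()] = self
    return result


root = Node({
    "session": Node({"event": Node({"val": Node()})}),
    "id": Node(),
})
paths = sorted(root.get_known_descendants())
```

As in the source above, the mapping includes the empty path for the node on which the method is called, so the result is never empty.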
get_paths_with_schema
get_paths_with_schema() -> List[Path]

Extract only paths that contain schema information.

Source code in struct2tensor/expression.py
def get_paths_with_schema(self) -> List[path.Path]:
  """Extract only paths that contain schema information."""
  result = []
  for name, child in self.get_known_children().items():
    if child.schema_feature is None:
      continue
    result.extend(
        [
            path.Path(
                [name], validate_step_format=self.validate_step_format
            ).concat(x)
            for x in child.get_paths_with_schema()
        ]
    )
  # Note: We always take the root path and so will return an empty schema
  # if there is no schema information on any nodes, including the root.
  if not result:
    result.append(
        path.Path([], validate_step_format=self.validate_step_format)
    )
  return result
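The traversal above has two consequences that are easy to miss: a child without schema information hides its whole subtree, and a node contributes its own path only when none of its children contribute one. A minimal sketch on a toy tree, where `SchemaNode` and its `has_schema` flag are hypothetical stand-ins for an Expression and its schema_feature:

```python
class SchemaNode:
  """Hypothetical node: `has_schema` models `schema_feature is not None`."""

  def __init__(self, has_schema, children=None):
    self.has_schema = has_schema
    self.children = dict(children or {})

  def get_paths_with_schema(self):
    result = []
    for name, child in self.children.items():
      if not child.has_schema:
        continue  # Skips the child and everything beneath it.
      result.extend((name,) + p for p in child.get_paths_with_schema())
    if not result:
      # No child contributed a path, so this node's own path is returned.
      result.append(())
    return result


root = SchemaNode(True, {
    "a": SchemaNode(True, {"b": SchemaNode(True), "c": SchemaNode(False)}),
    "d": SchemaNode(False, {"e": SchemaNode(True)}),
})
paths = root.get_paths_with_schema()
```

Here "a.c" is skipped for lacking schema information, and "d.e" is never reached because its parent "d" has none.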
get_schema
get_schema(create_schema_features=True) -> Schema

Returns a schema for the entire tree.

PARAMETER DESCRIPTION
create_schema_features

If True, schema features are added for all children and a schema entry is created if not available on the child. If False, features are left off of the returned schema if there is no schema_feature on the child.

DEFAULT: True

Source code in struct2tensor/expression.py
def get_schema(self, create_schema_features=True) -> schema_pb2.Schema:
  """Returns a schema for the entire tree.

  Args:
    create_schema_features: If True, schema features are added for all
      children and a schema entry is created if not available on the child. If
      False, features are left off of the returned schema if there is no
      schema_feature on the child.
  """
  if not create_schema_features:
    return self.project(self.get_paths_with_schema()).get_schema()
  result = schema_pb2.Schema()
  self._populate_schema_feature_children(result.feature)
  return result
get_source_expressions
get_source_expressions() -> Sequence[Expression]

Gets the sources of this expression.

The node tensors of the source expressions must be sufficient to calculate the node tensor of this expression (see calculate and calculate_value_slowly).

RETURNS DESCRIPTION
Sequence[Expression]

The sources of this expression.

Source code in struct2tensor/expression_impl/promote.py
def get_source_expressions(self) -> Sequence[expression.Expression]:
  return [self._origin, self._origin_parent]
known_field_names
known_field_names() -> FrozenSet[Step]

Returns known field names of the expression.

TODO(martinz): implement set_field and project. Known field names of a parsed proto correspond to the fields declared in the message. Examples of "unknown" fields are extensions and explicit casts in an any field. The only way to know if an unknown field "(foo.bar)" is present in an expression expr is to call (expr["(foo.bar)"] is not None).

Notice that simply accessing a field does not make it "known". However, setting a field (or setting a descendant of a field) will make it known.

project(...) returns an expression where the known field names are the only field names. In general, if you want to depend upon known_field_names (e.g., if you want to compile an expression), then the best approach is to project() the expression first.

RETURNS DESCRIPTION
FrozenSet[Step]

An immutable set of field names.

Source code in struct2tensor/expression_impl/promote.py
def known_field_names(self) -> FrozenSet[path.Step]:
  return self._origin.known_field_names()
map_field_values
map_field_values(
    source_path: CoercableToPath,
    operator: Callable[[Tensor], Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map a primitive field to create a new primitive field.

Note

The dtype argument has been added since the v1 API.

PARAMETER DESCRIPTION
source_path

the origin path.

TYPE: CoercableToPath

operator

an element-wise operator that takes a 1-dimensional vector.

TYPE: Callable[[Tensor], Tensor]

dtype

the type of the output.

TYPE: DType

new_field_name

the name of a new sibling of source_path.

TYPE: Step

RETURNS DESCRIPTION
Expression

the resulting root expression.

Source code in struct2tensor/expression.py
def map_field_values(self, source_path: CoercableToPath,
                     operator: Callable[[tf.Tensor], tf.Tensor],
                     dtype: tf.DType,
                     new_field_name: path.Step) -> "Expression":
  """Map a primitive field to create a new primitive field.

  !!! Note
      The dtype argument has been added since the v1 API.

  Args:
    source_path: the origin path.
    operator: an element-wise operator that takes a 1-dimensional vector.
    dtype: the type of the output.
    new_field_name: the name of a new sibling of source_path.

  Returns:
    the resulting root expression.
  """
  return map_values.map_values(self, path.create_path(source_path), operator,
                               dtype, new_field_name)
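The element-wise contract can be illustrated without TensorFlow. In the sketch below, a leaf is modeled as a pair of plain lists (a parent index and a flat values vector), which is a hypothetical stand-in for a leaf node tensor. It assumes that the new sibling field reuses the source field's parent index, which is what an element-wise operator on a sibling implies; `map_values_sketch` is an illustrative name, not a library function.

```python
from typing import Callable, List, Sequence, Tuple

# (parent_index, values): values[i] belongs to parent parent_index[i].
Leaf = Tuple[List[int], List[float]]


def map_values_sketch(
    leaf: Leaf, operator: Callable[[Sequence[float]], Sequence[float]]
) -> Leaf:
    parent_index, values = leaf
    # The operator sees the whole flat 1-D vector of values at once.
    new_values = list(operator(values))
    if len(new_values) != len(values):
        raise ValueError("operator must be element-wise (same length out)")
    # The structure (parent index) is unchanged; only the values differ.
    return (list(parent_index), new_values)


source: Leaf = ([0, 0, 2], [1.0, 2.0, 3.0])
doubled = map_values_sketch(source, lambda v: [2 * x for x in v])
```

Because the operator never sees the parent index, it cannot add, drop, or reorder values; operations that change the shape belong to map_sparse_tensors or map_ragged_tensors instead.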
map_ragged_tensors
map_ragged_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you, to some degree, to reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must be equal to the first dimension of the input tensor.

PARAMETER DESCRIPTION
parent_path

the parent of the input and output fields.

TYPE: CoercableToPath

source_fields

the nonempty list of names of the source fields.

TYPE: Sequence[Step]

operator

an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape.

TYPE: Callable[..., SparseTensor]

is_repeated

whether the output is repeated.

TYPE: bool

dtype

the dtype of the result.

TYPE: DType

new_field_name

the name of the resulting field.

TYPE: Step

RETURNS DESCRIPTION
Expression

A new query.

Source code in struct2tensor/expression.py
def map_ragged_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_ragged_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
map_sparse_tensors
map_sparse_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you, to some degree, to reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must be equal to the first dimension of the input tensor.

PARAMETER DESCRIPTION
parent_path

the parent of the input and output fields.

TYPE: CoercableToPath

source_fields

the nonempty list of names of the source fields.

TYPE: Sequence[Step]

operator

an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape.

TYPE: Callable[..., SparseTensor]

is_repeated

whether the output is repeated.

TYPE: bool

dtype

the dtype of the result.

TYPE: DType

new_field_name

the name of the resulting field.

TYPE: Step

RETURNS DESCRIPTION
Expression

A new query.

Source code in struct2tensor/expression.py
def map_sparse_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_sparse_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
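The shape constraint on the operator can be sketched with plain Python lists standing in for sparse tensors. This is an illustration only: each optional input is modeled as a list with one entry per parent message (`None` when the field is absent), and the repeated output as a list of rows. The names `combine_optionals` and `check_first_dimension` are hypothetical.

```python
from typing import List, Optional, Sequence


def combine_optionals(
    foo: Sequence[Optional[int]], bar: Sequence[Optional[int]]
) -> List[List[int]]:
    """Turns two optional fields into one repeated field, row by row."""
    return [[v for v in (f, b) if v is not None] for f, b in zip(foo, bar)]


def check_first_dimension(
    inputs: Sequence[Sequence[Optional[int]]], output: Sequence[List[int]]
) -> Sequence[List[int]]:
    """Checks that the output has one row per parent message."""
    expected = len(inputs[0])
    if any(len(x) != expected for x in inputs) or len(output) != expected:
        raise ValueError("first dimension of the output must match the inputs")
    return output


foo = [1, None, 3]      # present, absent, present
bar = [None, None, 4]   # absent, absent, present
combined = check_first_dimension([foo, bar], combine_optionals(foo, bar))
```

The second row of `combined` is empty rather than missing: a parent with no values still occupies a row, which is how the first dimension stays equal to that of the inputs.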
project
project(path_list: Sequence[CoercableToPath]) -> Expression

Constrains the paths to those listed.

Source code in struct2tensor/expression.py
def project(self, path_list: Sequence[CoercableToPath]) -> "Expression":
  """Constrains the paths to those listed."""
  return project.project(self, [path.create_path(x) for x in path_list])
promote
promote(source_path: CoercableToPath, new_field_name: Step)

Promotes source_path to be a field new_field_name in its grandparent.

Source code in struct2tensor/expression.py
def promote(self, source_path: CoercableToPath, new_field_name: path.Step):
  """Promotes source_path to be a field new_field_name in its grandparent."""
  return promote.promote(self, path.create_path(source_path), new_field_name)
promote_and_broadcast
promote_and_broadcast(
    path_dictionary: Mapping[Step, CoercableToPath],
    dest_path_parent: CoercableToPath,
) -> Expression
Source code in struct2tensor/expression.py
def promote_and_broadcast(
    self, path_dictionary: Mapping[path.Step, CoercableToPath],
    dest_path_parent: CoercableToPath) -> "Expression":
  return promote_and_broadcast.promote_and_broadcast(
      self, {k: path.create_path(v) for k, v in path_dictionary.items()},
      path.create_path(dest_path_parent))
reroot
reroot(new_root: CoercableToPath) -> Expression

Returns a new list of protocol buffers available at new_root.

Source code in struct2tensor/expression.py
def reroot(self, new_root: CoercableToPath) -> "Expression":
  """Returns a new list of protocol buffers available at new_root."""
  return reroot.reroot(self, path.create_path(new_root))
schema_string
schema_string(limit: Optional[int] = None) -> str

Returns a schema for the expression.

For example,

repeated root:
  optional int32 foo
  optional bar:
    optional string baz
  optional int64 bak

Note that unknown fields and subexpressions are not displayed.

PARAMETER DESCRIPTION
limit

if present, limit the recursion.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
str

A string, describing (a part of) the schema.

Source code in struct2tensor/expression.py
def schema_string(self, limit: Optional[int] = None) -> str:
  """Returns a schema for the expression.

  For example,
  ```
  repeated root:
    optional int32 foo
    optional bar:
      optional string baz
    optional int64 bak
  ```

  Note that unknown fields and subexpressions are not displayed.

  Args:
    limit: if present, limit the recursion.

  Returns:
    A string, describing (a part of) the schema.
  """
  return "\n".join(self._schema_string_helper("root", limit))
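To make the output format concrete, the following sketch renders a small hand-built tree in the same layout as the example in the docstring. It is an assumption-laden illustration: `FieldSpec` and `render` are hypothetical, the `limit` argument is not modeled, and the real helper may differ in details.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldSpec:
    """Hypothetical description of one node in the tree."""
    name: str
    repeated: bool = False
    dtype: Optional[str] = None  # None means the node is a submessage.
    children: List["FieldSpec"] = field(default_factory=list)


def render(spec: FieldSpec, indent: int = 0) -> List[str]:
    label = "repeated" if spec.repeated else "optional"
    pad = "  " * indent
    if spec.dtype is not None:
        # Leaf: "<label> <dtype> <name>".
        return [f"{pad}{label} {spec.dtype} {spec.name}"]
    # Submessage: "<label> <name>:" followed by indented children.
    lines = [f"{pad}{label} {spec.name}:"]
    for child in spec.children:
        lines.extend(render(child, indent + 1))
    return lines


root = FieldSpec("root", repeated=True, children=[
    FieldSpec("foo", dtype="int32"),
    FieldSpec("bar", children=[FieldSpec("baz", dtype="string")]),
    FieldSpec("bak", dtype="int64"),
])
text = "\n".join(render(root))
```

Joining the lines with newlines, as the method itself does, reproduces the five-line example shown in the docstring.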
slice
slice(
    source_path: CoercableToPath,
    new_field_name: Step,
    begin: Optional[IndexValue] = None,
    end: Optional[IndexValue] = None,
) -> Expression

Creates a slice copy of source_path at new_field_name.

If begin or end is negative, it is interpreted relative to the size of the array. For example, slice(..., begin=-1) gets the last element of every array.

PARAMETER DESCRIPTION
source_path

the source of the slice.

TYPE: CoercableToPath

new_field_name

the new field that is generated.

TYPE: Step

begin

the beginning of the slice (inclusive).

TYPE: Optional[IndexValue] DEFAULT: None

end

the end of the slice (exclusive).

TYPE: Optional[IndexValue] DEFAULT: None

RETURNS DESCRIPTION
Expression

An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py
def slice(self,
          source_path: CoercableToPath,
          new_field_name: path.Step,
          begin: Optional[IndexValue] = None,
          end: Optional[IndexValue] = None) -> "Expression":
  """Creates a slice copy of source_path at new_field_name.

  Note that if begin or end is negative, it is considered relative to
  the size of the array. e.g., slice(...,begin=-1) will get the last
  element of every array.

  Args:
    source_path: the source of the slice.
    new_field_name: the new field that is generated.
    begin: the beginning of the slice (inclusive).
    end: the end of the slice (exclusive).

  Returns:
    An Expression object representing the result of the operation.
  """
  return slice_expression.slice_expression(self,
                                           path.create_path(source_path),
                                           new_field_name, begin, end)
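The indexing rules described in the docstring (inclusive begin, exclusive end, negative values relative to the array size) match Python's own list slicing, so they can be sketched per parent with ordinary lists. `slice_each` is a hypothetical helper for illustration; it assumes the slice is applied independently to the values under each parent.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def slice_each(
    rows: Sequence[Sequence[T]],
    begin: Optional[int] = None,
    end: Optional[int] = None,
) -> List[List[T]]:
    # Each row holds the repeated values of one parent; the slice is
    # applied to every row independently.
    return [list(row[begin:end]) for row in rows]


rows = [[1, 2, 3], [4], []]
last_only = slice_each(rows, begin=-1)
first_two = slice_each(rows, end=2)
middle = slice_each(rows, begin=1, end=-1)
```

Rows shorter than the requested range simply yield fewer (or no) elements. Passing only `end` corresponds to truncate, which the source implements as `self.slice(source_path, new_field_name, end=limit)`.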
truncate
truncate(
    source_path: CoercableToPath,
    limit: Union[int, Tensor],
    new_field_name: Step,
) -> Expression

Creates a truncated copy of source_path at new_field_name.

Source code in struct2tensor/expression.py
def truncate(self, source_path: CoercableToPath, limit: Union[int, tf.Tensor],
             new_field_name: path.Step) -> "Expression":
  """Creates a truncated copy of source_path at new_field_name."""
  return self.slice(source_path, new_field_name, end=limit)
PromoteExpression
PromoteExpression(
    origin: Expression, origin_parent: Expression
)

Bases: Leaf

A promoted leaf.

Initialize a Leaf.

Note that a leaf must have a specified type.

PARAMETER DESCRIPTION
is_repeated

if the expression is repeated.

TYPE: bool

my_type

the DType of the field.

TYPE: DType

schema_feature

schema information about the field.

TYPE: Optional[Feature] DEFAULT: None

Source code in struct2tensor/expression_impl/promote.py
def __init__(self, origin: expression.Expression,
             origin_parent: expression.Expression):

  super().__init__(
      origin.is_repeated or origin_parent.is_repeated,
      origin.type,
      schema_feature=_get_promote_schema_feature(
          origin.schema_feature, origin_parent.schema_feature))
  self._origin = origin
  self._origin_parent = origin_parent
  if self.type is None:
    raise ValueError("Can only promote a field")
  if self._origin_parent.type is not None:
    raise ValueError("origin_parent cannot be a field")
Attributes
is_leaf property
is_leaf: bool

True iff the node tensor is a LeafNodeTensor.

is_repeated property
is_repeated: bool

True iff the same parent value can have multiple children values.

schema_feature property
schema_feature: Optional[Feature]

Return the schema of the field.

type property
type: Optional[DType]

dtype of the expression, or None if not a leaf expression.

validate_step_format property
validate_step_format: bool
Functions
apply
apply(
    transform: Callable[[Expression], Expression]
) -> Expression
Source code in struct2tensor/expression.py
def apply(self,
          transform: Callable[["Expression"], "Expression"]) -> "Expression":
  return transform(self)
apply_schema
apply_schema(schema: Schema) -> Expression
Source code in struct2tensor/expression.py
def apply_schema(self, schema: schema_pb2.Schema) -> "Expression":
  return apply_schema.apply_schema(self, schema)
broadcast
broadcast(
    source_path: CoercableToPath,
    sibling_field: Step,
    new_field_name: Step,
) -> Expression

Broadcasts the existing field at source_path to the sibling_field.

Source code in struct2tensor/expression.py
def broadcast(self, source_path: CoercableToPath, sibling_field: path.Step,
              new_field_name: path.Step) -> "Expression":
  """Broadcasts the existing field at source_path to the sibling_field."""
  return broadcast.broadcast(self, path.create_path(source_path),
                             sibling_field, new_field_name)
calculate
calculate(
    sources: Sequence[NodeTensor],
    destinations: Sequence[Expression],
    options: Options,
    side_info: Optional[Prensor] = None,
) -> NodeTensor

Calculates the node tensor of the expression.

The node tensor must be a function of the properties of the expression and the node tensors of the expressions from get_source_expressions().

If is_leaf, then calculate must return a LeafNodeTensor. Otherwise, it must return a ChildNodeTensor or RootNodeTensor.

If calculation_is_identity is true, then this must return sources[0].

Sometimes, for operations such as parsing the proto, calculate will return additional information. For example, calculate() for the root of the proto expression also parses out the tensors required to calculate the tensors of the children. This is why destinations are required.

For a reference use, see calculate_value_slowly(...) below.

PARAMETER DESCRIPTION
sources

The node tensors of the expressions in get_source_expressions().

TYPE: Sequence[NodeTensor]

destinations

The expressions that will use the output of this method.

TYPE: Sequence[Expression]

options

Options for the calculation.

TYPE: Options

side_info

An optional prensor that is used to bind to a placeholder expression.

TYPE: Optional[Prensor] DEFAULT: None

RETURNS DESCRIPTION
NodeTensor

A NodeTensor representing the output of this expression.

Source code in struct2tensor/expression_impl/promote.py
def calculate(
    self,
    sources: Sequence[prensor.NodeTensor],
    destinations: Sequence[expression.Expression],
    options: calculate_options.Options,
    side_info: Optional[prensor.Prensor] = None) -> prensor.NodeTensor:
  [origin_value, origin_parent_value] = sources
  if not isinstance(origin_value, prensor.LeafNodeTensor):
    raise ValueError("origin_value must be a leaf")
  if not isinstance(origin_parent_value, prensor.ChildNodeTensor):
    raise ValueError("origin_parent_value must be a child node")
  new_parent_index = tf.gather(origin_parent_value.parent_index,
                               origin_value.parent_index)
  return prensor.LeafNodeTensor(new_parent_index, origin_value.values,
                                self.is_repeated)
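The core of this calculation is the `tf.gather` call: each value's index into its parent is replaced by that parent's index into the grandparent. The same re-indexing can be sketched with plain lists; the function and variable names below are illustrative only.

```python
from typing import List


def promote_parent_index(
    parent_to_grandparent: List[int], value_to_parent: List[int]
) -> List[int]:
    # Equivalent to gathering parent_to_grandparent at value_to_parent:
    # look up, for every value, the grandparent of its parent.
    return [parent_to_grandparent[i] for i in value_to_parent]


# Three parent elements: two under grandparent 0, one under grandparent 1.
parent_to_grandparent = [0, 0, 1]
# Three leaf values: one under parent 0, two under parent 2.
value_to_parent = [0, 2, 2]
values = ["a", "b", "c"]

new_parent_index = promote_parent_index(parent_to_grandparent, value_to_parent)
```

The values themselves are untouched; only their parent index changes. Two values now sit under grandparent 1, which is why the promoted field is repeated whenever either the origin or its parent is repeated.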
calculation_equal
calculation_equal(expr: Expression) -> bool

True iff self.calculate is equivalent to expr.calculate.

Given the same source node tensors, self.calculate(...) and expr.calculate(...) will have the same result.

Note that this does not check that the source expressions of the two expressions are the same. Therefore, two operations can have the same calculation, but not the same output, because their sources are different. For example, if a.calculation_is_identity() is True and b.calculation_is_identity() is True, then a.calculation_equal(b) is True. However, unless a and b have the same source, the expressions themselves are not equal.

PARAMETER DESCRIPTION
expr

The expression to compare to.

TYPE: Expression

Source code in struct2tensor/expression_impl/promote.py
def calculation_equal(self, expr: expression.Expression) -> bool:
  return isinstance(expr, PromoteExpression)
calculation_is_identity
calculation_is_identity() -> bool

True iff self.calculate is the identity.

There is exactly one source, and the output of self.calculate(...) is the node tensor of this source.

Source code in struct2tensor/expression_impl/promote.py
def calculation_is_identity(self) -> bool:
  return False
cogroup_by_index
cogroup_by_index(
    source_path: CoercableToPath,
    left_name: Step,
    right_name: Step,
    new_field_name: Step,
) -> Expression

Creates a cogroup of left_name and right_name at new_field_name.

Source code in struct2tensor/expression.py
def cogroup_by_index(self, source_path: CoercableToPath, left_name: path.Step,
                     right_name: path.Step,
                     new_field_name: path.Step) -> "Expression":
  """Creates a cogroup of left_name and right_name at new_field_name."""
  raise NotImplementedError("cogroup_by_index is not implemented")
create_has_field
create_has_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the presence of the source path.

Source code in struct2tensor/expression.py
def create_has_field(self, source_path: CoercableToPath,
                     new_field_name: path.Step) -> "Expression":
  """Creates a field that is the presence of the source path."""
  return size.has(self, path.create_path(source_path), new_field_name)
create_proto_index
create_proto_index(field_name: Step) -> Expression

Creates a proto index field as a direct child of the current root.

The proto index maps each root element to the original batch index. For example: [0, 2] means the first element came from the first proto in the original input tensor and the second element came from the third proto. The created field is always "dense" -- it has the same valency as the current root.

PARAMETER DESCRIPTION
field_name

the name of the field to be created.

TYPE: Step

RETURNS DESCRIPTION
Expression

An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py
def create_proto_index(self, field_name: path.Step) -> "Expression":
  """Creates a proto index field as a direct child of the current root.

  The proto index maps each root element to the original batch index.
  For example: [0, 2] means the first element came from the first proto
  in the original input tensor and the second element came from the third
  proto. The created field is always "dense" -- it has the same valency as
  the current root.

  Args:
    field_name: the name of the field to be created.

  Returns:
    An Expression object representing the result of the operation.
  """

  return reroot.create_proto_index_field(self, field_name)
create_size_field
create_size_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the size of the source path.

Source code in struct2tensor/expression.py
def create_size_field(self, source_path: CoercableToPath,
                      new_field_name: path.Step) -> "Expression":
  """Creates a field that is the size of the source path."""
  return size.size(self, path.create_path(source_path), new_field_name)
get_child
get_child(field_name: Step) -> Optional[Expression]

Gets a named child.

Source code in struct2tensor/expression.py
def get_child(self, field_name: path.Step) -> Optional["Expression"]:
  """Gets a named child."""
  if field_name in self._child_cache:
    return self._child_cache[field_name]
  result = self._get_child_impl(field_name)
  self._child_cache[field_name] = result
  return result
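The caching pattern in this listing can be reproduced in isolation. The class below is a hypothetical stand-in (not the library's Expression) that counts how often the underlying lookup runs. It shows one detail that is easy to miss: a `None` result for a missing field is cached too.

```python
from typing import Dict, Optional


class CachingNode:
    """Hypothetical node that memoizes child lookups by field name."""

    def __init__(self, fields: Dict[str, str]):
        self._fields = fields
        self._child_cache: Dict[str, Optional[str]] = {}
        self.lookups = 0  # Number of times the real lookup ran.

    def _get_child_impl(self, field_name: str) -> Optional[str]:
        self.lookups += 1
        return self._fields.get(field_name)

    def get_child(self, field_name: str) -> Optional[str]:
        if field_name in self._child_cache:
            return self._child_cache[field_name]
        result = self._get_child_impl(field_name)
        # Misses (None) are stored as well, so they are not retried.
        self._child_cache[field_name] = result
        return result


node = CachingNode({"foo": "foo-child"})
first = node.get_child("foo")
second = node.get_child("foo")
missing_a = node.get_child("bar")
missing_b = node.get_child("bar")
```

After these four calls the underlying lookup has run only twice, once per distinct field name, and repeated calls return the same object.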
get_child_or_error
get_child_or_error(field_name: Step) -> Expression

Gets a named child, raising KeyError if it does not exist.

Source code in struct2tensor/expression.py
def get_child_or_error(self, field_name: path.Step) -> "Expression":
  """Gets a named child, raising KeyError if it does not exist."""
  result = self.get_child(field_name)
  if result is None:
    raise KeyError("No such field: {}".format(field_name))
  return result
get_descendant
get_descendant(p: Path) -> Optional[Expression]

Finds the descendant at the path.

Source code in struct2tensor/expression.py
def get_descendant(self, p: path.Path) -> Optional["Expression"]:
  """Finds the descendant at the path."""
  result = self
  for field_name in p.field_list:
    result = result.get_child(field_name)
    if result is None:
      return None
  return result
get_descendant_or_error
get_descendant_or_error(p: Path) -> Expression

Finds the descendant at the path, raising ValueError if it is missing.

Source code in struct2tensor/expression.py
def get_descendant_or_error(self, p: path.Path) -> "Expression":
  """Finds the descendant at the path, raising ValueError if it is missing."""
  result = self.get_descendant(p)
  if result is None:
    raise ValueError("Missing path: {} in {}".format(
        str(p), self.schema_string(limit=20)))
  return result
get_known_children
get_known_children() -> Mapping[Step, Expression]
Source code in struct2tensor/expression.py
def get_known_children(self) -> Mapping[path.Step, "Expression"]:
  known_field_names = self.known_field_names()
  result = {}
  for name in known_field_names:
    result[name] = self.get_child_or_error(name)
  return result
get_known_descendants
get_known_descendants() -> Mapping[Path, Expression]

Gets a mapping from known paths to subexpressions.

The difference between this and get_descendants in Prensor is that all paths in a Prensor are realized, thus all known. But an Expression's descendants might not all be known at the point this method is called, because an expression may have an infinite number of children.

RETURNS DESCRIPTION
Mapping[Path, Expression]

A mapping from paths (relative to the root of the subexpression) to expressions.

Source code in struct2tensor/expression.py
def get_known_descendants(self) -> Mapping[path.Path, "Expression"]:
  # Rename get_known_descendants
  """Gets a mapping from known paths to subexpressions.

  The difference between this and get_descendants in Prensor is that
  all paths in a Prensor are realized, thus all known. But an Expression's
  descendants might not all be known at the point this method is called,
  because an expression may have an infinite number of children.

  Returns:
    A mapping from paths (relative to the root of the subexpression) to
      expressions.
  """
  known_subexpressions = {
      k: v.get_known_descendants()
      for k, v in self.get_known_children().items()
  }
  result = {}
  for field_name, subexpression in known_subexpressions.items():
    subexpression_path = path.Path(
        [field_name], validate_step_format=self.validate_step_format
    )
    for p, expr in subexpression.items():
      result[subexpression_path.concat(p)] = expr
  result[path.Path([], validate_step_format=self.validate_step_format)] = self
  return result
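The recursion above can be followed more easily on a toy tree. In this sketch, `Node` is a hypothetical stand-in for an Expression whose children are all known, and paths are plain tuples of field names instead of `path.Path` objects.

```python
from typing import Dict, Tuple


class Node:
    """Hypothetical tree node whose children are all known."""

    def __init__(self, **children: "Node"):
        self.children: Dict[str, "Node"] = children


def known_descendants(node: Node) -> Dict[Tuple[str, ...], Node]:
    result: Dict[Tuple[str, ...], Node] = {}
    for name, child in node.children.items():
        # Prefix every path found below the child with the child's name.
        for p, expr in known_descendants(child).items():
            result[(name,) + p] = expr
    # The node itself is registered under the empty path.
    result[()] = node
    return result


leaf = Node()
root = Node(foo=Node(), bar=Node(baz=leaf))
descendants = known_descendants(root)
```

The mapping includes the root under the empty path as well as every intermediate node, so a nested field such as `bar.baz` appears alongside its parent `bar`.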
get_paths_with_schema
get_paths_with_schema() -> List[Path]

Extract only paths that contain schema information.

Source code in struct2tensor/expression.py
def get_paths_with_schema(self) -> List[path.Path]:
  """Extract only paths that contain schema information."""
  result = []
  for name, child in self.get_known_children().items():
    if child.schema_feature is None:
      continue
    result.extend(
        [
            path.Path(
                [name], validate_step_format=self.validate_step_format
            ).concat(x)
            for x in child.get_paths_with_schema()
        ]
    )
  # Note: We always take the root path and so will return an empty schema
  # if there is no schema information on any nodes, including the root.
  if not result:
    result.append(
        path.Path([], validate_step_format=self.validate_step_format)
    )
  return result
get_schema
get_schema(create_schema_features=True) -> Schema

Returns a schema for the entire tree.

PARAMETER DESCRIPTION
create_schema_features

If True, schema features are added for all children and a schema entry is created if not available on the child. If False, features are left off of the returned schema if there is no schema_feature on the child.

DEFAULT: True

Source code in struct2tensor/expression.py
def get_schema(self, create_schema_features=True) -> schema_pb2.Schema:
  """Returns a schema for the entire tree.

  Args:
    create_schema_features: If True, schema features are added for all
      children and a schema entry is created if not available on the child. If
      False, features are left off of the returned schema if there is no
      schema_feature on the child.
  """
  if not create_schema_features:
    return self.project(self.get_paths_with_schema()).get_schema()
  result = schema_pb2.Schema()
  self._populate_schema_feature_children(result.feature)
  return result
get_source_expressions
get_source_expressions() -> Sequence[Expression]

Gets the sources of this expression.

The node tensors of the source expressions must be sufficient to calculate the node tensor of this expression (see calculate and calculate_value_slowly).

RETURNS DESCRIPTION
Sequence[Expression]

The sources of this expression.

Source code in struct2tensor/expression_impl/promote.py
def get_source_expressions(self) -> Sequence[expression.Expression]:
  return [self._origin, self._origin_parent]
known_field_names
known_field_names() -> FrozenSet[Step]

Returns known field names of the expression.

TODO(martinz): implement set_field and project. Known field names of a parsed proto correspond to the fields declared in the message. Examples of "unknown" fields are extensions and explicit casts in an any field. The only way to know if an unknown field "(foo.bar)" is present in an expression expr is to call (expr["(foo.bar)"] is not None).

Notice that simply accessing a field does not make it "known". However, setting a field (or setting a descendant of a field) will make it known.

project(...) returns an expression where the known field names are the only field names. In general, if you want to depend upon known_field_names (e.g., if you want to compile an expression), then the best approach is to project() the expression first.

RETURNS DESCRIPTION
FrozenSet[Step]

An immutable set of field names.

Source code in struct2tensor/expression.py
def known_field_names(self) -> FrozenSet[path.Step]:
  return frozenset()
map_field_values
map_field_values(
    source_path: CoercableToPath,
    operator: Callable[[Tensor], Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map a primitive field to create a new primitive field.

Note

The dtype argument has been added since the v1 API.

PARAMETER DESCRIPTION
source_path

the origin path.

TYPE: CoercableToPath

operator

an element-wise operator that takes a 1-dimensional vector.

TYPE: Callable[[Tensor], Tensor]

dtype

the type of the output.

TYPE: DType

new_field_name

the name of a new sibling of source_path.

TYPE: Step

RETURNS DESCRIPTION
Expression

the resulting root expression.

Source code in struct2tensor/expression.py
def map_field_values(self, source_path: CoercableToPath,
                     operator: Callable[[tf.Tensor], tf.Tensor],
                     dtype: tf.DType,
                     new_field_name: path.Step) -> "Expression":
  """Map a primitive field to create a new primitive field.

  !!! Note
      The dtype argument has been added since the v1 API.

  Args:
    source_path: the origin path.
    operator: an element-wise operator that takes a 1-dimensional vector.
    dtype: the type of the output.
    new_field_name: the name of a new sibling of source_path.

  Returns:
    the resulting root expression.
  """
  return map_values.map_values(self, path.create_path(source_path), operator,
                               dtype, new_field_name)
map_ragged_tensors
map_ragged_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you, to some degree, to reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: i.e., a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must be equal to the first dimension of the input tensor.

PARAMETER DESCRIPTION
parent_path

the parent of the input and output fields.

TYPE: CoercableToPath

source_fields

the nonempty list of names of the source fields.

TYPE: Sequence[Step]

operator

an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape.

TYPE: Callable[..., SparseTensor]

is_repeated

whether the output is repeated.

TYPE: bool

dtype

the dtype of the result.

TYPE: DType

new_field_name

the name of the resulting field.

TYPE: Step

RETURNS DESCRIPTION
Expression

A new query.

Source code in struct2tensor/expression.py
def map_ragged_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_ragged_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
map_sparse_tensors
map_sparse_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you, to some degree, to reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: i.e., a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must be equal to the first dimension of the input tensor.

PARAMETER DESCRIPTION
parent_path

the parent of the input and output fields.

TYPE: CoercableToPath

source_fields

the nonempty list of names of the source fields.

TYPE: Sequence[Step]

operator

an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape.

TYPE: Callable[..., SparseTensor]

is_repeated

whether the output is repeated.

TYPE: bool

dtype

the dtype of the result.

TYPE: DType

new_field_name

the name of the resulting field.

TYPE: Step

RETURNS DESCRIPTION
Expression

A new query.

Source code in struct2tensor/expression.py
def map_sparse_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_sparse_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
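The two reshaping cases named in the description can be pictured without TensorFlow. The sketch below is a plain-Python model, not the struct2tensor API: an optional field is a list with one entry (or None) per parent, a repeated field is a list of lists, and both helper names are hypothetical. What it tries to show is the constraint that the first dimension (one row per parent) is preserved.

```python
from typing import List, Optional


def combine_optional_fields(a: List[Optional[int]],
                            b: List[Optional[int]]) -> List[List[int]]:
    """Two optional fields -> one repeated field (is_repeated=True case)."""
    if len(a) != len(b):
        raise ValueError("both fields must have one entry per parent")
    return [[v for v in pair if v is not None] for pair in zip(a, b)]


def sum_repeated_field(rows: List[List[int]]) -> List[int]:
    """One repeated field -> one value per parent (is_repeated=False case)."""
    return [sum(row) for row in rows]


field_a = [1, None, 3]
field_b = [4, 5, None]
combined = combine_optional_fields(field_a, field_b)
totals = sum_repeated_field(combined)
print(combined, totals)
```

In both directions the result has exactly as many rows as there are parents, which mirrors the requirement on the first dimension of the returned sparse tensor.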
project
project(path_list: Sequence[CoercableToPath]) -> Expression

Constrains the paths to those listed.

Source code in struct2tensor/expression.py
def project(self, path_list: Sequence[CoercableToPath]) -> "Expression":
  """Constrains the paths to those listed."""
  return project.project(self, [path.create_path(x) for x in path_list])
promote
promote(source_path: CoercableToPath, new_field_name: Step)

Promotes source_path to be a field new_field_name in its grandparent.

Source code in struct2tensor/expression.py
def promote(self, source_path: CoercableToPath, new_field_name: path.Step):
  """Promotes source_path to be a field new_field_name in its grandparent."""
  return promote.promote(self, path.create_path(source_path), new_field_name)
promote_and_broadcast
promote_and_broadcast(
    path_dictionary: Mapping[Step, CoercableToPath],
    dest_path_parent: CoercableToPath,
) -> Expression
Source code in struct2tensor/expression.py
def promote_and_broadcast(
    self, path_dictionary: Mapping[path.Step, CoercableToPath],
    dest_path_parent: CoercableToPath) -> "Expression":
  return promote_and_broadcast.promote_and_broadcast(
      self, {k: path.create_path(v) for k, v in path_dictionary.items()},
      path.create_path(dest_path_parent))
reroot
reroot(new_root: CoercableToPath) -> Expression

Returns a new list of protocol buffers available at new_root.

Source code in struct2tensor/expression.py
def reroot(self, new_root: CoercableToPath) -> "Expression":
  """Returns a new list of protocol buffers available at new_root."""
  return reroot.reroot(self, path.create_path(new_root))
schema_string
schema_string(limit: Optional[int] = None) -> str

Returns a schema for the expression.

For example,

repeated root:
  optional int32 foo
  optional bar:
    optional string baz
  optional int64 bak

Note that unknown fields and subexpressions are not displayed.

PARAMETER DESCRIPTION
limit

if present, limit the recursion.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
str

A string, describing (a part of) the schema.

Source code in struct2tensor/expression.py
def schema_string(self, limit: Optional[int] = None) -> str:
  """Returns a schema for the expression.

  For example,
  ```
  repeated root:
    optional int32 foo
    optional bar:
      optional string baz
    optional int64 bak
  ```

  Note that unknown fields and subexpressions are not displayed.

  Args:
    limit: if present, limit the recursion.

  Returns:
    A string, describing (a part of) the schema.
  """
  return "\n".join(self._schema_string_helper("root", limit))
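To make the output format concrete, here is a small stand-alone sketch that renders a nested description in the same indented style. The tuple-based `Node` layout and the `schema_lines` helper are hypothetical, and the way `limit` cuts off recursion here is an assumption rather than a statement about `_schema_string_helper`.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical node: (label, name, body), where body is either a type name
# (leaf) or a list of child nodes (submessage).
Node = Tuple[str, str, Union[str, list]]


def schema_lines(node: Node, limit: Optional[int] = None) -> List[str]:
    """Renders a node and, up to `limit` levels, its children."""
    label, name, body = node
    if isinstance(body, str):
        return [f"{label} {body} {name}"]
    lines = [f"{label} {name}:"]
    if limit is not None and limit == 0:
        return lines  # Assumed behavior: stop descending at the limit.
    child_limit = None if limit is None else limit - 1
    for child in body:
        lines.extend("  " + line for line in schema_lines(child, child_limit))
    return lines


root = ("repeated", "root", [
    ("optional", "foo", "int32"),
    ("optional", "bar", [("optional", "baz", "string")]),
    ("optional", "bak", "int64"),
])
text = "\n".join(schema_lines(root))
print(text)
```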
slice
slice(
    source_path: CoercableToPath,
    new_field_name: Step,
    begin: Optional[IndexValue] = None,
    end: Optional[IndexValue] = None,
) -> Expression

Creates a slice copy of source_path at new_field_name.

Note that if begin or end is negative, it is considered relative to the size of the array. e.g., slice(...,begin=-1) will get the last element of every array.

PARAMETER DESCRIPTION
source_path

the source of the slice.

TYPE: CoercableToPath

new_field_name

the new field that is generated.

TYPE: Step

begin

the beginning of the slice (inclusive).

TYPE: Optional[IndexValue] DEFAULT: None

end

the end of the slice (exclusive).

TYPE: Optional[IndexValue] DEFAULT: None

RETURNS DESCRIPTION
Expression

An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py
def slice(self,
          source_path: CoercableToPath,
          new_field_name: path.Step,
          begin: Optional[IndexValue] = None,
          end: Optional[IndexValue] = None) -> "Expression":
  """Creates a slice copy of source_path at new_field_name.

  Note that if begin or end is negative, it is considered relative to
  the size of the array. e.g., slice(...,begin=-1) will get the last
  element of every array.

  Args:
    source_path: the source of the slice.
    new_field_name: the new field that is generated.
    begin: the beginning of the slice (inclusive).
    end: the end of the slice (exclusive).

  Returns:
    An Expression object representing the result of the operation.
  """
  return slice_expression.slice_expression(self,
                                           path.create_path(source_path),
                                           new_field_name, begin, end)
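The begin/end semantics described above behave like Python list slicing applied independently to each parent's array. The sketch below is a plain-Python model with a hypothetical `slice_children` helper; it does not call struct2tensor. It also covers truncate, which per the source listing is the special case `end=limit`.

```python
from typing import List, Optional


def slice_children(rows: List[List[int]],
                   begin: Optional[int] = None,
                   end: Optional[int] = None) -> List[List[int]]:
    """Slices every parent's array; negative indices count from its end."""
    return [row[begin:end] for row in rows]


# Three parents with three, one, and zero children respectively.
rows = [[1, 2, 3], [4], []]
last = slice_children(rows, begin=-1)      # last element of every array
middle = slice_children(rows, begin=1, end=3)
truncated = slice_children(rows, end=2)    # what truncate(limit=2) models
print(last, middle, truncated)
```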
truncate
truncate(
    source_path: CoercableToPath,
    limit: Union[int, Tensor],
    new_field_name: Step,
) -> Expression

Creates a truncated copy of source_path at new_field_name.

Source code in struct2tensor/expression.py
def truncate(self, source_path: CoercableToPath, limit: Union[int, tf.Tensor],
             new_field_name: path.Step) -> "Expression":
  """Creates a truncated copy of source_path at new_field_name."""
  return self.slice(source_path, new_field_name, end=limit)
Functions
promote
promote(
    root: Expression, p: Path, new_field_name: Step
) -> Expression

Promote a path to be a child of its grandparent, and give it a name.

Source code in struct2tensor/expression_impl/promote.py
def promote(root: expression.Expression, p: path.Path,
            new_field_name: path.Step) -> expression.Expression:
  """Promote a path to be a child of its grandparent, and give it a name."""
  return _promote_impl(root, p, new_field_name)[0]
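One way to picture promotion is in terms of parent indices: each value points at its parent, each parent points at the grandparent, and promoting composes the two lookups. The sketch below is a plain-Python model with a hypothetical `promote_parent_index` helper, not the actual `_promote_impl`; the data follows the session/event/val example used for promote_and_broadcast.

```python
from typing import List


def promote_parent_index(child_to_parent: List[int],
                         parent_to_grandparent: List[int]) -> List[int]:
    """Re-attaches each child directly to its grandparent."""
    return [parent_to_grandparent[p] for p in child_to_parent]


# Two sessions, each with two events.
event_to_session = [0, 0, 1, 1]
# Six val entries: event 0 has one, event 1 has two, event 2 one, event 3 two.
val_to_event = [0, 1, 1, 2, 3, 3]
val_values = [1, 4, 5, 7, 8, 9]

val_to_session = promote_parent_index(val_to_event, event_to_session)
print(list(zip(val_to_session, val_values)))
```

The values themselves are unchanged; only the index that says where each value hangs in the tree is rewritten.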
promote_anonymous
promote_anonymous(
    root: Expression, p: Path
) -> Tuple[Expression, Path]

Promote a path to be a new anonymous child of its grandparent.

Source code in struct2tensor/expression_impl/promote.py
def promote_anonymous(root: expression.Expression,
                      p: path.Path) -> Tuple[expression.Expression, path.Path]:
  """Promote a path to be a new anonymous child of its grandparent."""
  return _promote_impl(root, p, path.get_anonymous_field())
Modules

promote_and_broadcast

promote_and_broadcast a set of nodes.

For example, suppose an expr represents:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
     |   |
     |   +-val*-int64
     |
     +-user_info? (question mark indicates optional)
           |
           +-age? int64
session: {
  event: {
    val: 1
  }
  event: {
    val: 4
    val: 5
  }
  user_info: {
    age: 25
  }
}

session: {
  event: {
    val: 7
  }
  event: {
    val: 8
    val: 9
  }
  user_info: {
    age: 20
  }
}
promote_and_broadcast.promote_and_broadcast(
    expr,
    {"nage": path.Path(["session", "user_info", "age"])},
    path.Path(["session", "event"]))

creates:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
     |   |
     |   +-val*-int64
     |   |
     |   +-nage*-int64
     |
     +-user_info? (question mark indicates optional)
           |
           +-age? int64
session: {
  event: {
    nage: 25
    val: 1
  }
  event: {
    nage: 25
    val: 4
    val: 5
  }
  user_info: {
    age: 25
  }
}

session: {
  event: {
    nage: 20
    val: 7
  }
  event: {
    nage: 20
    val: 8
    val: 9
  }
  user_info: {
    age: 20
  }
}
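The before/after data above can be reproduced with a small plain-Python model. The dictionaries and the `broadcast_age_to_events` helper are hypothetical and only mimic the effect on the data; they do not use struct2tensor.

```python
import copy
from typing import Any, Dict, List


def broadcast_age_to_events(sessions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copies user_info.age into every event of the same session as nage."""
    result = copy.deepcopy(sessions)
    for session in result:
        age = session.get("user_info", {}).get("age")
        if age is None:
            continue  # No age to promote for this session.
        for event in session.get("event", []):
            event["nage"] = age
    return result


sessions = [
    {"event": [{"val": [1]}, {"val": [4, 5]}], "user_info": {"age": 25}},
    {"event": [{"val": [7]}, {"val": [8, 9]}], "user_info": {"age": 20}},
]
with_nage = broadcast_age_to_events(sessions)
print(with_nage[0]["event"])
```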
Functions
promote_and_broadcast
promote_and_broadcast(
    root: Expression,
    path_dictionary: Mapping[Step, Path],
    dest_path_parent: Path,
) -> Expression

Promote and broadcast a set of paths to a particular location.

PARAMETER DESCRIPTION
root

the original expression.

TYPE: Expression

path_dictionary

a map from destination fields to origin paths.

TYPE: Mapping[Step, Path]

dest_path_parent

the path that becomes the parent of the new fields.

TYPE: Path

RETURNS DESCRIPTION
Expression

A new expression, where all the origin paths are promoted and broadcast until they are children of dest_path_parent.

Source code in struct2tensor/expression_impl/promote_and_broadcast.py
def promote_and_broadcast(root: expression.Expression,
                          path_dictionary: Mapping[path.Step, path.Path],
                          dest_path_parent: path.Path) -> expression.Expression:
  """Promote and broadcast a set of paths to a particular location.

  Args:
    root: the original expression.
    path_dictionary: a map from destination fields to origin paths.
    dest_path_parent: the path that becomes the parent of the new fields.

  Returns:
    A new expression, where all the origin paths are promoted and broadcast
      until they are children of dest_path_parent.
  """

  result_paths = {}
  # Here, we branch out and create a different tree for each field that is
  # promoted and broadcast.
  for field_name, origin_path in path_dictionary.items():
    result_path = dest_path_parent.get_child(field_name)
    new_root = _promote_and_broadcast_name(root, origin_path, dest_path_parent,
                                           field_name)
    result_paths[result_path] = new_root
  # We create a new tree that has all of the generated fields from the older
  # trees.
  return expression_add.add_to(root, result_paths)
promote_and_broadcast_anonymous
promote_and_broadcast_anonymous(
    root: Expression, origin: Path, new_parent: Path
) -> Tuple[Expression, Path]

Promotes then broadcasts the origin until its parent is new_parent.

Source code in struct2tensor/expression_impl/promote_and_broadcast.py
def promote_and_broadcast_anonymous(
    root: expression.Expression, origin: path.Path,
    new_parent: path.Path) -> Tuple[expression.Expression, path.Path]:
  """Promotes then broadcasts the origin until its parent is new_parent."""
  least_common_ancestor = origin.get_least_common_ancestor(new_parent)

  new_expr, new_path = root, origin
  while new_path.get_parent() != least_common_ancestor:
    new_expr, new_path = promote.promote_anonymous(new_expr, new_path)

  while new_path.get_parent() != new_parent:
    new_parent_step = new_parent.field_list[len(new_path) - 1]
    new_expr, new_path = broadcast.broadcast_anonymous(new_expr, new_path,
                                                       new_parent_step)

  return new_expr, new_path
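The two loops above can be traced on plain tuples to see which promote and broadcast steps are taken. In this sketch, paths are tuples of strings and `least_common_ancestor` and `plan_moves` are hypothetical helpers; unlike the real code, which uses anonymous field names for the intermediate results, the sketch keeps the leaf name so the trace is readable.

```python
from typing import List, Tuple

PathTuple = Tuple[str, ...]


def least_common_ancestor(a: PathTuple, b: PathTuple) -> PathTuple:
    """Longest common prefix of two paths."""
    common = []
    for x, y in zip(a, b):
        if x != y:
            break
        common.append(x)
    return tuple(common)


def plan_moves(origin: PathTuple, new_parent: PathTuple) -> List[str]:
    """Lists the promote steps, then the broadcast steps, in order."""
    lca = least_common_ancestor(origin, new_parent)
    if len(lca) >= len(origin):
        raise ValueError("origin must lie strictly below the common ancestor")
    moves = []
    current = origin
    # Promote until the field sits directly under the common ancestor.
    while current[:-1] != lca:
        current = current[:-2] + (current[-1],)
        moves.append("promote -> " + ".".join(current))
    # Broadcast down, one step of new_parent at a time.
    while current[:-1] != new_parent:
        step = new_parent[len(current) - 1]
        current = current[:-1] + (step, current[-1])
        moves.append("broadcast -> " + ".".join(current))
    return moves


moves = plan_moves(("session", "user_info", "age"), ("session", "event"))
print(moves)
```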
Modules

proto

Expressions to parse a proto.

These expressions return values with more information than standard node values. Specifically, each node calculates additional tensors that are used as inputs for its children.

Attributes
ProtoExpression module-attribute
ProtoExpression = Union[
    _ProtoRootExpression,
    _ProtoChildExpression,
    _ProtoLeafExpression,
]
ProtoFieldName module-attribute
ProtoFieldName = str
ProtoFullName module-attribute
ProtoFullName = str
StrStep module-attribute
StrStep = str
TransformFn module-attribute
TransformFn = Callable[
    [Tensor, Tensor], Tuple[Tensor, Tensor]
]
Functions
create_expression_from_file_descriptor_set
create_expression_from_file_descriptor_set(
    tensor_of_protos: Tensor,
    proto_name: ProtoFullName,
    file_descriptor_set: FileDescriptorSet,
    message_format: str = "binary",
) -> Expression

Create an expression from a 1D tensor of serialized protos.

PARAMETER DESCRIPTION
tensor_of_protos

1D tensor of serialized protos.

TYPE: Tensor

proto_name

fully qualified name (e.g. "some.package.SomeProto") of the proto in tensor_of_protos.

TYPE: ProtoFullName

file_descriptor_set

The FileDescriptorSet proto containing proto_name's and all its dependencies' FileDescriptorProto. Note that if file1 imports file2, then file2's FileDescriptorProto must precede file1's in file_descriptor_set.file.

TYPE: FileDescriptorSet

message_format

Indicates the format of the protocol buffer: is one of 'text' or 'binary'.

TYPE: str DEFAULT: 'binary'

RETURNS DESCRIPTION
Expression

An expression.

Source code in struct2tensor/expression_impl/proto.py
def create_expression_from_file_descriptor_set(
    tensor_of_protos: tf.Tensor,
    proto_name: ProtoFullName,
    file_descriptor_set: descriptor_pb2.FileDescriptorSet,
    message_format: str = "binary") -> expression.Expression:
  """Create an expression from a 1D tensor of serialized protos.

  Args:
    tensor_of_protos: 1D tensor of serialized protos.
    proto_name: fully qualified name (e.g. "some.package.SomeProto") of the
      proto in `tensor_of_protos`.
    file_descriptor_set: The FileDescriptorSet proto containing `proto_name`'s
      and all its dependencies' FileDescriptorProto. Note that if file1 imports
      file2, then file2's FileDescriptorProto must precede file1's in
      file_descriptor_set.file.
    message_format: Indicates the format of the protocol buffer: is one of
       'text' or 'binary'.

  Returns:
    An expression.
  """

  pool = DescriptorPool()
  for f in file_descriptor_set.file:
    # This method raises if f's dependencies have not been added.
    pool.Add(f)

  # This method raises if proto not found.
  desc = pool.FindMessageTypeByName(proto_name)

  return create_expression_from_proto(tensor_of_protos, desc, message_format)
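Since adding a file to the pool raises if its dependencies are missing, the files in file_descriptor_set.file need to be in dependency order. The sketch below shows one way to compute such an order from a name-to-imports mapping (the information a FileDescriptorProto carries in its `name` and `dependency` fields). The `order_files` helper and the file names are hypothetical, and the sketch works on plain dictionaries rather than on protobuf objects.

```python
from typing import Dict, List


def order_files(deps: Dict[str, List[str]]) -> List[str]:
    """Returns file names so that every file comes after its imports."""
    ordered: List[str] = []
    visiting = set()
    done = set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in visiting:
            raise ValueError("import cycle involving " + name)
        visiting.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        visiting.discard(name)
        done.add(name)
        ordered.append(name)

    for name in deps:
        visit(name)
    return ordered


# file1.proto imports file2.proto, so file2.proto must come first.
ordered = order_files({"file1.proto": ["file2.proto"], "file2.proto": []})
print(ordered)
```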
create_expression_from_proto
create_expression_from_proto(
    tensor_of_protos: Tensor,
    desc: Descriptor,
    message_format: str = "binary",
) -> Expression

Create an expression from a 1D tensor of serialized protos.

PARAMETER DESCRIPTION
tensor_of_protos

1D tensor of serialized protos.

TYPE: Tensor

desc

a descriptor of the protos in tensor_of_protos.

TYPE: Descriptor

message_format

Indicates the format of the protocol buffer: is one of 'text' or 'binary'.

TYPE: str DEFAULT: 'binary'

RETURNS DESCRIPTION
Expression

An expression.

Source code in struct2tensor/expression_impl/proto.py
def create_expression_from_proto(
    tensor_of_protos: tf.Tensor,
    desc: descriptor.Descriptor,
    message_format: str = "binary") -> expression.Expression:
  """Create an expression from a 1D tensor of serialized protos.

  Args:
    tensor_of_protos: 1D tensor of serialized protos.
    desc: a descriptor of the protos in tensor_of_protos.
    message_format: Indicates the format of the protocol buffer: is one of
      'text' or 'binary'.

  Returns:
    An expression.
  """
  return _ProtoRootExpression(desc, tensor_of_protos, message_format)
create_transformed_field
create_transformed_field(
    expr: Expression,
    source_path: CoercableToPath,
    dest_field: StrStep,
    transform_fn: TransformFn,
) -> Expression

Create an expression that transforms serialized proto tensors.

The transform_fn argument should take the form:

def transform_fn(parent_indices, values):
  ...
  return (transformed_parent_indices, transformed_values)

Given:

  • parent_indices: an int64 vector of non-decreasing parent message indices.
  • values: a string vector of serialized protos having the same shape as parent_indices.

transform_fn must return new parent indices and serialized values encoding the same proto message as the passed in values. These two vectors must have the same size, but it need not be the same as the input arguments.

Note

If CalculateOptions.use_string_view (set at calculate time, thus this Expression cannot know beforehand) is True, values passed to transform_fn are string views pointing all the way back to the original input tensor (of serialized root protos). And transform_fn must maintain such views and avoid creating new values that are either not string views into the root protos or self-owned strings. This is because downstream decoding ops will still produce string views referring into its input (which are string views into the root proto) and they will only hold a reference to the original, root proto tensor, keeping it alive. So the input tensor may get destroyed after the decoding op.

In short, you can do element-wise transforms to values, but can't mutate the contents of elements in values or create new elements.

To lift this restriction, a decoding op must be told to hold a reference of the input tensors of all its upstream decoding ops.

PARAMETER DESCRIPTION
expr

a source expression containing source_path.

TYPE: Expression

source_path

the path to the field to transform.

TYPE: CoercableToPath

dest_field

the name of the newly created field. This field will be a sibling of the field identified by source_path.

TYPE: StrStep

transform_fn

a callable that accepts parent_indices and serialized proto values and returns possibly modified parent_indices and values. Note that when CalculateOptions.use_string_view is set, transform_fn should not have any stateful, side-effecting uses of the serialized proto inputs. Doing so could cause segfaults, as the lifetime of the backing string tensor is not guaranteed when the side-effecting operations are run.

TYPE: TransformFn

RETURNS DESCRIPTION
Expression

An expression.

RAISES DESCRIPTION
ValueError

if the source path is not a proto message field.

Source code in struct2tensor/expression_impl/proto.py
def create_transformed_field(
    expr: expression.Expression, source_path: path.CoercableToPath,
    dest_field: StrStep, transform_fn: TransformFn) -> expression.Expression:
  """Create an expression that transforms serialized proto tensors.

  The transform_fn argument should take the form:

  def transform_fn(parent_indices, values):
    ...
    return (transformed_parent_indices, transformed_values)

  Given:

  - parent_indices: an int64 vector of non-decreasing parent message indices.
  - values: a string vector of serialized protos having the same shape as
    `parent_indices`.

  `transform_fn` must return new parent indices and serialized values encoding
  the same proto message as the passed in `values`.  These two vectors must
  have the same size, but it need not be the same as the input arguments.

  !!! Note
      If CalculateOptions.use_string_view (set at calculate time, thus this
      Expression cannot know beforehand) is True, `values` passed to
      `transform_fn` are string views pointing all the way back to the original
      input tensor (of serialized root protos). And `transform_fn` must maintain
      such views and avoid creating new values that are either not string views
      into the root protos or self-owned strings. This is because downstream
      decoding ops will still produce string views referring into its input
      (which are string views into the root proto) and they will only hold a
      reference to the original, root proto tensor, keeping it alive. So the input
      tensor may get destroyed after the decoding op.

      In short, you can do element-wise transforms to `values`, but can't mutate
      the contents of elements in `values` or create new elements.

      To lift this restriction, a decoding op must be told to hold a reference
      of the input tensors of all its upstream decoding ops.


  Args:
    expr: a source expression containing `source_path`.
    source_path: the path to the field to transform.
    dest_field: the name of the newly created field. This field will be a
      sibling of the field identified by `source_path`.
    transform_fn: a callable that accepts parent_indices and serialized proto
      values and returns possibly modified parent_indices and values. Note that
      when CalculateOptions.use_string_view is set, transform_fn should not have
      any stateful, side-effecting uses of the serialized proto inputs. Doing so
      could cause segfaults, as the lifetime of the backing string tensor is not
      guaranteed when the side-effecting operations are run.

  Returns:
    An expression.

  Raises:
    ValueError: if the source path is not a proto message field.
  """
  source_path = path.create_path(source_path)
  source_expr = expr.get_descendant_or_error(source_path)
  if not isinstance(source_expr, _ProtoChildExpression):
    raise ValueError(
        "Expected _ProtoChildExpression for field {}, but found {}.".format(
            str(source_path), source_expr))

  if isinstance(source_expr, _TransformProtoChildExpression):
    # In order to be able to propagate fields needed for parsing, the source
    # expression of _TransformProtoChildExpression must always be the original
    # _ProtoChildExpression before any transformation. This means that two
    # sequentially applied _TransformProtoChildExpression would have the same
    # source and would apply the transformation to the source directly, instead
    # of one transform operating on the output of the other.
    # To work around this, the user supplied transform function is wrapped to
    # first call the source's transform function.
    # The downside of this approach is that the initial transform may be
    # applied redundantly if there are other expressions derived directly
    # from it.
    def final_transform(parent_indices: tf.Tensor,
                        values: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
      parent_indices, values = source_expr.transform_fn(parent_indices, values)
      return transform_fn(parent_indices, values)
  else:
    final_transform = transform_fn

  transformed_expr = _TransformProtoChildExpression(
      parent=source_expr._parent,  # pylint: disable=protected-access
      desc=source_expr._desc,  # pylint: disable=protected-access
      is_repeated=source_expr.is_repeated,
      name_as_field=source_expr.name_as_field,
      transform_fn=final_transform,
      backing_str_tensor=source_expr._backing_str_tensor)  # pylint: disable=protected-access
  dest_path = source_path.get_parent().get_child(dest_field)
  return expression_add.add_paths(expr, {dest_path: transformed_expr})
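The contract for transform_fn, and the wrapping done when a transform is applied on top of an already transformed field, can be modeled with plain lists. This is only a sketch: a real transform_fn operates on tensors, and `keep_first_per_parent`, `drop_parent_zero`, and `chain` are hypothetical names. Both transforms only select existing elements, in keeping with the restriction that values must not be rewritten or newly created.

```python
from typing import Callable, List, Tuple

Transform = Callable[[List[int], List[bytes]], Tuple[List[int], List[bytes]]]


def keep_first_per_parent(parent_indices: List[int],
                          values: List[bytes]) -> Tuple[List[int], List[bytes]]:
    """Keeps only the first message under each parent."""
    kept_parents: List[int] = []
    kept_values: List[bytes] = []
    for parent, value in zip(parent_indices, values):
        if not kept_parents or kept_parents[-1] != parent:
            kept_parents.append(parent)
            kept_values.append(value)
    return kept_parents, kept_values


def drop_parent_zero(parent_indices: List[int],
                     values: List[bytes]) -> Tuple[List[int], List[bytes]]:
    """Drops every message that belongs to parent 0."""
    pairs = [(p, v) for p, v in zip(parent_indices, values) if p != 0]
    return [p for p, _ in pairs], [v for _, v in pairs]


def chain(first: Transform, second: Transform) -> Transform:
    """Runs `first`, then feeds its output to `second`."""
    def final_transform(parent_indices, values):
        parent_indices, values = first(parent_indices, values)
        return second(parent_indices, values)
    return final_transform


combined = chain(keep_first_per_parent, drop_parent_zero)
result = combined([0, 0, 1, 2, 2], [b"a", b"b", b"c", b"d", b"e"])
print(result)
```

The output vectors are shorter than the inputs but still equal in length to each other, and the parent indices remain non-decreasing.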
is_proto_expression
is_proto_expression(expr: Expression) -> bool

Returns true if an expression is a ProtoExpression.

Source code in struct2tensor/expression_impl/proto.py
def is_proto_expression(expr: expression.Expression) -> bool:
  """Returns true if an expression is a ProtoExpression."""
  return isinstance(
      expr, (_ProtoRootExpression, _ProtoChildExpression, _ProtoLeafExpression))
Modules

reroot

Reroot to a subtree, maintaining an input proto index.

reroot is similar to get_descendant_or_error. However, this method allows you to call create_proto_index_field(...) later on, which gives you a reference to the original proto.

Functions
create_proto_index_field
create_proto_index_field(
    root: Expression, new_field_name: Step
) -> Expression
Source code in struct2tensor/expression_impl/reroot.py
def create_proto_index_field(root: expression.Expression,
                             new_field_name: path.Step
                            ) -> expression.Expression:
  return expression_add.add_paths(
      root, {path.Path([new_field_name]): _InputProtoIndexExpression(root)})
reroot
reroot(root: Expression, source_path: Path) -> Expression

Reroot to a new path, maintaining an input proto index.

Similar to root.get_descendant_or_error(source_path): however, this method retains the ability to get a map to the original index.

PARAMETER DESCRIPTION
root

the original root.

TYPE: Expression

source_path

the path to the new root.

TYPE: Path

RETURNS DESCRIPTION
Expression

the new root.

Source code in struct2tensor/expression_impl/reroot.py
def reroot(root: expression.Expression,
           source_path: path.Path) -> expression.Expression:
  """Reroot to a new path, maintaining an input proto index.

  Similar to root.get_descendant_or_error(source_path): however, this
  method retains the ability to get a map to the original index.

  Args:
    root: the original root.
    source_path: the path to the new root.

  Returns:
    the new root.
  """

  new_root = root
  for step in source_path.field_list:
    new_root = _RerootExpression(new_root, step)
  return new_root
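The idea of "a map to the original index" can be sketched with parent indices: after rerooting, each element of the new root can still be traced back to the input proto it came from by composing the parent indices along the path. The `input_proto_index` helper below is a hypothetical plain-Python model and makes no claim about how `_RerootExpression` stores this information.

```python
from typing import List


def input_proto_index(parent_indices_along_path: List[List[int]]) -> List[int]:
    """Maps each node at the end of the path to its original root index.

    parent_indices_along_path[0] maps the first step's nodes to root entries,
    parent_indices_along_path[1] maps the second step's nodes to the first
    step's nodes, and so on.
    """
    if not parent_indices_along_path:
        raise ValueError("the path must have at least one step")
    index = list(range(len(parent_indices_along_path[-1])))
    for level in reversed(parent_indices_along_path):
        index = [level[i] for i in index]
    return index


# Two input protos; rerooting to "event.val".
event_to_root = [0, 0, 1, 1]
val_to_event = [0, 1, 1, 2, 3, 3]
proto_index = input_proto_index([event_to_root, val_to_event])
print(proto_index)
```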
Modules

size

Functions for creating a new size or has expression.

Given a field "foo.bar",

root = size(expr, path.Path(["foo","bar"]), "bar_size")

creates a new expression root that has an optional field "foo.bar_size", which is always present, and contains the number of bar in a particular foo.

root_2 = has(expr, path.Path(["foo","bar"]), "bar_has")

creates a new expression root that has an optional field "foo.bar_has", which is always present, and is true if there are one or more bar in foo.
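In terms of data, size counts the children under each parent and has checks whether that count is nonzero. A minimal plain-Python model, with hypothetical helper names and no struct2tensor calls, might look like this:

```python
from typing import List


def size_per_parent(child_parent_index: List[int], num_parents: int) -> List[int]:
    """Counts how many children each parent has (zero if it has none)."""
    sizes = [0] * num_parents
    for parent in child_parent_index:
        sizes[parent] += 1
    return sizes


def has_per_parent(child_parent_index: List[int], num_parents: int) -> List[bool]:
    """True for each parent that has at least one child."""
    return [s > 0 for s in size_per_parent(child_parent_index, num_parents)]


# Three foo entries; bar occurs twice in foo 0, never in foo 1, once in foo 2.
bar_to_foo = [0, 0, 2]
bar_size = size_per_parent(bar_to_foo, num_parents=3)
bar_has = has_per_parent(bar_to_foo, num_parents=3)
print(bar_size, bar_has)
```

Every parent gets exactly one result, which is why the new field is described as optional yet always present.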

Classes
SizeExpression
SizeExpression(
    origin: Expression, origin_parent: Expression
)

Bases: Leaf

Size of the given expression.

SizeExpression is intended to be a sibling of origin. origin_parent should be the parent of origin.

Initialize a Leaf.

Note that a leaf must have a specified type.

PARAMETER DESCRIPTION
is_repeated

if the expression is repeated.

TYPE: bool

my_type

the DType of the field.

TYPE: DType

schema_feature

schema information about the field.

TYPE: Optional[Feature] DEFAULT: None

Source code in struct2tensor/expression_impl/size.py
def __init__(self, origin: expression.Expression,
             origin_parent: expression.Expression):
  super().__init__(False, tf.int64)
  self._origin = origin
  self._origin_parent = origin_parent
Attributes
is_leaf property
is_leaf: bool

True iff the node tensor is a LeafNodeTensor.

is_repeated property
is_repeated: bool

True iff the same parent value can have multiple children values.

schema_feature property
schema_feature: Optional[Feature]

Return the schema of the field.

type property
type: Optional[DType]

dtype of the expression, or None if not a leaf expression.

validate_step_format property
validate_step_format: bool
Functions
apply
apply(
    transform: Callable[[Expression], Expression]
) -> Expression
Source code in struct2tensor/expression.py
def apply(self,
          transform: Callable[["Expression"], "Expression"]) -> "Expression":
  return transform(self)
apply_schema
apply_schema(schema: Schema) -> Expression
Source code in struct2tensor/expression.py
def apply_schema(self, schema: schema_pb2.Schema) -> "Expression":
  return apply_schema.apply_schema(self, schema)
broadcast
broadcast(
    source_path: CoercableToPath,
    sibling_field: Step,
    new_field_name: Step,
) -> Expression

Broadcasts the existing field at source_path to the sibling_field.

Source code in struct2tensor/expression.py
def broadcast(self, source_path: CoercableToPath, sibling_field: path.Step,
              new_field_name: path.Step) -> "Expression":
  """Broadcasts the existing field at source_path to the sibling_field."""
  return broadcast.broadcast(self, path.create_path(source_path),
                             sibling_field, new_field_name)
calculate
calculate(
    sources: Sequence[NodeTensor],
    destinations: Sequence[Expression],
    options: Options,
    side_info: Optional[Prensor] = None,
) -> NodeTensor

Calculates the node tensor of the expression.

The node tensor must be a function of the properties of the expression and the node tensors of the expressions from get_source_expressions().

If is_leaf, then calculate must return a LeafNodeTensor. Otherwise, it must return a ChildNodeTensor or RootNodeTensor.

If calculation_is_identity() is true, then this must return the first source node tensor (sources[0]).

Sometimes, for operations such as parsing the proto, calculate will return additional information. For example, calculate() for the root of the proto expression also parses out the tensors required to calculate the tensors of the children. This is why destinations are required.

For a reference use, see calculate_value_slowly(...) below.

PARAMETER DESCRIPTION
sources

The node tensors of the expressions in get_source_expressions().

TYPE: Sequence[NodeTensor]

destinations

The expressions that will use the output of this method.

TYPE: Sequence[Expression]

options

Options for the calculation.

TYPE: Options

side_info

An optional prensor that is used to bind to a placeholder expression.

TYPE: Optional[Prensor] DEFAULT: None

RETURNS DESCRIPTION
NodeTensor

A NodeTensor representing the output of this expression.

Source code in struct2tensor/expression_impl/size.py
def calculate(
    self,
    sources: Sequence[prensor.NodeTensor],
    destinations: Sequence[expression.Expression],
    options: calculate_options.Options,
    side_info: Optional[prensor.Prensor] = None) -> prensor.NodeTensor:

  [origin_value, origin_parent_value] = sources
  if not isinstance(origin_value,
                    (prensor.LeafNodeTensor, prensor.ChildNodeTensor)):
    raise ValueError(
        "origin_value must be a LeafNodeTensor or a ChildNodeTensor, "
        "but was a " + str(type(origin_value)))

  if not isinstance(origin_parent_value,
                    (prensor.ChildNodeTensor, prensor.RootNodeTensor)):
    raise ValueError("origin_parent_value must be a ChildNodeTensor "
                     "or a RootNodeTensor, but was a " +
                     str(type(origin_parent_value)))

  parent_index = origin_value.parent_index
  num_parent_protos = origin_parent_value.size
  # A vector of 1s of the same size as the parent_index.
  updates = tf.ones(tf.shape(parent_index), dtype=tf.int64)
  indices = tf.expand_dims(parent_index, 1)
  # This is incrementing the size by 1 for each element.
  # Obviously, not the fastest way to do this.
  values = tf.scatter_nd(indices, updates, tf.reshape(num_parent_protos, [1]))

  # Need to create a new_parent_index = 0,1,2,3,4...n.
  new_parent_index = tf.range(num_parent_protos, dtype=tf.int64)
  return prensor.LeafNodeTensor(new_parent_index, values, False)
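The counting step in the listing above can be pictured without TensorFlow. The following is only a minimal sketch in plain Python, assuming parent_index is a list of integers and the parent size is known; sizes_from_parent_index is a hypothetical helper name, not part of struct2tensor.

```python
from typing import List


def sizes_from_parent_index(parent_index: List[int],
                            num_parents: int) -> List[int]:
    """Counts how many children point at each parent (illustrative only)."""
    # Start every parent at zero, like the zero-filled scatter target.
    values = [0] * num_parents
    # Add 1 at the parent's position for each child, which is what
    # scattering a vector of ones with duplicate indices accumulates.
    for parent in parent_index:
        values[parent] += 1
    return values


# Three parents: parent 0 has two children, parent 1 none, parent 2 one.
sizes = sizes_from_parent_index([0, 0, 2], num_parents=3)
# One size value per parent, so the new parent index is 0, 1, 2.
new_parent_index = list(range(3))
print(sizes, new_parent_index)
```

Parents without children still receive a value (zero), which is why the resulting field is not repeated.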
calculation_equal
calculation_equal(expr: Expression) -> bool

Returns True iff self.calculate is equivalent to expr.calculate.

Given the same source node tensors, self.calculate(...) and expr.calculate(...) will have the same result.

Note that this does not check that the source expressions of the two expressions are the same. Therefore, two operations can have the same calculation, but not the same output, because their sources are different. For example, if a.calculation_is_identity() is True and b.calculation_is_identity() is True, then a.calculation_equal(b) is True. However, unless a and b have the same source, the expressions themselves are not equal.

PARAMETER DESCRIPTION
expr

The expression to compare to.

TYPE: Expression

Source code in struct2tensor/expression_impl/size.py
def calculation_equal(self, expr: expression.Expression) -> bool:
  return isinstance(expr, SizeExpression)
calculation_is_identity
calculation_is_identity() -> bool

True iff self.calculate is the identity.

There is exactly one source, and the output of self.calculate(...) is the node tensor of this source.

Source code in struct2tensor/expression_impl/size.py
def calculation_is_identity(self) -> bool:
  return False
cogroup_by_index
cogroup_by_index(
    source_path: CoercableToPath,
    left_name: Step,
    right_name: Step,
    new_field_name: Step,
) -> Expression

Creates a cogroup of left_name and right_name at new_field_name.

Source code in struct2tensor/expression.py
def cogroup_by_index(self, source_path: CoercableToPath, left_name: path.Step,
                     right_name: path.Step,
                     new_field_name: path.Step) -> "Expression":
  """Creates a cogroup of left_name and right_name at new_field_name."""
  raise NotImplementedError("cogroup_by_index is not implemented")
create_has_field
create_has_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the presence of the source path.

Source code in struct2tensor/expression.py
def create_has_field(self, source_path: CoercableToPath,
                     new_field_name: path.Step) -> "Expression":
  """Creates a field that is the presence of the source path."""
  return size.has(self, path.create_path(source_path), new_field_name)
create_proto_index
create_proto_index(field_name: Step) -> Expression

Creates a proto index field as a direct child of the current root.

The proto index maps each root element to the original batch index. For example: [0, 2] means the first element came from the first proto in the original input tensor and the second element came from the third proto. The created field is always "dense" -- it has the same valency as the current root.

PARAMETER DESCRIPTION
field_name

the name of the field to be created.

TYPE: Step

RETURNS DESCRIPTION
Expression

An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py
def create_proto_index(self, field_name: path.Step) -> "Expression":
  """Creates a proto index field as a direct child of the current root.

  The proto index maps each root element to the original batch index.
  For example: [0, 2] means the first element came from the first proto
  in the original input tensor and the second element came from the third
  proto. The created field is always "dense" -- it has the same valency as
  the current root.

  Args:
    field_name: the name of the field to be created.

  Returns:
    An Expression object representing the result of the operation.
  """

  return reroot.create_proto_index_field(self, field_name)
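To make the [0, 2] example concrete, here is a small plain-Python sketch. It assumes, purely for illustration, that the current root was obtained by dropping some elements of the original batch; the keep mask and the variable names are hypothetical and do not come from struct2tensor.

```python
# Original batch of three serialized protos (placeholders for illustration).
original_batch = ["proto_0", "proto_1", "proto_2"]
# Hypothetical mask saying which original elements survive in the current root.
keep = [True, False, True]

# The proto index records, for each surviving root element, its position
# in the original batch.
proto_index = [i for i, kept in enumerate(keep) if kept]
current_root = [original_batch[i] for i in proto_index]

print(proto_index)   # one index value per current root element
print(current_root)
```

There is exactly one index value per root element, which matches the description of the field as "dense".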
create_size_field
create_size_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the size of the source path.

Source code in struct2tensor/expression.py
def create_size_field(self, source_path: CoercableToPath,
                      new_field_name: path.Step) -> "Expression":
  """Creates a field that is the size of the source path."""
  return size.size(self, path.create_path(source_path), new_field_name)
get_child
get_child(field_name: Step) -> Optional[Expression]

Gets a named child.

Source code in struct2tensor/expression.py
def get_child(self, field_name: path.Step) -> Optional["Expression"]:
  """Gets a named child."""
  if field_name in self._child_cache:
    return self._child_cache[field_name]
  result = self._get_child_impl(field_name)
  self._child_cache[field_name] = result
  return result
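The caching pattern in get_child can be sketched in isolation. ToyNode below is a hypothetical stand-in, not a struct2tensor class; it only mirrors the shape of the listing above (a cache dictionary consulted before _get_child_impl is called).

```python
from typing import Dict, Optional


class ToyNode:
    """Hypothetical node whose children are built lazily and memoized."""

    def __init__(self, fields: Dict[str, dict]):
        self._fields = fields
        self._child_cache: Dict[str, Optional["ToyNode"]] = {}
        self.impl_calls = 0  # counts how often the slow path runs

    def _get_child_impl(self, field_name: str) -> Optional["ToyNode"]:
        self.impl_calls += 1
        sub_fields = self._fields.get(field_name)
        return None if sub_fields is None else ToyNode(sub_fields)

    def get_child(self, field_name: str) -> Optional["ToyNode"]:
        if field_name in self._child_cache:
            return self._child_cache[field_name]
        result = self._get_child_impl(field_name)
        self._child_cache[field_name] = result
        return result


node = ToyNode({"foo": {}})
first = node.get_child("foo")
second = node.get_child("foo")          # served from the cache
missing = node.get_child("nope")
missing_again = node.get_child("nope")  # the None result is cached as well
```

Because the membership test is on the key rather than on the value, a missing child (None) is also remembered, so the slow path runs at most once per field name.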
get_child_or_error
get_child_or_error(field_name: Step) -> Expression

Gets a named child, raising a KeyError if it does not exist.

Source code in struct2tensor/expression.py
def get_child_or_error(self, field_name: path.Step) -> "Expression":
  """Gets a named child."""
  result = self.get_child(field_name)
  if result is None:
    raise KeyError("No such field: {}".format(field_name))
  return result
get_descendant
get_descendant(p: Path) -> Optional[Expression]

Finds the descendant at the path.

Source code in struct2tensor/expression.py
def get_descendant(self, p: path.Path) -> Optional["Expression"]:
  """Finds the descendant at the path."""
  result = self
  for field_name in p.field_list:
    result = result.get_child(field_name)
    if result is None:
      return None
  return result
get_descendant_or_error
get_descendant_or_error(p: Path) -> Expression

Finds the descendant at the path, raising a ValueError if it does not exist.

Source code in struct2tensor/expression.py
def get_descendant_or_error(self, p: path.Path) -> "Expression":
  """Finds the descendant at the path."""
  result = self.get_descendant(p)
  if result is None:
    raise ValueError("Missing path: {} in {}".format(
        str(p), self.schema_string(limit=20)))
  return result
get_known_children
get_known_children() -> Mapping[Step, Expression]
Source code in struct2tensor/expression.py
def get_known_children(self) -> Mapping[path.Step, "Expression"]:
  known_field_names = self.known_field_names()
  result = {}
  for name in known_field_names:
    result[name] = self.get_child_or_error(name)
  return result
get_known_descendants
get_known_descendants() -> Mapping[Path, Expression]

Gets a mapping from known paths to subexpressions.

The difference between this and get_descendants in Prensor is that all paths in a Prensor are realized, thus all known. But an Expression's descendants might not all be known at the point this method is called, because an expression may have an infinite number of children.

RETURNS DESCRIPTION
Mapping[Path, Expression]

A mapping from paths (relative to the root of the subexpression) to expressions.

Source code in struct2tensor/expression.py
def get_known_descendants(self) -> Mapping[path.Path, "Expression"]:
  # Rename get_known_descendants
  """Gets a mapping from known paths to subexpressions.

  The difference between this and get_descendants in Prensor is that
  all paths in a Prensor are realized, thus all known. But an Expression's
  descendants might not all be known at the point this method is called,
  because an expression may have an infinite number of children.

  Returns:
    A mapping from paths (relative to the root of the subexpression) to
      expressions.
  """
  known_subexpressions = {
      k: v.get_known_descendants()
      for k, v in self.get_known_children().items()
  }
  result = {}
  for field_name, subexpression in known_subexpressions.items():
    subexpression_path = path.Path(
        [field_name], validate_step_format=self.validate_step_format
    )
    for p, expr in subexpression.items():
      result[subexpression_path.concat(p)] = expr
  result[path.Path([], validate_step_format=self.validate_step_format)] = self
  return result
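The recursion above flattens a tree of known children into a mapping keyed by path. A minimal sketch of the same idea, assuming nested dictionaries stand in for expressions and tuples of field names stand in for Path objects (both are simplifications, not the library's types):

```python
from typing import Dict, Tuple


def known_descendants(tree: dict) -> Dict[Tuple[str, ...], dict]:
    """Maps each path (relative to `tree`) to the subtree found there."""
    result = {}
    for field_name, child in tree.items():
        # Prefix every path of the child with the child's own field name.
        for sub_path, sub_tree in known_descendants(child).items():
            result[(field_name,) + sub_path] = sub_tree
    # The empty path refers to the node itself.
    result[()] = tree
    return result


tree = {"foo": {"bar": {}}, "baz": {}}
paths = sorted(known_descendants(tree))
print(paths)
```

As in the original method, the root itself is always included under the empty path.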
get_paths_with_schema
get_paths_with_schema() -> List[Path]

Extract only paths that contain schema information.

Source code in struct2tensor/expression.py
def get_paths_with_schema(self) -> List[path.Path]:
  """Extract only paths that contain schema information."""
  result = []
  for name, child in self.get_known_children().items():
    if child.schema_feature is None:
      continue
    result.extend(
        [
            path.Path(
                [name], validate_step_format=self.validate_step_format
            ).concat(x)
            for x in child.get_paths_with_schema()
        ]
    )
  # Note: We always take the root path and so will return an empty schema
  # if there is no schema information on any nodes, including the root.
  if not result:
    result.append(
        path.Path([], validate_step_format=self.validate_step_format)
    )
  return result
get_schema
get_schema(create_schema_features=True) -> Schema

Returns a schema for the entire tree.

PARAMETER DESCRIPTION
create_schema_features

If True, schema features are added for all children and a schema entry is created if not available on the child. If False, features are left off of the returned schema if there is no schema_feature on the child.

DEFAULT: True

Source code in struct2tensor/expression.py
def get_schema(self, create_schema_features=True) -> schema_pb2.Schema:
  """Returns a schema for the entire tree.

  Args:
    create_schema_features: If True, schema features are added for all
      children and a schema entry is created if not available on the child. If
      False, features are left off of the returned schema if there is no
      schema_feature on the child.
  """
  if not create_schema_features:
    return self.project(self.get_paths_with_schema()).get_schema()
  result = schema_pb2.Schema()
  self._populate_schema_feature_children(result.feature)
  return result
get_source_expressions
get_source_expressions() -> Sequence[Expression]

Gets the sources of this expression.

The node tensors of the source expressions must be sufficient to calculate the node tensor of this expression (see calculate and calculate_value_slowly).

RETURNS DESCRIPTION
Sequence[Expression]

The sources of this expression.

Source code in struct2tensor/expression_impl/size.py
def get_source_expressions(self) -> Sequence[expression.Expression]:
  return [self._origin, self._origin_parent]
known_field_names
known_field_names() -> FrozenSet[Step]

Returns known field names of the expression.

TODO(martinz): implement set_field and project. Known field names of a parsed proto correspond to the fields declared in the message. Examples of "unknown" fields are extensions and explicit casts in an any field. The only way to know if an unknown field "(foo.bar)" is present in an expression expr is to call (expr["(foo.bar)"] is not None).

Notice that simply accessing a field does not make it "known". However, setting a field (or setting a descendant of a field) will make it known.

project(...) returns an expression where the known field names are the only field names. In general, if you want to depend upon known_field_names (e.g., if you want to compile an expression), then the best approach is to project() the expression first.

RETURNS DESCRIPTION
FrozenSet[Step]

An immutable set of field names.

Source code in struct2tensor/expression.py
def known_field_names(self) -> FrozenSet[path.Step]:
  return frozenset()
map_field_values
map_field_values(
    source_path: CoercableToPath,
    operator: Callable[[Tensor], Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map a primitive field to create a new primitive field.

Note

The dtype argument is new relative to the v1 API.

PARAMETER DESCRIPTION
source_path

the origin path.

TYPE: CoercableToPath

operator

an element-wise operator that takes a 1-dimensional vector.

TYPE: Callable[[Tensor], Tensor]

dtype

the type of the output.

TYPE: DType

new_field_name

the name of a new sibling of source_path.

TYPE: Step

RETURNS DESCRIPTION
Expression

the resulting root expression.

Source code in struct2tensor/expression.py
def map_field_values(self, source_path: CoercableToPath,
                     operator: Callable[[tf.Tensor], tf.Tensor],
                     dtype: tf.DType,
                     new_field_name: path.Step) -> "Expression":
  """Map a primitive field to create a new primitive field.

  !!! Note
      The dtype argument is added since the v1 API.

  Args:
    source_path: the origin path.
    operator: an element-wise operator that takes a 1-dimensional vector.
    dtype: the type of the output.
    new_field_name: the name of a new sibling of source_path.

  Returns:
    the resulting root expression.
  """
  return map_values.map_values(self, path.create_path(source_path), operator,
                               dtype, new_field_name)
map_ragged_tensors
map_ragged_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you to some degree reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: i.e., a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must be equal to the first dimension of the input tensor.

PARAMETER DESCRIPTION
parent_path

the parent of the input and output fields.

TYPE: CoercableToPath

source_fields

the nonempty list of names of the source fields.

TYPE: Sequence[Step]

operator

an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape.

TYPE: Callable[..., SparseTensor]

is_repeated

whether the output is repeated.

TYPE: bool

dtype

the dtype of the result.

TYPE: DType

new_field_name

the name of the resulting field.

TYPE: Step

RETURNS DESCRIPTION
Expression

A new query.

Source code in struct2tensor/expression.py
def map_ragged_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_ragged_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
map_sparse_tensors
map_sparse_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you to some degree reshape the field. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: i.e., a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must be equal to the first dimension of the input tensor.

PARAMETER DESCRIPTION
parent_path

the parent of the input and output fields.

TYPE: CoercableToPath

source_fields

the nonempty list of names of the source fields.

TYPE: Sequence[Step]

operator

an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape.

TYPE: Callable[..., SparseTensor]

is_repeated

whether the output is repeated.

TYPE: bool

dtype

the dtype of the result.

TYPE: DType

new_field_name

the name of the resulting field.

TYPE: Step

RETURNS DESCRIPTION
Expression

A new query.

Source code in struct2tensor/expression.py
def map_sparse_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_sparse_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
project
project(path_list: Sequence[CoercableToPath]) -> Expression

Constrains the paths to those listed.

Source code in struct2tensor/expression.py
def project(self, path_list: Sequence[CoercableToPath]) -> "Expression":
  """Constrains the paths to those listed."""
  return project.project(self, [path.create_path(x) for x in path_list])
promote
promote(source_path: CoercableToPath, new_field_name: Step)

Promotes source_path to be a field new_field_name in its grandparent.

Source code in struct2tensor/expression.py
def promote(self, source_path: CoercableToPath, new_field_name: path.Step):
  """Promotes source_path to be a field new_field_name in its grandparent."""
  return promote.promote(self, path.create_path(source_path), new_field_name)
promote_and_broadcast
promote_and_broadcast(
    path_dictionary: Mapping[Step, CoercableToPath],
    dest_path_parent: CoercableToPath,
) -> Expression
Source code in struct2tensor/expression.py
def promote_and_broadcast(
    self, path_dictionary: Mapping[path.Step, CoercableToPath],
    dest_path_parent: CoercableToPath) -> "Expression":
  return promote_and_broadcast.promote_and_broadcast(
      self, {k: path.create_path(v) for k, v in path_dictionary.items()},
      path.create_path(dest_path_parent))
reroot
reroot(new_root: CoercableToPath) -> Expression

Returns a new list of protocol buffers available at new_root.

Source code in struct2tensor/expression.py
def reroot(self, new_root: CoercableToPath) -> "Expression":
  """Returns a new list of protocol buffers available at new_root."""
  return reroot.reroot(self, path.create_path(new_root))
schema_string
schema_string(limit: Optional[int] = None) -> str

Returns a schema for the expression.

For example,

repeated root:
  optional int32 foo
  optional bar:
    optional string baz
  optional int64 bak

Note that unknown fields and subexpressions are not displayed.

PARAMETER DESCRIPTION
limit

if present, limit the recursion.

TYPE: Optional[int] DEFAULT: None

RETURNS DESCRIPTION
str

A string, describing (a part of) the schema.

Source code in struct2tensor/expression.py
def schema_string(self, limit: Optional[int] = None) -> str:
  """Returns a schema for the expression.

  For example,
  ```
  repeated root:
    optional int32 foo
    optional bar:
      optional string baz
    optional int64 bak
  ```

  Note that unknown fields and subexpressions are not displayed.

  Args:
    limit: if present, limit the recursion.

  Returns:
    A string, describing (a part of) the schema.
  """
  return "\n".join(self._schema_string_helper("root", limit))
slice
slice(
    source_path: CoercableToPath,
    new_field_name: Step,
    begin: Optional[IndexValue] = None,
    end: Optional[IndexValue] = None,
) -> Expression

Creates a slice copy of source_path at new_field_name.

Note that if begin or end is negative, it is considered relative to the size of the array. e.g., slice(...,begin=-1) will get the last element of every array.

PARAMETER DESCRIPTION
source_path

the source of the slice.

TYPE: CoercableToPath

new_field_name

the new field that is generated.

TYPE: Step

begin

the beginning of the slice (inclusive).

TYPE: Optional[IndexValue] DEFAULT: None

end

the end of the slice (exclusive).

TYPE: Optional[IndexValue] DEFAULT: None

RETURNS DESCRIPTION
Expression

An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py
def slice(self,
          source_path: CoercableToPath,
          new_field_name: path.Step,
          begin: Optional[IndexValue] = None,
          end: Optional[IndexValue] = None) -> "Expression":
  """Creates a slice copy of source_path at new_field_name.

  Note that if begin or end is negative, it is considered relative to
  the size of the array. e.g., slice(...,begin=-1) will get the last
  element of every array.

  Args:
    source_path: the source of the slice.
    new_field_name: the new field that is generated.
    begin: the beginning of the slice (inclusive).
    end: the end of the slice (exclusive).

  Returns:
    An Expression object representing the result of the operation.
  """
  return slice_expression.slice_expression(self,
                                           path.create_path(source_path),
                                           new_field_name, begin, end)
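The begin/end semantics can be previewed on plain Python lists, since the slice operation is described as following Python list slicing. This is only an illustrative sketch: slice_each is a hypothetical helper that applies one slice to every per-parent list of values, and it does not use struct2tensor.

```python
from typing import List, Optional


def slice_each(groups: List[List[str]],
               begin: Optional[int] = None,
               end: Optional[int] = None) -> List[List[str]]:
    """Applies the same [begin:end] slice to each parent's list of values."""
    return [group[begin:end] for group in groups]


# Each inner list holds the repeated values under one parent.
groups = [["a", "b", "c"], ["d"], []]

# A negative begin is relative to the end of each list.
last_elements = slice_each(groups, begin=-1)
# Passing only an end keeps at most the first `end` values of each list.
first_two = slice_each(groups, end=2)
print(last_elements, first_two)
```

The end-only call corresponds to what truncate does, since its source simply forwards to slice with end=limit.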
truncate
truncate(
    source_path: CoercableToPath,
    limit: Union[int, Tensor],
    new_field_name: Step,
) -> Expression

Creates a truncated copy of source_path at new_field_name.

Source code in struct2tensor/expression.py
def truncate(self, source_path: CoercableToPath, limit: Union[int, tf.Tensor],
             new_field_name: path.Step) -> "Expression":
  """Creates a truncated copy of source_path at new_field_path, where new_field_path is the sibling named new_field_name."""
  return self.slice(source_path, new_field_name, end=limit)
Functions
has
has(
    root: Expression,
    source_path: Path,
    new_field_name: Step,
) -> Expression

Get whether a field is present (its size is greater than zero) as a new sibling field.

PARAMETER DESCRIPTION
root

the original expression.

TYPE: Expression

source_path

the source path to measure. Cannot be root.

TYPE: Path

new_field_name

the name of the sibling field.

TYPE: Step

RETURNS DESCRIPTION
Expression

The new expression.

Source code in struct2tensor/expression_impl/size.py
def has(root: expression.Expression, source_path: path.Path,
        new_field_name: path.Step) -> expression.Expression:
  """Get the has of a field as a new sibling field.

  Args:
    root: the original expression.
    source_path: the source path to measure. Cannot be root.
    new_field_name: the name of the sibling field.

  Returns:
    The new expression.
  """
  new_root, size_p = size_anonymous(root, source_path)
  # TODO(martinz): consider using copy_over to "remove" the size field
  # from the result.
  return map_values.map_values(
      new_root, size_p, lambda x: tf.greater(x, tf.constant(0, dtype=tf.int64)),
      tf.bool, new_field_name)
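As the listing shows, has is derived from the size field by testing each size against zero. A plain-Python sketch of that relationship, with has_from_sizes as a hypothetical helper name:

```python
from typing import List


def has_from_sizes(sizes: List[int]) -> List[bool]:
    """Turns per-parent child counts into per-parent presence flags."""
    return [size > 0 for size in sizes]


# Sizes for three parents: two children, no children, one child.
presence = has_from_sizes([2, 0, 1])
print(presence)
```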
size
size(
    root: Expression,
    source_path: Path,
    new_field_name: Step,
) -> Expression

Get the size of a field as a new sibling field.

PARAMETER DESCRIPTION
root

the original expression.

TYPE: Expression

source_path

the source path to measure. Cannot be root.

TYPE: Path

new_field_name

the name of the sibling field.

TYPE: Step

RETURNS DESCRIPTION
Expression

The new expression.

Source code in struct2tensor/expression_impl/size.py
def size(root: expression.Expression, source_path: path.Path,
         new_field_name: path.Step) -> expression.Expression:
  """Get the size of a field as a new sibling field.

  Args:
    root: the original expression.
    source_path: the source path to measure. Cannot be root.
    new_field_name: the name of the sibling field.

  Returns:
    The new expression.
  """
  return _size_impl(root, source_path, new_field_name)[0]
size_anonymous
size_anonymous(
    root: Expression, source_path: Path
) -> Tuple[Expression, Path]

Calculate the size of a field, and store it as an anonymous sibling.

PARAMETER DESCRIPTION
root

the original expression.

TYPE: Expression

source_path

the source path to measure. Cannot be root.

TYPE: Path

RETURNS DESCRIPTION
Tuple[Expression, Path]

The new expression and the new field as a pair.

Source code in struct2tensor/expression_impl/size.py
def size_anonymous(root: expression.Expression, source_path: path.Path
                  ) -> Tuple[expression.Expression, path.Path]:
  """Calculate the size of a field, and store it as an anonymous sibling.

  Args:
    root: the original expression.
    source_path: the source path to measure. Cannot be root.

  Returns:
    The new expression and the new field as a pair.
  """
  return _size_impl(root, source_path, path.get_anonymous_field())
Modules

slice_expression

Implementation of slice.

The slice operation is meant to replicate the slicing of a list in Python.

Slicing a list in Python is done by specifying a beginning and an ending. The resulting list consists of all elements in the range.

For example:

>>> x = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> print(x[2:5]) # all elements between index 2 inclusive and index 5 exclusive
['c', 'd', 'e']
>>> print(x[2:]) # all elements between index 2 and the end.
['c', 'd', 'e', 'f', 'g']
>>> print(x[:4]) # all elements between the beginning and index 4 (exclusive).
['a', 'b', 'c', 'd']
>>> print(x[-3:-1]) # all elements starting three from the end
...                 # until one from the end (exclusive).
['e', 'f']
>>> print(x[-3:6]) # all elements starting three from the end
...                # until index 6 exclusive.
['e', 'f']

TODO(martinz): there is a third argument to slice, which allows one to step over the elements (e.g., x[2:6:2] == ['c', 'e'], giving you every other element). This is not implemented.

A prensor can be considered to be interleaved lists and dictionaries. E.g.:

my_expression = [{
  "foo":[
    {"bar":[
      {"baz":["a","b","c", "d"]},
      {"baz":["d","e","f"]}
      ]
    },
    {"bar":[
      {"baz":["g","h","i"]},
      {"baz":["j","k","l"]},
      {"baz":["m"]}
    ]
    }]
}]
result_1 = slice_expression.slice_expression(
  my_expression, "foo.bar", "new_bar",begin=1, end=3)

result_1 = [{
  "foo":[
    {"bar":[
      {"baz":["a","b","c", "d"]},
      {"baz":["d","e","f"]}
      ],
     "new_bar":[
      {"baz":["d","e","f"]}
      ]
    },
    {"bar":[
      {"baz":["g","h","i"]},
      {"baz":["j","k","l"]},
      {"baz":["m", ]}
     ],
    "new_bar":[
      {"baz":["j","k","l"]},
      {"baz":["m", ]}
    ]
    }]
}]
result_2 = slice_expression.slice_expression(
  my_expression, "foo.bar.baz", "new_baz",begin=1, end=3)

result_2 = [{
  "foo":[
    {"bar":[
      {"baz":["a","b","c", "d"],
       "new_baz":["b","c"],
      },
      {"baz":["d","e","f"], "new_baz":["e","f"]}
      ]
    },
    {"bar":[
      {"baz":["g","h","i"], "new_baz":["h","i"]},
      {"baz":["j","k","l"], "new_baz":["k","l"]},
      {"baz":["m", ]}
      ]
    }]
}]
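The two results above can be reproduced on ordinary Python lists and dictionaries. The sketch below is not struct2tensor code: slice_field is a hypothetical helper, and it assumes that an empty slice is represented by leaving the new field out of the dictionary, as in the last "baz" entry of result_2.

```python
import copy

my_expression = [{
    "foo": [
        {"bar": [{"baz": ["a", "b", "c", "d"]}, {"baz": ["d", "e", "f"]}]},
        {"bar": [{"baz": ["g", "h", "i"]}, {"baz": ["j", "k", "l"]},
                 {"baz": ["m"]}]},
    ]
}]


def slice_field(parents, field, new_field, begin=None, end=None):
    """Sets parent[new_field] = parent[field][begin:end] when non-empty."""
    for parent in parents:
        sliced = parent.get(field, [])[begin:end]
        if sliced:
            parent[new_field] = sliced


# result_1: slice "foo.bar" into a sibling "new_bar".
result_1 = copy.deepcopy(my_expression)
for root in result_1:
    slice_field(root["foo"], "bar", "new_bar", begin=1, end=3)

# result_2: slice "foo.bar.baz" into a sibling "new_baz".
result_2 = copy.deepcopy(my_expression)
for root in result_2:
    for foo in root["foo"]:
        slice_field(foo["bar"], "baz", "new_baz", begin=1, end=3)
```

In both cases the slice is applied independently under each parent of the sliced field, which is why every "foo" gets its own "new_bar" and every "bar" its own "new_baz".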
Attributes
IndexValue module-attribute
IndexValue = IndexValue
Functions
slice_expression
slice_expression(
    expr: Expression,
    p: Path,
    new_field_name: Step,
    begin: Optional[IndexValue],
    end: Optional[IndexValue],
) -> Expression

Creates a new subtree with a sliced expression.

This follows the pattern of Python list slicing. See module-level comments for examples.

PARAMETER DESCRIPTION
expr

the original root expression

TYPE: Expression

p

the path to the source to be sliced.

TYPE: Path

new_field_name

the name of the new subtree.

TYPE: Step

begin

beginning index

TYPE: Optional[IndexValue]

end

end index.

TYPE: Optional[IndexValue]

RETURNS DESCRIPTION
Expression

A new root expression.

Source code in struct2tensor/expression_impl/slice_expression.py
def slice_expression(expr: expression.Expression, p: path.Path,
                     new_field_name: path.Step, begin: Optional[IndexValue],
                     end: Optional[IndexValue]) -> expression.Expression:
  """Creates a new subtree with a sliced expression.

  This follows the pattern of Python list slicing.
  See module-level comments for examples.

  Args:
    expr: the original root expression
    p: the path to the source to be sliced.
    new_field_name: the name of the new subtree.
    begin: beginning index
    end: end index.

  Returns:
    A new root expression.
  """
  work_expr, mask_anonymous_path = _get_slice_mask(expr, p, begin, end)
  work_expr = filter_expression.filter_by_sibling(
      work_expr, p, mask_anonymous_path.field_list[-1], new_field_name)
  new_path = p.get_parent().get_child(new_field_name)
  # We created a lot of anonymous fields and intermediate expressions. Just grab
  # the final result (and its children).
  return expression_add.add_to(expr, {new_path: work_expr})
Modules