`expression_impl`

struct2tensor.expression_impl

Import all modules in expression_impl.

The modules in this file should be accessed as follows:

import struct2tensor as s2t
from struct2tensor import expression_impl

s2t.expression_impl.apply_schema

MODULE	DESCRIPTION
`apply_schema`	Apply a schema to an expression.
`broadcast`	Methods for broadcasting a path in a tree.
`depth_limit`	Caps the depth of an expression.
`filter_expression`	Create a new expression that is a filtered version of an original one.
`index`	get_positional_index and get_index_from_end methods.
`map_prensor`	Arbitrary operations from sparse and ragged tensors to a leaf field.
`map_prensor_to_prensor`	Arbitrary operations from prensors to prensors in an expression.
`map_values`	Maps the values of various leaves of the same child to a single result.
`parquet`	Apache Parquet Dataset.
`parse_message_level_ex`	Parses regular fields, extensions, any casts, and map protos.
`placeholder`	Placeholder expression.
`project`	project selects a subtree of an expression.
`promote`	Promote an expression to be a child of its grandparent.
`promote_and_broadcast`	promote_and_broadcast a set of nodes.
`proto`	Expressions to parse a proto.
`reroot`	Reroot to a subtree, maintaining an input proto index.
`size`	Functions for creating new size or has expressions.
`slice_expression`	Implementation of slice.

Modules

apply_schema

Apply a schema to an expression.

A TensorFlow Metadata schema represents more detailed information about the data: specifically, it presents domain information (e.g., not just integers, but integers between 0 and 10) and more detailed structural information (e.g., this field occurs in at least 70% of its parents, and when it occurs, it shows up 5 to 7 times).

Applying a schema attaches a TensorFlow Metadata schema to an expression: namely, it aligns the features in the schema with the expression's children by name (possibly recursively).

After applying a schema to an expression, one can use promote, broadcast, et cetera, and the schema for new expressions will be inferred. If you write a custom expression, you can write code that determines the schema information of the result.

To get the schema back, call get_schema().

Applying a schema does not filter out fields that are not in the schema.

my_expr = ...
my_schema = ...  # ...schema here (a schema_pb2.Schema)...
my_new_schema = my_expr.apply_schema(my_schema).get_schema()
# my_new_schema has semantically identical information on the fields as my_schema.
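The name-based alignment described above can be pictured with a minimal pure-Python sketch. It is not the struct2tensor or TensorFlow Metadata API: it assumes an expression and a schema are both modeled as nested dicts keyed by field name, and `align_by_name` is a hypothetical helper. Children without a matching feature are kept and mapped to None, reflecting that fields absent from the schema are not filtered out.

```python
from typing import Any, Dict, Optional

# Hypothetical model: field name -> nested dict of child fields ({} for a leaf).
Tree = Dict[str, Any]


def align_by_name(children: Tree, features: Tree) -> Dict[str, Optional[Any]]:
  """Pairs each child with the same-named schema feature, recursively.

  Children with no matching feature map to None instead of being dropped.
  """
  aligned: Dict[str, Optional[Any]] = {}
  for name, grandchildren in children.items():
    feature = features.get(name)
    if feature is None:
      aligned[name] = None  # kept, but no schema information attached
    else:
      aligned[name] = align_by_name(grandchildren, feature)
  return aligned


expr = {"session": {"event": {}, "val": {}}, "extra": {}}
schema = {"session": {"val": {}}}
aligned = align_by_name(expr, schema)
```

Here `session` and `session.val` pick up schema information, while `session.event` and `extra` remain in the result without any.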

TODO(martinz): Add utilities to:

Get the (non-deprecated) paths from a schema.
Check if any paths in the schema are not in the expression.
Check if any paths in the expression are not in the schema.
Project the expression to paths in the schema.

FUNCTION	DESCRIPTION
`apply_schema`

Functions

apply_schema

apply_schema(
    expr: Expression, schema: Schema
) -> Expression

Source code in struct2tensor/expression_impl/apply_schema.py

def apply_schema(expr: expression.Expression,
                 schema: schema_pb2.Schema) -> expression.Expression:
  schema_copy = schema_pb2.Schema()
  schema_copy.CopyFrom(schema)
  for x in schema_copy.feature:
    _normalize_feature(x, schema_copy)
  return _SchemaExpression(expr, schema_copy.feature, None)

Modules

broadcast

Methods for broadcasting a path in a tree.

This provides methods for broadcasting a field anonymously (this is used by promote_and_broadcast) or with an explicitly given name.

Suppose you have an expr representing:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
     |
     +-val*-int64

session: {
  event: {}
  event: {}
  val: 10
  val: 11
}
session: {
  event: {}
  event: {}
  val: 20
}

Then:

broadcast.broadcast(expr, path.Path(["session","val"]), "event", "nv")

becomes:

+
|
+---session*   (stars indicate repeated)
       |
       +-event*
       |   |
       |   +---nv*-int64
       |
       +-val*-int64

session: {
  event: {
    nv: 10
    nv: 11
  }
  event: {
    nv: 10
    nv: 11
  }
  val: 10
  val: 11
}
session: {
  event: {nv: 20}
  event: {nv: 20}
  val: 20
}
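A rough pure-Python sketch of the broadcast above, assuming each session is a dict holding a list of `event` records and a list of `val` values (a hypothetical model, not how struct2tensor stores prensors). `broadcast_values` is a hypothetical helper: every event receives a copy of its parent session's `val` list under the new name `nv`, and the original fields are left in place.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]  # hypothetical model of one session


def broadcast_values(parents: List[Record], origin: str, sibling: str,
                     new_field: str) -> List[Record]:
  """Copies each parent's `origin` values into every `sibling` child."""
  result = []
  for parent in parents:
    values = list(parent.get(origin, []))
    new_parent = dict(parent)  # shallow copy; the input is not modified
    new_parent[sibling] = [
        {**child, new_field: list(values)} for child in parent.get(sibling, [])
    ]
    result.append(new_parent)
  return result


sessions = [
    {"event": [{}, {}], "val": [10, 11]},
    {"event": [{}, {}], "val": [20]},
]
out = broadcast_values(sessions, "val", "event", "nv")
```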

FUNCTION	DESCRIPTION
`broadcast`
`broadcast_anonymous`

Functions

broadcast

broadcast(
    root: Expression,
    origin: Path,
    sibling_name: Step,
    new_field_name: Step,
) -> Expression

Source code in struct2tensor/expression_impl/broadcast.py

def broadcast(root: expression.Expression, origin: path.Path,
              sibling_name: path.Step,
              new_field_name: path.Step) -> expression.Expression:
  return _broadcast_impl(root, origin, sibling_name, new_field_name)[0]

broadcast_anonymous

broadcast_anonymous(
    root: Expression, origin: Path, sibling: Step
) -> Tuple[Expression, Path]

Source code in struct2tensor/expression_impl/broadcast.py

def broadcast_anonymous(
    root: expression.Expression, origin: path.Path,
    sibling: path.Step) -> Tuple[expression.Expression, path.Path]:
  return _broadcast_impl(root, origin, sibling, path.get_anonymous_field())

Modules

depth_limit

Caps the depth of an expression.

Suppose you have an expression expr modeled as:

If expr_2 = depth_limit.limit_depth(expr, 2), you get:

  *
   \
    A
   / \
  D   B
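The depth cap can be sketched on a plain nested-dict tree. `limit_depth_sketch` is a hypothetical helper, and the deeper node C in the input is invented for illustration; the sketch only shows which nodes survive a limit of 2, not how `_DepthLimitExpression` is implemented.

```python
from typing import Dict

Tree = Dict[str, "Tree"]  # hypothetical model: child name -> subtree


def limit_depth_sketch(tree: Tree, depth_limit: int) -> Tree:
  """Keeps only nodes at most `depth_limit` steps below the root."""
  if depth_limit <= 0:
    return {}
  return {name: limit_depth_sketch(child, depth_limit - 1)
          for name, child in tree.items()}


# Hypothetical input: node C (under B) is invented for illustration.
expr = {"A": {"D": {}, "B": {"C": {}}}}
expr_2 = limit_depth_sketch(expr, 2)  # A, D and B survive; C is cut off
```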

FUNCTION	DESCRIPTION
`limit_depth`	Limit the depth to nodes k steps from expr.

Functions

limit_depth

limit_depth(
    expr: Expression, depth_limit: int
) -> Expression

Limit the depth to nodes k steps from expr.

Source code in struct2tensor/expression_impl/depth_limit.py

def limit_depth(expr: expression.Expression,
                depth_limit: int) -> expression.Expression:
  """Limit the depth to nodes k steps from expr."""
  return _DepthLimitExpression(expr, depth_limit)

Modules

filter_expression

Create a new expression that is a filtered version of an original one.

There are two public methods in this module: filter_by_sibling and filter_by_child. As with most other operations, these create a new tree that has all the paths of the original tree, plus a new subtree.

filter_by_sibling allows you to filter an expression by a boolean sibling field.

Beginning with the struct:

root =
         -----*----------------------------------------------------
        /                       \                                  \
     root0                    root1-----------------------      root2 (empty)
      /   \                   /    \               \      \
      |  keep_my_sib0:False  |  keep_my_sib1:True   | keep_my_sib2:False
    doc0-----               doc1---------------    doc2--------
     |       \                \           \    \               \
    bar:"a"  keep_me:False    bar:"b" bar:"c" keep_me:True      bar:"d"

# Note: keep_my_sib and doc must have the same shape (e.g., each root
# has the same number of keep_my_sib children as doc children).
root_2 = filter_expression.filter_by_sibling(
    root, path.create_path("doc"), "keep_my_sib", "new_doc")

End with the struct (the original doc is still present but omitted from the drawing):
         -----*----------------------------------------------------
        /                       \                                  \
    root0                    root1------------------        root2 (empty)
        \                   /    \                  \
        keep_my_sib0:False  |  keep_my_sib1:True   keep_my_sib2:False
                           new_doc0-----------
                             \           \    \
                             bar:"b" bar:"c" keep_me:True

filter_by_child allows you to filter an expression by an optional boolean child field.

The following call will have the same effect as above:

root_2 = filter_expression.filter_by_child(
    root, path.create_path("doc"), "keep_me", "new_doc")
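A minimal pure-Python sketch of both filters on the example struct above, assuming each root is a dict with a `doc` list of records and a parallel `keep_my_sib` list of booleans (root1 holds doc1 and doc2). The helper names are hypothetical. Each filter adds `new_doc` and keeps the original `doc`, matching the note that these operations keep all original paths.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]  # hypothetical model of a struct


def filter_by_sibling_sketch(parents: List[Record], field: str, sibling: str,
                             new_field: str) -> List[Record]:
  """Keeps field[i] wherever sibling[i] is True, like a boolean mask."""
  result = []
  for parent in parents:
    items, mask = parent.get(field, []), parent.get(sibling, [])
    if len(items) != len(mask):
      raise ValueError("field and sibling must have the same shape")
    result.append({**parent,
                   new_field: [x for x, keep in zip(items, mask) if keep]})
  return result


def filter_by_child_sketch(parents: List[Record], field: str, child: str,
                           new_field: str) -> List[Record]:
  """Keeps a record only if its optional boolean child is present and True."""
  return [{**parent,
           new_field: [x for x in parent.get(field, []) if x.get(child) is True]}
          for parent in parents]


roots = [
    {"doc": [{"bar": ["a"], "keep_me": False}], "keep_my_sib": [False]},
    {"doc": [{"bar": ["b", "c"], "keep_me": True}, {"bar": ["d"]}],
     "keep_my_sib": [True, False]},
    {},  # root2 is empty
]
by_sibling = filter_by_sibling_sketch(roots, "doc", "keep_my_sib", "new_doc")
by_child = filter_by_child_sketch(roots, "doc", "keep_me", "new_doc")
```

Both calls keep only doc1, which is why the two public functions have the same effect on this input.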

FUNCTION	DESCRIPTION
`filter_by_child`	Filter an expression by an optional boolean child field.
`filter_by_sibling`	Filter an expression by its sibling.

Functions

filter_by_child

filter_by_child(
    expr: Expression,
    p: Path,
    child_field_name: Step,
    new_field_name: Step,
) -> Expression

Filter an expression by an optional boolean child field.

If the child field is present and True, then keep that parent. Otherwise, drop the parent.

PARAMETER	DESCRIPTION
`expr`	the original expression TYPE: `Expression`
`p`	the path to filter. TYPE: `Path`
`child_field_name`	the boolean child field to use to filter. TYPE: `Step`
`new_field_name`	the new, filtered version of path. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	The new root expression.

Source code in struct2tensor/expression_impl/filter_expression.py

def filter_by_child(expr: expression.Expression, p: path.Path,
                    child_field_name: path.Step,
                    new_field_name: path.Step) -> expression.Expression:
  """Filter an expression by an optional boolean child field.

  If the child field is present and True, then keep that parent.
  Otherwise, drop the parent.

  Args:
    expr: the original expression
    p: the path to filter.
    child_field_name: the boolean child field to use to filter.
    new_field_name: the new, filtered version of path.

  Returns:
    The new root expression.
  """
  origin = expr.get_descendant_or_error(p)
  child = origin.get_child_or_error(child_field_name)
  new_expr = _FilterByChildExpression(origin, child)
  new_path = p.get_parent().get_child(new_field_name)

  return expression_add.add_paths(expr, {new_path: new_expr})

filter_by_sibling

filter_by_sibling(
    expr: Expression,
    p: Path,
    sibling_field_name: Step,
    new_field_name: Step,
) -> Expression

Filter an expression by its sibling.

This is similar to boolean_mask. The shape of the path being filtered and the sibling must be identical (e.g., each parent object must have an equal number of source and sibling children).

PARAMETER	DESCRIPTION
`expr`	the root expression. TYPE: `Expression`
`p`	a path to the source to be filtered. TYPE: `Path`
`sibling_field_name`	the sibling to use as a mask. TYPE: `Step`
`new_field_name`	a new sibling to create. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	a new root.

Source code in struct2tensor/expression_impl/filter_expression.py

def filter_by_sibling(expr: expression.Expression, p: path.Path,
                      sibling_field_name: path.Step,
                      new_field_name: path.Step) -> expression.Expression:
  """Filter an expression by its sibling.


  This is similar to boolean_mask. The shape of the path being filtered and
  the sibling must be identical (e.g., each parent object must have an
  equal number of source and sibling children).

  Args:
    expr: the root expression.
    p: a path to the source to be filtered.
    sibling_field_name: the sibling to use as a mask.
    new_field_name: a new sibling to create.

  Returns:
    a new root.
  """
  origin = expr.get_descendant_or_error(p)
  parent_path = p.get_parent()
  sibling = expr.get_descendant_or_error(
      parent_path.get_child(sibling_field_name))
  new_expr = _FilterBySiblingExpression(origin, sibling)
  new_path = parent_path.get_child(new_field_name)
  return expression_add.add_paths(expr, {new_path: new_expr})

Modules

index

get_positional_index and get_index_from_end methods.

The parent_index identifies the index of the parent of each element. These methods use the parent_index to determine each element's position relative to the other elements that share its parent.

Given:

session: {
  event: {
    val: 111
  }
  event: {
    val: 121
    val: 122
  }
}

session: {
  event: {
    val: 10
    val: 7
  }
  event: {
    val: 1
  }
}

get_positional_index(expr, path.Path(["event","val"]), "val_index")

yields:

session: {
  event: {
    val: 111
    val_index: 0
  }
  event: {
    val: 121
    val: 122
    val_index: 0
    val_index: 1
  }
}

session: {
  event: {
    val: 10
    val: 7
    val_index: 0
    val_index: 1
  }
  event: {
    val: 1
    val_index: 0
  }
}

get_index_from_end(expr, path.Path(["event","val"]), "neg_val_index")

yields:

session: {
  event: {
    val: 111
    neg_val_index: -1
  }
  event: {
    val: 121
    val: 122
    neg_val_index: -2
    neg_val_index: -1
  }
}

session: {
  event: {
    val: 10
    val: 7
    neg_val_index: -2
    neg_val_index: -1
  }
  event: {
    val: 1
    neg_val_index: -1
  }
}

These methods are useful when you want to depend upon the index of a field. For example, if you want to filter examples based upon their index, or cogroup two fields by index, then first creating the index is useful.

Note that while the parent indices of these fields seem like overhead, they are just references to the parent indices of other fields, and therefore take little memory or CPU.
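A sketch of how a positional index can be derived from a parent_index, assuming (as in the examples) that parent_index is sorted so that siblings are contiguous. `positional_index` is a hypothetical helper, not the library's `_PositionalIndexExpression`.

```python
from typing import List


def positional_index(parent_index: List[int]) -> List[int]:
  """Returns each element's position among elements with the same parent.

  Assumes parent_index is sorted, so that siblings are contiguous.
  """
  result: List[int] = []
  for i, parent in enumerate(parent_index):
    if i > 0 and parent_index[i - 1] == parent:
      result.append(result[-1] + 1)  # next sibling of the same parent
    else:
      result.append(0)  # first child of a new parent
  return result


# The val fields from the example: events 0..3 across both sessions.
val_index = positional_index([0, 1, 1, 2, 2, 3])
```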

FUNCTION	DESCRIPTION
`get_index_from_end`	Gets the number of steps from the end of the array.
`get_positional_index`	Gets the positional index.

Functions

get_index_from_end

get_index_from_end(
    t: Expression, source_path: Path, new_field_name: Step
) -> Tuple[Expression, Path]

Gets the number of steps from the end of the array.

Given an array ["a", "b", "c"], with indices [0, 1, 2], the result of this is [-3,-2,-1].
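The source code of get_index_from_end combines a positional index with the size of each parent's list. Assuming the index from the end is the positional index minus that size, a plain-Python sketch (with the hypothetical helper `index_from_end`) reproduces the documented results:

```python
from collections import Counter
from typing import List


def index_from_end(parent_index: List[int]) -> List[int]:
  """Returns positional index minus the number of siblings (assumed formula)."""
  sizes = Counter(parent_index)  # size of each parent's list
  seen: Counter = Counter()      # elements already visited per parent
  result: List[int] = []
  for parent in parent_index:
    result.append(seen[parent] - sizes[parent])
    seen[parent] += 1
  return result


# The val fields from the module example: events 0..3 across both sessions.
neg_val_index = index_from_end([0, 1, 1, 2, 2, 3])
```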

PARAMETER	DESCRIPTION
`t`	original expression TYPE: `Expression`
`source_path`	path in expression to get index of. TYPE: `Path`
`new_field_name`	the name of the new field. TYPE: `Step`

RETURNS	DESCRIPTION
`Tuple[Expression, Path]`	The new expression and the new path as a pair.

Source code in struct2tensor/expression_impl/index.py

def get_index_from_end(t: expression.Expression, source_path: path.Path,
                       new_field_name: path.Step
                      ) -> Tuple[expression.Expression, path.Path]:
  """Gets the number of steps from the end of the array.

  Given an array ["a", "b", "c"], with indices [0, 1, 2], the result of this
  is [-3,-2,-1].

  Args:
    t: original expression
    source_path: path in expression to get index of.
    new_field_name: the name of the new field.

  Returns:
    The new expression and the new path as a pair.
  """
  new_path = source_path.get_parent().get_child(new_field_name)
  work_expr, positional_index_path = get_positional_index(
      t, source_path, path.get_anonymous_field())
  work_expr, size_path = size.size_anonymous(work_expr, source_path)
  work_expr = expression_add.add_paths(
      work_expr, {
          new_path:
              _PositionalIndexFromEndExpression(
                  work_expr.get_descendant_or_error(positional_index_path),
                  work_expr.get_descendant_or_error(size_path))
      })
  # Removing the intermediate anonymous nodes.
  result = expression_add.add_to(t, {new_path: work_expr})
  return result, new_path

get_positional_index

get_positional_index(
    expr: Expression,
    source_path: Path,
    new_field_name: Step,
) -> Tuple[Expression, Path]

Gets the positional index.

Given a field with parent_index [0,1,1,2,3,4,4], this returns parent_index [0,1,1,2,3,4,4] and value [0,0,1,0,0,0,1].

PARAMETER	DESCRIPTION
`expr`	original expression TYPE: `Expression`
`source_path`	path in expression to get index of. TYPE: `Path`
`new_field_name`	the name of the new field. TYPE: `Step`

RETURNS	DESCRIPTION
`Tuple[Expression, Path]`	The new expression and the new path as a pair.

Source code in struct2tensor/expression_impl/index.py

def get_positional_index(expr: expression.Expression, source_path: path.Path,
                         new_field_name: path.Step
                        ) -> Tuple[expression.Expression, path.Path]:
  """Gets the positional index.

  Given a field with parent_index [0,1,1,2,3,4,4], this returns:
  parent_index [0,1,1,2,3,4,4] and value [0,0,1,0,0,0,1]

  Args:
    expr: original expression
    source_path: path in expression to get index of.
    new_field_name: the name of the new field.

  Returns:
    The new expression and the new path as a pair.
  """
  new_path = source_path.get_parent().get_child(new_field_name)
  return expression_add.add_paths(
      expr, {
          new_path:
              _PositionalIndexExpression(
                  expr.get_descendant_or_error(source_path))
      }), new_path

Modules

map_prensor

Arbitrary operations from sparse and ragged tensors to a leaf field.

There are two public methods of note right now: map_sparse_tensor and map_ragged_tensor.

Assume expr is:

session: {
  event: {
    val_a: 10
    val_b: 1
  }
  event: {
    val_a: 20
    val_b: 2
  }
  event: {
  }
  event: {
    val_a: 40
  }
  event: {
    val_b: 5
  }
}

Either of the following alternatives will add val_a and val_b to create val_sum.

map_sparse_tensor converts val_a and val_b to sparse tensors, and then adds them to produce val_sum.

new_root = map_prensor.map_sparse_tensor(
    expr,
    path.Path(["event"]),
    [path.Path(["val_a"]), path.Path(["val_b"])],
    lambda x,y: x + y,
    False,
    tf.int32,
    "val_sum")

map_ragged_tensor converts val_a and val_b to ragged tensors, and then adds them to produce val_sum.

new_root = map_prensor.map_ragged_tensor(
    expr,
    path.Path(["event"]),
    [path.Path(["val_a"]), path.Path(["val_b"])],
    lambda x,y: x + y,
    False,
    tf.int32,
    "val_sum")

The result of either is:

session: {
  event: {
    val_a: 10
    val_b: 1
    val_sum: 11
  }
  event: {
    val_a: 20
    val_b: 2
    val_sum: 22
  }
  event: {
  }
  event: {
    val_a: 40
    val_sum: 40
  }
  event: {
    val_b: 5
    val_sum: 5
  }
}
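The per-event arithmetic in the result above can be checked with a short pure-Python model. It does not use sparse or ragged tensors; it assumes each event is a dict with optional scalar `val_a`/`val_b` entries, and that a missing input contributes nothing, which is read off the example output rather than taken from the library. `add_optional_fields` is hypothetical.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]  # hypothetical model of one event


def add_optional_fields(events: List[Event], a: str, b: str,
                        new_field: str) -> List[Event]:
  """Adds the present values of `a` and `b` into `new_field` per event."""
  result = []
  for event in events:
    present = [event[k] for k in (a, b) if k in event]
    new_event = dict(event)
    if present:  # no value is produced when both inputs are missing
      new_event[new_field] = sum(present)
    result.append(new_event)
  return result


events = [
    {"val_a": 10, "val_b": 1},
    {"val_a": 20, "val_b": 2},
    {},
    {"val_a": 40},
    {"val_b": 5},
]
with_sum = add_optional_fields(events, "val_a", "val_b", "val_sum")
```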

FUNCTION	DESCRIPTION
`map_ragged_tensor`	Map a ragged tensor.
`map_sparse_tensor`	Maps a sparse tensor.

Functions

map_ragged_tensor

map_ragged_tensor(
    root: Expression,
    root_path: Path,
    paths: Sequence[Path],
    operation: Callable[..., RaggedTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map a ragged tensor.

PARAMETER	DESCRIPTION
`root`	the root of the expression. TYPE: `Expression`
`root_path`	the path relative to which the ragged tensors are calculated. TYPE: `Path`
`paths`	the input paths relative to the root_path TYPE: `Sequence[Path]`
`operation`	a method that takes the list of ragged tensors as input and returns a ragged tensor. TYPE: `Callable[..., RaggedTensor]`
`is_repeated`	true if the result of operation is repeated. TYPE: `bool`
`dtype`	dtype of the result of the operation. TYPE: `DType`
`new_field_name`	root_path.get_child(new_field_name) is the path of the result. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	A new root expression containing the old root expression plus the new path, root_path.get_child(new_field_name), with the result of the operation.

Source code in struct2tensor/expression_impl/map_prensor.py

def map_ragged_tensor(root: expression.Expression, root_path: path.Path,
                      paths: Sequence[path.Path],
                      operation: Callable[..., tf.RaggedTensor],
                      is_repeated: bool, dtype: tf.DType,
                      new_field_name: path.Step) -> expression.Expression:
  """Map a ragged tensor.

  Args:
    root: the root of the expression.
    root_path: the path relative to which the ragged tensors are calculated.
    paths: the input paths relative to the root_path
    operation: a method that takes the list of ragged tensors as input and
      returns a ragged tensor.
    is_repeated: true if the result of operation is repeated.
    dtype: dtype of the result of the operation.
    new_field_name: root_path.get_child(new_field_name) is the path of the
      result.

  Returns:
    A new root expression containing the old root expression plus the new path,
      root_path.get_child(new_field_name), with the result of the operation.
  """
  return _map_ragged_tensor_impl(root, root_path, paths, operation, is_repeated,
                                 dtype, new_field_name)[0]

map_sparse_tensor

map_sparse_tensor(
    root: Expression,
    root_path: Path,
    paths: Sequence[Path],
    operation: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a sparse tensor.

PARAMETER	DESCRIPTION
`root`	the root of the expression. TYPE: `Expression`
`root_path`	the path relative to which the sparse tensors are calculated. TYPE: `Path`
`paths`	the input paths relative to the root_path TYPE: `Sequence[Path]`
`operation`	a method that takes the list of sparse tensors as input and returns a sparse tensor. TYPE: `Callable[..., SparseTensor]`
`is_repeated`	true if the result of operation is repeated. TYPE: `bool`
`dtype`	dtype of the result of the operation. TYPE: `DType`
`new_field_name`	root_path.get_child(new_field_name) is the path of the result. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	A new root expression containing the old root expression plus the new path, root_path.get_child(new_field_name), with the result of the operation.

Source code in struct2tensor/expression_impl/map_prensor.py

def map_sparse_tensor(root: expression.Expression, root_path: path.Path,
                      paths: Sequence[path.Path],
                      operation: Callable[..., tf.SparseTensor],
                      is_repeated: bool, dtype: tf.DType,
                      new_field_name: path.Step) -> expression.Expression:
  """Maps a sparse tensor.

  Args:
    root: the root of the expression.
    root_path: the path relative to which the sparse tensors are calculated.
    paths: the input paths relative to the root_path
    operation: a method that takes the list of sparse tensors as input and
      returns a sparse tensor.
    is_repeated: true if the result of operation is repeated.
    dtype: dtype of the result of the operation.
    new_field_name: root_path.get_child(new_field_name) is the path of the
      result.

  Returns:
    A new root expression containing the old root expression plus the new path,
      root_path.get_child(new_field_name), with the result of the operation.
  """

  return _map_sparse_tensor_impl(root, root_path, paths, operation, is_repeated,
                                 dtype, new_field_name)[0]

Modules

map_prensor_to_prensor

Arbitrary operations from prensors to prensors in an expression.

This is useful if a single op generates an entire structure. In general, it is better to use the existing expressions framework or design a custom expression than use this op. So long as any of the output is required, all of the input is required.

For example, suppose you have an op my_op, that takes a prensor of the form:

  event
   / \
 foo   bar

and produces a prensor of the form my_result_schema:

   event
    / \
 foo2 bar2

my_result_schema = create_schema(
    is_repeated=True,
    children={"foo2": {"is_repeated": True, "dtype": tf.int64},
              "bar2": {"is_repeated": False, "dtype": tf.int64}})

If you give it an expression original with the schema:

 session
    |
  event
  /  \
foo   bar

result = map_prensor_to_prensor(
  original,
  path.Path(["session","event"]),
  [path.Path(["foo"]), path.Path(["bar"])],  # paths_needed, relative to event
  my_op,
  my_result_schema)

Result will have the schema:

 session
    |
  event--------
  /  \    \    \
foo   bar foo2 bar2
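A pure-Python sketch of the merge step, assuming each `event` is a dict and that the op returns exactly one output record per input event (an assumption; the output schema is rooted at `event`). `map_records` and `my_op` are hypothetical: `my_op` doubles `foo` into `foo2` and scales `bar` into `bar2`. Only the fields listed in `paths_needed` are passed to the op, echoing the `project(paths_needed)` call in the source code of map_prensor_to_prensor.

```python
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]  # hypothetical model of one event


def map_records(events: List[Record], paths_needed: Sequence[str],
                op: Callable[[List[Record]], List[Record]]) -> List[Record]:
  """Runs `op` on projected events and merges its output fields back in."""
  projected = [{k: e[k] for k in paths_needed if k in e} for e in events]
  outputs = op(projected)
  if len(outputs) != len(events):
    raise ValueError("op must return one record per input event")
  return [{**event, **out} for event, out in zip(events, outputs)]


def my_op(events: List[Record]) -> List[Record]:
  """Hypothetical op: foo2 doubles foo, bar2 is ten times bar."""
  return [{"foo2": [2 * x for x in e.get("foo", [])],
           "bar2": 10 * e.get("bar", 0)} for e in events]


original = [{"foo": [1, 2], "bar": 3}, {"foo": [], "bar": 4}]
merged = map_records(original, ["foo", "bar"], my_op)
```

Each merged event keeps foo and bar and gains foo2 and bar2, mirroring the result schema drawn above.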

CLASS	DESCRIPTION
`Schema`	A finite schema for a prensor.

FUNCTION	DESCRIPTION
`create_schema`	Create a schema recursively.
`map_prensor_to_prensor`	Maps an expression to a prensor, and merges that prensor.

Classes

Schema

Schema(
    is_repeated: bool = True,
    dtype: Optional[DType] = None,
    schema_feature: Optional[Feature] = None,
    children: Optional[Dict[Step, Schema]] = None,
)

Bases: object

A finite schema for a prensor.

Effectively, this stores everything for the prensor but the tensors themselves.

Notice that this is slightly different from schema_pb2.Schema, although similar in nature. At present, there is no clear way to extract is_repeated and dtype from schema_pb2.Schema.

See create_schema below for constructing a schema.

Note that for LeafNodeTensor, dtype is not None. Also, for ChildNodeTensor and RootNodeTensor, dtype is None. However, a ChildNodeTensor or RootNodeTensor could be childless.

Create a new Schema object.

PARAMETER	DESCRIPTION
`is_repeated`	is the root repeated? TYPE: `bool` DEFAULT: `True`
`dtype`	tf.dtype of the root if the root is a leaf, otherwise None. TYPE: `Optional[DType]` DEFAULT: `None`
`schema_feature`	schema_pb2.Feature of the root (no struct_domain necessary) TYPE: `Optional[Feature]` DEFAULT: `None`
`children`	child schemas. TYPE: `Optional[Dict[Step, Schema]]` DEFAULT: `None`

METHOD	DESCRIPTION
`get_child`
`known_field_names`

ATTRIBUTE	DESCRIPTION
`is_repeated`	TYPE: `bool`
`schema_feature`	TYPE: `Optional[Feature]`
`type`	TYPE: `Optional[DType]`

Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py

def __init__(self,
             is_repeated: bool = True,
             dtype: Optional[tf.DType] = None,
             schema_feature: Optional[schema_pb2.Feature] = None,
             children: Optional[Dict[path.Step, "Schema"]] = None):
  """Create a new Schema object.

  Args:
    is_repeated: is the root repeated?
    dtype: tf.dtype of the root if the root is a leaf, otherwise None.
    schema_feature: schema_pb2.Feature of the root (no struct_domain
      necessary)
    children: child schemas.
  """
  self._is_repeated = is_repeated
  self._type = dtype
  self._schema_feature = schema_feature
  self._children = children if children is not None else {}
  # Cannot have a type and children.
  assert (self._type is None or not self._children)

Attributes

is_repeated property

is_repeated: bool

schema_feature property

schema_feature: Optional[Feature]

type property

type: Optional[DType]

Functions

get_child

get_child(key: Step)

Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py

def get_child(self, key: path.Step):
  return self._children[key]

known_field_names

known_field_names() -> FrozenSet[Step]

Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py

def known_field_names(self) -> FrozenSet[path.Step]:
  return frozenset(self._children.keys())

Functions

create_schema

create_schema(
    is_repeated: bool = True,
    dtype: Optional[DType] = None,
    schema_feature: Optional[Feature] = None,
    children: Optional[Dict[Step, Any]] = None,
) -> Schema

Create a schema recursively.

Example

my_result_schema = create_schema(
  is_repeated=True,
  children={"foo2": {"is_repeated": True, "dtype": tf.int64},
            "bar2": {"is_repeated": False, "dtype": tf.int64}})

PARAMETER	DESCRIPTION
`is_repeated`	whether the root is repeated. TYPE: `bool` DEFAULT: `True`
`dtype`	the dtype of a leaf (None for non-leaves). TYPE: `Optional[DType]` DEFAULT: `None`
`schema_feature`	the schema_pb2.Feature describing this expression. name and struct_domain need not be specified. TYPE: `Optional[Feature]` DEFAULT: `None`
`children`	the child schemas. Note that the value type of children is either a Schema or a dictionary of arguments to create_schema. TYPE: `Optional[Dict[Step, Any]]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Schema`	a new Schema represented by the inputs.
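The recursion in create_schema can be sketched without TensorFlow. `SchemaSketch` and `create_schema_sketch` are hypothetical stand-ins, dtypes are plain strings, and the handling of dict-valued children (calling the function again with the dict as keyword arguments) is an assumption based on the `children` parameter description.

```python
from typing import Any, Dict, Optional


class SchemaSketch:
  """Hypothetical stand-in for Schema: repeatedness, dtype, and children."""

  def __init__(self, is_repeated: bool = True, dtype: Optional[str] = None,
               children: Optional[Dict[str, "SchemaSketch"]] = None):
    self.is_repeated = is_repeated
    self.dtype = dtype
    self.children = children if children is not None else {}
    # As in Schema.__init__, a node cannot have both a dtype and children.
    assert self.dtype is None or not self.children


def create_schema_sketch(is_repeated: bool = True, dtype: Optional[str] = None,
                         children: Optional[Dict[str, Any]] = None
                         ) -> SchemaSketch:
  child_schemas = {}
  for name, value in (children or {}).items():
    # A child is either an existing schema or a dict of keyword arguments.
    child_schemas[name] = (value if isinstance(value, SchemaSketch)
                           else create_schema_sketch(**value))
  return SchemaSketch(is_repeated, dtype, child_schemas)


# dtypes are strings here so the sketch runs without TensorFlow.
my_result_schema = create_schema_sketch(
    is_repeated=True,
    children={"foo2": {"is_repeated": True, "dtype": "int64"},
              "bar2": {"is_repeated": False, "dtype": "int64"}})
```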

Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py

def create_schema(is_repeated: bool = True,
                  dtype: Optional[tf.DType] = None,
                  schema_feature: Optional[schema_pb2.Feature] = None,
                  children: Optional[Dict[path.Step, Any]] = None) -> Schema:
  """Create a schema recursively.

  !!! Example
      ```python
      my_result_schema = create_schema(
        is_repeated=True,
        children={"foo2":{is_repeated=True, dtype=tf.int64},
                  "bar2":{is_repeated=False, dtype=tf.int64}})
      ```

  Args:
    is_repeated: whether the root is repeated.
    dtype: the dtype of a leaf (None for non-leaves).
    schema_feature: the schema_pb2.Feature describing this expression. name and
      struct_domain need not be specified.
    children: the child schemas. Note that the value type of children is either
      a Schema or a dictionary of arguments to create_schema.

  Returns:
    a new Schema represented by the inputs.
  """
  children_dict = children or {}
  child_schemas = {
      k: _create_schema_helper(v) for k, v in children_dict.items()
  }
  return Schema(
      is_repeated=is_repeated,
      dtype=dtype,
      schema_feature=schema_feature,
      children=child_schemas)

map_prensor_to_prensor

map_prensor_to_prensor(
    root_expr: Expression,
    source: Path,
    paths_needed: Sequence[Path],
    prensor_op: Callable[[Prensor], Prensor],
    output_schema: Schema,
) -> Expression

Maps an expression to a prensor, and merges that prensor.

For example, suppose you have an op my_op, that takes a prensor of the form:

  event
  /  \
foo  bar

and produces a prensor of the form my_result_schema:

  event
  /  \
foo2 bar2

If you give it an expression original with the schema:

 session
    |
  event
  /  \
foo   bar

result = map_prensor_to_prensor(
  original,
  path.Path(["session","event"]),
  [path.Path(["foo"]), path.Path(["bar"])],  # paths_needed, relative to event
  my_op,
  my_result_schema)

Result will have the schema:

 session
    |
  event--------
  /  \    \    \
foo   bar foo2 bar2

PARAMETER	DESCRIPTION
`root_expr`	the root expression TYPE: `Expression`
`source`	the path where the prensor op is applied. TYPE: `Path`
`paths_needed`	the paths needed for the op. TYPE: `Sequence[Path]`
`prensor_op`	the prensor op TYPE: `Callable[[Prensor], Prensor]`
`output_schema`	the output schema of the op. TYPE: `Schema`

RETURNS	DESCRIPTION
`Expression`	A new expression where the prensor is merged.

Source code in struct2tensor/expression_impl/map_prensor_to_prensor.py

def map_prensor_to_prensor(
    root_expr: expression.Expression, source: path.Path,
    paths_needed: Sequence[path.Path],
    prensor_op: Callable[[prensor.Prensor], prensor.Prensor],
    output_schema: Schema) -> expression.Expression:
  r"""Maps an expression to a prensor, and merges that prensor.

  For example, suppose you have an op my_op, that takes a prensor of the form:

  ```
    event
    /  \
  foo  bar
  ```

  and produces a prensor of the form my_result_schema:

  ```
    event
    /  \
  foo2 bar2
  ```

  If you give it an expression original with the schema:

  ```
   session
      |
    event
    /  \
  foo   bar
  ```
  ```python
  result = map_prensor_to_prensor(
    original,
    path.Path(["session","event"]),
    my_op,
    my_output_schema)
  ```

  Result will have the schema:

  ```
   session
      |
    event--------
    /  \    \    \
  foo   bar foo2 bar2
  ```

  Args:
    root_expr: the root expression
    source: the path where the prensor op is applied.
    paths_needed: the paths needed for the op.
    prensor_op: the prensor op
    output_schema: the output schema of the op.

  Returns:
    A new expression where the prensor is merged.
  """
  original_child = root_expr.get_descendant_or_error(source).project(
      paths_needed)
  prensor_child = _PrensorOpExpression(original_child, prensor_op,
                                       output_schema)
  paths_map = {
      source.get_child(k): prensor_child.get_child_or_error(k)
      for k in prensor_child.known_field_names()
  }
  result = expression_add.add_paths(root_expr, paths_map)
  return result

Modules

map_values

Maps the values of various leaves of the same child to a single result.

All inputs must have the same shape (parent_index must be equal).

The output has the same shape as the inputs (the function's output must have the same length as its inputs).

Note that the operations are on 1-D tensors (as opposed to scalars).
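A minimal sketch of these constraints, modeling each field as a pair of parent_index and 1-D values (a hypothetical representation, not a prensor). `map_many_values_sketch` checks that all parent indices match, passes whole value lists to the operation, and checks that the output has one value per input.

```python
from typing import Callable, List, Sequence, Tuple

Field = Tuple[List[int], List[float]]  # hypothetical (parent_index, values)


def map_many_values_sketch(fields: Sequence[Field],
                           operation: Callable[..., List[float]]) -> Field:
  parent_index = fields[0][0]
  for other_parent_index, _ in fields[1:]:
    if other_parent_index != parent_index:
      raise ValueError("all source fields must have the same parent_index")
  # The operation sees whole 1-D value lists, not individual scalars.
  new_values = operation(*[values for _, values in fields])
  if len(new_values) != len(parent_index):
    raise ValueError("operation must return one value per input value")
  return parent_index, list(new_values)


a = ([0, 0, 1], [1.0, 2.0, 3.0])
b = ([0, 0, 1], [10.0, 20.0, 30.0])
result = map_many_values_sketch(
    [a, b], lambda x, y: [p + q for p, q in zip(x, y)])
```

The new field reuses the shared parent_index, which is why the output needs no new shape information.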

FUNCTION	DESCRIPTION
`map_many_values`	Map multiple sibling fields into a new sibling.
`map_values`	Map field into a new sibling.
`map_values_anonymous`	Map field into a new sibling.

Functions

map_many_values

map_many_values(
    root: Expression,
    parent_path: Path,
    source_fields: Sequence[Step],
    operation: Callable[..., Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Tuple[Expression, Path]

Map multiple sibling fields into a new sibling.

All source fields must have the same shape, and the shape of the output must be the same as well.

PARAMETER	DESCRIPTION
`root`	original root. TYPE: `Expression`
`parent_path`	parent path of all sources and the new field. TYPE: `Path`
`source_fields`	source fields of the operation. Must have the same shape. TYPE: `Sequence[Step]`
`operation`	operation from source_fields to new field. TYPE: `Callable[..., Tensor]`
`dtype`	type of new field. TYPE: `DType`
`new_field_name`	name of the new field. TYPE: `Step`

RETURNS	DESCRIPTION
`Tuple[Expression, Path]`	The new expression and the new path as a pair.

Source code in struct2tensor/expression_impl/map_values.py

def map_many_values(
    root: expression.Expression, parent_path: path.Path,
    source_fields: Sequence[path.Step], operation: Callable[..., tf.Tensor],
    dtype: tf.DType,
    new_field_name: path.Step) -> Tuple[expression.Expression, path.Path]:
  """Map multiple sibling fields into a new sibling.

  All source fields must have the same shape, and the shape of the output
  must be the same as well.

  Args:
    root: original root.
    parent_path: parent path of all sources and the new field.
    source_fields: source fields of the operation. Must have the same shape.
    operation: operation from source_fields to new field.
    dtype: type of new field.
    new_field_name: name of the new field.

  Returns:
    The new expression and the new path as a pair.
  """
  new_path = parent_path.get_child(new_field_name)
  return expression_add.add_paths(
      root, {
          new_path:
              _MapValuesExpression([
                  root.get_descendant_or_error(parent_path.get_child(f))
                  for f in source_fields
              ], operation, dtype)
      }), new_path

map_values ¶

map_values(
    root: Expression,
    source_path: Path,
    operation: Callable[[Tensor], Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map field into a new sibling.

The shape of the output must be the same as the input.

PARAMETER	DESCRIPTION
`root`	original root. TYPE: `Expression`
`source_path`	source of the operation. TYPE: `Path`
`operation`	operation from the source field to the new field. TYPE: `Callable[[Tensor], Tensor]`
`dtype`	type of new field. TYPE: `DType`
`new_field_name`	name of the new field. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	The new expression.

Source code in struct2tensor/expression_impl/map_values.py

def map_values(root: expression.Expression, source_path: path.Path,
               operation: Callable[[tf.Tensor], tf.Tensor], dtype: tf.DType,
               new_field_name: path.Step) -> expression.Expression:
  """Map field into a new sibling.

  The shape of the output must be the same as the input.

  Args:
    root: original root.
    source_path: source of the operation.
    operation: operation from source_fields to new field.
    dtype: type of new field.
    new_field_name: name of the new field.

  Returns:
    The new expression.
  """
  if not source_path:
    raise ValueError('Cannot map the root.')
  return map_many_values(root, source_path.get_parent(),
                         [source_path.field_list[-1]], operation, dtype,
                         new_field_name)[0]

map_values_anonymous ¶

map_values_anonymous(
    root: Expression,
    source_path: Path,
    operation: Callable[[Tensor], Tensor],
    dtype: DType,
) -> Tuple[Expression, Path]

Map field into a new sibling.

The shape of the output must be the same as the input.

PARAMETER	DESCRIPTION
`root`	original root. TYPE: `Expression`
`source_path`	source of the operation. TYPE: `Path`
`operation`	operation from the source field to the new field. TYPE: `Callable[[Tensor], Tensor]`
`dtype`	type of new field. TYPE: `DType`

RETURNS	DESCRIPTION
`Tuple[Expression, Path]`	The new expression and the new path as a pair.

Source code in struct2tensor/expression_impl/map_values.py

def map_values_anonymous(
    root: expression.Expression, source_path: path.Path,
    operation: Callable[[tf.Tensor], tf.Tensor],
    dtype: tf.DType) -> Tuple[expression.Expression, path.Path]:
  """Map field into a new sibling.

  The shape of the output must be the same as the input.

  Args:
    root: original root.
    source_path: source of the operation.
    operation: operation from source_fields to new field.
    dtype: type of new field.

  Returns:
    The new expression and the new path as a pair.
  """
  if not source_path:
    raise ValueError('Cannot map the root.')
  return map_many_values(root, source_path.get_parent(),
                         [source_path.field_list[-1]], operation, dtype,
                         path.get_anonymous_field())

Modules¶

parquet ¶

Apache Parquet Dataset.

Example Usage

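from struct2tensor import path
from struct2tensor.expression_impl import parquet
from struct2tensor.expression_impl import project

# filenames: list of Parquet file paths; batch_size: messages per prensor.
exp = parquet.create_expression_from_parquet_file(filenames)
docid_project_exp = project.project(exp, [path.Path(["DocId"])])
pqds = parquet.calculate_parquet_values([docid_project_exp], exp,
                                        filenames, batch_size)

for prensors in pqds:
  doc_id_prensor = prensors[0]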

CLASS	DESCRIPTION
`ParquetDataset`	A dataset which reads columns from a parquet file and returns a prensor.

FUNCTION	DESCRIPTION
`calculate_parquet_values`	Calculates expressions and returns a parquet dataset.
`create_expression_from_parquet_file`	Creates a placeholder expression from a parquet file.

Classes¶

ParquetDataset ¶

ParquetDataset(
    filenames: List[str],
    value_paths: List[str],
    batch_size: int,
)

Bases: _RawParquetDataset

A dataset which reads columns from a parquet file and returns a prensor.

The prensor will have a PrensorTypeSpec, which is created based on value_paths.

Note

In TensorFlow 1.x this dataset does not return a prensor; its output has the same format as _RawParquetDataset's output (a vector of tensors). The following is a workaround for TensorFlow 1.x:

pq_ds = ParquetDataset(...)
type_spec = pq_ds.element_spec
tensors = pq_ds.make_one_shot_iterator().get_next()
prensor = type_spec.from_components(tensors)
session.run(prensor)

Creates a ParquetDataset.

PARAMETER	DESCRIPTION
`filenames`	A list containing the name(s) of the file(s) to be read. TYPE: `List[str]`
`value_paths`	A list of dotstring paths (dot-separated path strings), one for each leaf path to read. TYPE: `List[str]`
`batch_size`	An int that determines how many messages are parsed into one prensor tree in an iteration. If there are fewer than batch_size remaining messages, then all remaining messages will be returned. TYPE: `int`

RAISES	DESCRIPTION
`ValueError`	if the column does not exist in the parquet schema.
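The batch_size rule above (full batches, then one final shorter batch with whatever remains) can be modeled with a plain generator. This only illustrates the batching contract: `iter_batches` is a hypothetical helper, it does not read Parquet files, and it does not model how batches behave across file boundaries.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(messages: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yields consecutive batches of at most batch_size messages."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(messages), batch_size):
        # The last slice is shorter when fewer than batch_size messages remain.
        yield list(messages[start:start + batch_size])


print(list(iter_batches(["m0", "m1", "m2", "m3", "m4"], batch_size=2)))
# [['m0', 'm1'], ['m2', 'm3'], ['m4']]
```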

ATTRIBUTE	DESCRIPTION
`element_spec`
`element_structure`
`output_classes`
`output_shapes`
`output_types`

Source code in struct2tensor/expression_impl/parquet.py

def __init__(self, filenames: List[str], value_paths: List[str],
             batch_size: int):
  """Creates a ParquetDataset.

  Args:
    filenames: A list containing the name(s) of the file(s) to be read.
    value_paths: A list of strings of the dotstring path(s) of each leaf
      path(s).
    batch_size: An int that determines how many messages are parsed into one
      prensor tree in an iteration. If there are fewer than batch_size
      remaining messages, then all remaining messages will be returned.

  Raises:
    ValueError: if the column does not exist in the parquet schema.
  """
  self._filenames = filenames
  self._value_paths = value_paths
  self._batch_size = batch_size

  for filename in filenames:
    self._validate_file(filename, value_paths)

  self._value_dtypes = self._get_column_dtypes(filenames[0], value_paths)

  self._parent_index_paths = []
  self._path_index = []

  self.element_structure = self._create_prensor_spec()
  self._create_parent_index_paths_and_index_from_type_spec(
      self.element_structure, 0, 0)

  super(ParquetDataset,
        self).__init__(filenames, self._value_paths, self._value_dtypes,
                       self._parent_index_paths, self._path_index, batch_size)

Attributes¶

element_spec property ¶

element_spec

element_structure instance-attribute ¶

element_structure = _create_prensor_spec()

output_classes property ¶

output_classes

output_shapes property ¶

output_shapes

output_types property ¶

output_types

Functions¶

calculate_parquet_values ¶

calculate_parquet_values(
    expressions: List[Expression],
    root_exp: _PlaceholderRootExpression,
    filenames: List[str],
    batch_size: int,
    options: Optional[Options] = None,
)

Calculates expressions and returns a parquet dataset.

PARAMETER	DESCRIPTION
`expressions`	A list of expressions to calculate. TYPE: `List[Expression]`
`root_exp`	The root placeholder expression to use as the feed dict. TYPE: `_PlaceholderRootExpression`
`filenames`	A list of parquet files. TYPE: `List[str]`
`batch_size`	The number of messages to batch. TYPE: `int`
`options`	calculate options. TYPE: `Optional[Options]` DEFAULT: `None`

RETURNS	DESCRIPTION
	A parquet dataset.

Source code in struct2tensor/expression_impl/parquet.py

def calculate_parquet_values(
    expressions: List[expression.Expression],
    root_exp: placeholder._PlaceholderRootExpression,  # pylint: disable=protected-access
    filenames: List[str],
    batch_size: int,
    options: Optional[calculate_options.Options] = None):
  """Calculates expressions and returns a parquet dataset.

  Args:
    expressions: A list of expressions to calculate.
    root_exp: The root placeholder expression to use as the feed dict.
    filenames: A list of parquet files.
    batch_size: The number of messages to batch.
    options: calculate options.

  Returns:
    A parquet dataset.
  """
  pqds = _ParquetDatasetWithExpression(expressions, root_exp, filenames,
                                       batch_size, options)
  return pqds.map(pqds._calculate_prensor)  # pylint: disable=protected-access

create_expression_from_parquet_file ¶

create_expression_from_parquet_file(
    filenames: List[str],
) -> _PlaceholderRootExpression

Creates a placeholder expression from a parquet file.

PARAMETER	DESCRIPTION
`filenames`	A list of parquet files. TYPE: `List[str]`

RETURNS	DESCRIPTION
`_PlaceholderRootExpression`	A PlaceholderRootExpression that should be used as the root of an expression graph.

Source code in struct2tensor/expression_impl/parquet.py

def create_expression_from_parquet_file(
    filenames: List[str]) -> placeholder._PlaceholderRootExpression:  # pylint: disable=protected-access
  """Creates a placeholder expression from a parquet file.

  Args:
    filenames: A list of parquet files.

  Returns:
    A PlaceholderRootExpression that should be used as the root of an expression
      graph.
  """

  metadata = pq.ParquetFile(filenames[0]).metadata
  parquet_schema = metadata.schema
  arrow_schema = parquet_schema.to_arrow_schema()

  root_schema = mpp.create_schema(
      is_repeated=True,
      children=_create_children_from_arrow_fields(
          [arrow_schema.field_by_name(name) for name in arrow_schema.names]))

  # pylint: disable=protected-access
  return placeholder._PlaceholderRootExpression(root_schema)

Modules¶

parse_message_level_ex ¶

Parses regular fields, extensions, any casts, and map protos.

This is intended for use within proto.py, not independently.

parse_message_level(...) in struct2tensor_ops provides a direct interface for parsing a protocol buffer message; in particular, it can extract extensions and regular fields directly from the protobuf. However, prensors provide additional syntactic sugar for parsing protobufs, and parse_message_level_ex(...) handles this sugar in addition to regular fields and extensions.

Specifically, consider google.protobuf.Any and proto maps:

package foo.bar;

import "google/protobuf/any.proto";

message MyMessage {
  google.protobuf.Any my_any = 1;
  map<string, Baz> my_map = 2;
}
message Baz {
  int32 my_int = 1;
  ...
}

Then for MyMessage, the path my_any.(foo.bar.Baz).my_int is an optional path. Also, my_map[x].my_int is an optional path.

  MyMessage--------------
     \  my_any?          \ my_map[x]
      *                   *
       \  (foo.bar.Baz)?   \  my_int?
        *                   *
         \  my_int?
          *

Thus, we can run:

my_message_serialized_tensor = ...

my_message_parsed = parse_message_level_ex(
    my_message_serialized_tensor,
    MyMessage.DESCRIPTOR,
    {"my_any", "my_map[x]"})

my_any_serialized = my_message_parsed["my_any"].value

my_any_parsed = parse_message_level_ex(
    my_any_serialized,
    Any.DESCRIPTOR,
    {"(foo.bar.Baz)"})

At this point, my_message_parsed["my_map[x]"].value and my_any_parsed["(foo.bar.Baz)"].value are serialized Baz tensors.
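The step strings used above fall into a few shapes. The following sketch classifies them; `any_full_name` mirrors the logic of get_full_name_from_any_step (documented below), while `classify_step` and the map-step regular expression are hypothetical helpers, not part of struct2tensor. It assumes, as is_any_descriptor suggests, that a parenthesized step is an Any cast only when the message being parsed is google.protobuf.Any; elsewhere the sketch treats it as an extension name.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for map steps such as "my_map[x]": a field name
# followed by a bracketed key.
_MAP_STEP = re.compile(r"^(?P<field>[^\[\]]+)\[(?P<key>[^\[\]]*)\]$")


def any_full_name(step: str) -> Optional[str]:
    """Mirrors get_full_name_from_any_step: "(foo.com/bar.Baz)" -> "bar.Baz"."""
    if not step or step[0] != "(" or step[-1] != ")":
        return None
    return step[1:-1].split("/")[-1]


def classify_step(step: str, parent_is_any: bool) -> Tuple[str, ...]:
    """Returns ("any", name), ("extension", name), ("map", field, key) or ("regular", step)."""
    full_name = any_full_name(step)
    if full_name is not None:
        # Assumption: parenthesized steps are Any casts only under google.protobuf.Any.
        return ("any", full_name) if parent_is_any else ("extension", full_name)
    match = _MAP_STEP.match(step)
    if match:
        return ("map", match.group("field"), match.group("key"))
    return ("regular", step)


steps = [("my_any", False), ("my_map[x]", False), ("(foo.bar.Baz)", True),
         ("(type.googleapis.com/foo.bar.Baz)", True)]
for step, parent_is_any in steps:
    print(step, "->", classify_step(step, parent_is_any))
```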

FUNCTION	DESCRIPTION
`get_full_name_from_any_step`	Gets the full name of a protobuf from a google.protobuf.Any step.
`is_any_descriptor`	Returns true if it is an Any descriptor.
`parse_message_level_ex`	Parses regular fields, extensions, any casts, and map protos.

ATTRIBUTE	DESCRIPTION
`ProtoFieldName`
`ProtoFullName`
`StrStep`

Attributes¶

ProtoFieldName `module-attribute` ¶

ProtoFieldName = str

ProtoFullName `module-attribute` ¶

ProtoFullName = str

StrStep `module-attribute` ¶

StrStep = str

Functions¶

get_full_name_from_any_step ¶

get_full_name_from_any_step(
    step: ProtoFieldName,
) -> Optional[ProtoFieldName]

Gets the full name of a protobuf from a google.protobuf.Any step.

An any step is of the form (foo.com/bar.Baz). In this case the result would be bar.Baz.

PARAMETER	DESCRIPTION
`step`	the string of a step in a path. TYPE: `ProtoFieldName`

RETURNS	DESCRIPTION
`Optional[ProtoFieldName]`	the full name of a protobuf if the step is an any step, or None otherwise.

Source code in struct2tensor/expression_impl/parse_message_level_ex.py

def get_full_name_from_any_step(
    step: ProtoFieldName) -> Optional[ProtoFieldName]:
  """Gets the full name of a protobuf from a google.protobuf.Any step.

  An any step is of the form (foo.com/bar.Baz). In this case the result would
  be bar.Baz.

  Args:
    step: the string of a step in a path.

  Returns:
    the full name of a protobuf if the step is an any step, or None otherwise.
  """
  if not step:
    return None
  if step[0] != "(":
    return None
  if step[-1] != ")":
    return None
  step_without_parens = step[1:-1]
  return step_without_parens.split("/")[-1]

is_any_descriptor ¶

is_any_descriptor(desc: Descriptor) -> bool

Returns true if it is an Any descriptor.

Source code in struct2tensor/expression_impl/parse_message_level_ex.py

def is_any_descriptor(desc: descriptor.Descriptor) -> bool:
  """Returns true if it is an Any descriptor."""
  return desc.full_name == "google.protobuf.Any"

parse_message_level_ex ¶

parse_message_level_ex(
    tensor_of_protos: Tensor,
    desc: Descriptor,
    field_names: Set[ProtoFieldName],
    message_format: str = "binary",
    backing_str_tensor: Optional[Tensor] = None,
    honor_proto3_optional_semantics: bool = False,
) -> Mapping[StrStep, _ParsedField]

Parses regular fields, extensions, any casts, and map protos.

Source code in struct2tensor/expression_impl/parse_message_level_ex.py

def parse_message_level_ex(
    tensor_of_protos: tf.Tensor,
    desc: descriptor.Descriptor,
    field_names: Set[ProtoFieldName],
    message_format: str = "binary",
    backing_str_tensor: Optional[tf.Tensor] = None,
    honor_proto3_optional_semantics: bool = False
) -> Mapping[StrStep, struct2tensor_ops._ParsedField]:
  """Parses regular fields, extensions, any casts, and map protos."""
  raw_field_names = _get_field_names_to_parse(desc, field_names)
  regular_fields = list(
      struct2tensor_ops.parse_message_level(
          tensor_of_protos,
          desc,
          raw_field_names,
          message_format=message_format,
          backing_str_tensor=backing_str_tensor,
          honor_proto3_optional_semantics=honor_proto3_optional_semantics))
  regular_field_map = {x.field_name: x for x in regular_fields}

  any_fields = _get_any_parsed_fields(desc, regular_field_map, field_names)
  map_fields = _get_map_parsed_fields(desc, regular_field_map, field_names,
                                      backing_str_tensor)
  result = regular_field_map
  result.update(any_fields)
  result.update(map_fields)
  return result

Modules¶

placeholder ¶

Placeholder expression.

A placeholder expression represents prensor nodes; however, a prensor is not needed until calculate is called. This allows the user to apply expression queries to a placeholder expression before an actual prensor object exists. When calculate is called on a placeholder expression (or on a descendant of one), a feed_dict must be passed in; calculate then binds the prensor to the appropriate placeholder expression.

Sample usage:

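from struct2tensor import calculate
from struct2tensor.expression_impl import placeholder

placeholder_exp = placeholder.create_expression_from_schema(schema)
# expression_queries stands for any sequence of expression operations.
new_exp = expression_queries(placeholder_exp, ...)
# placeholder_exp requires a feed_dict to be passed in when calculating.
result = calculate.calculate_values([new_exp],
                                    feed_dict={placeholder_exp: pren})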
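The deferred binding described above can be illustrated with a small analogy in plain Python: queries are composed against a placeholder that holds no data, and the data is supplied only at calculation time through a feed_dict. `Placeholder`, `Query`, and `calculate` here are hypothetical toy classes, not the struct2tensor API, and lists of integers stand in for prensors.

```python
from typing import Callable, Dict, List


class Placeholder:
    """Hypothetical stand-in for a placeholder root: it holds no data itself."""

    def __init__(self, name: str):
        self.name = name


class Query:
    """A deferred computation over the value a placeholder will eventually hold."""

    def __init__(self, source: Placeholder, fn: Callable[[List[int]], List[int]]):
        self.source = source
        self.fn = fn

    def then(self, fn: Callable[[List[int]], List[int]]) -> "Query":
        # Compose queries without touching any data.
        return Query(self.source, lambda data: fn(self.fn(data)))


def calculate(queries: List[Query],
              feed_dict: Dict[Placeholder, List[int]]) -> List[List[int]]:
    results = []
    for q in queries:
        if q.source not in feed_dict:
            raise ValueError(f"no value fed for placeholder {q.source.name!r}")
        # Binding happens only here, when the concrete data is supplied.
        results.append(q.fn(feed_dict[q.source]))
    return results


root = Placeholder("root")
query = Query(root, lambda xs: [x for x in xs if x > 1]).then(lambda xs: [x * 10 for x in xs])
print(calculate([query], feed_dict={root: [1, 2, 3]}))  # [[20, 30]]
```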

FUNCTION	DESCRIPTION
`create_expression_from_schema`	Creates a placeholder expression from a schema describing a prensor tree.
`get_placeholder_paths_from_graph`	Gets all placeholder paths from an expression graph.

Functions¶

create_expression_from_schema ¶

create_expression_from_schema(
    schema: Schema,
) -> _PlaceholderRootExpression

Creates a placeholder expression from a schema describing a prensor tree.

PARAMETER	DESCRIPTION
`schema`	The schema that describes the prensor tree that this placeholder represents. TYPE: `Schema`

RETURNS	DESCRIPTION
`_PlaceholderRootExpression`	A PlaceholderRootExpression that should be used as the root of an expression graph.

Source code in struct2tensor/expression_impl/placeholder.py

def create_expression_from_schema(
    schema: mpp.Schema) -> "_PlaceholderRootExpression":
  """Creates a placeholder expression from a parquet schema.

  Args:
    schema: The schema that describes the prensor tree that this placeholder
      represents.

  Returns:
    A PlaceholderRootExpression that should be used as the root of an expression
      graph.
  """

  return _PlaceholderRootExpression(schema)

get_placeholder_paths_from_graph ¶

get_placeholder_paths_from_graph(
    graph: ExpressionGraph,
) -> List[Path]

Gets all placeholder paths from an expression graph.

This finds all leaf placeholder expressions in an expression graph, and gets the path of these expressions.

PARAMETER	DESCRIPTION
`graph`	expression graph TYPE: `ExpressionGraph`

RETURNS	DESCRIPTION
`List[Path]`	a list of paths of placeholder expressions

Source code in struct2tensor/expression_impl/placeholder.py

def get_placeholder_paths_from_graph(
    graph: calculate.ExpressionGraph) -> List[path.Path]:
  """Gets all placeholder paths from an expression graph.

  This finds all leaf placeholder expressions in an expression graph, and gets
  the path of these expressions.

  Args:
    graph: expression graph

  Returns:
    a list of paths of placeholder expressions
  """
  expressions = [
      x for x in graph.get_expressions_needed()
      if (_is_placeholder_expression(x) and x.is_leaf)
  ]
  expressions = typing.cast(List[_PlaceholderExpression], expressions)
  return [e.get_path() for e in expressions]

Modules¶

project ¶

project selects a subtree of an expression.

project is often used right before calculating the value.

Example

expr = ...
new_expr = project.project(expr, [path.Path(["foo","bar"]),
                                  path.Path(["x", "y"])])
[prensor_result] = calculate.calculate_prensors([new_expr])

prensor_result now has two paths, "foo.bar" and "x.y".
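The effect of project on the shape of the tree can be sketched with nested dictionaries. This is an illustration under stated assumptions, not the library code: `project_tree` and `_has_path` are hypothetical helpers, the tree maps field names to child fields (with `{}` marking a leaf), and the sketch copies only the steps along each selected path. The missing-path check reproduces the error message raised by project in the source listing below.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical schema tree: field name -> child fields ({} marks a leaf).
Tree = Dict[str, "Tree"]


def _has_path(tree: Tree, p: Tuple[str, ...]) -> bool:
    node = tree
    for step in p:
        if step not in node:
            return False
        node = node[step]
    return True


def project_tree(tree: Tree, paths: Sequence[Tuple[str, ...]]) -> Tree:
    """Keeps only the listed paths and their ancestors."""
    missing = [p for p in paths if not _has_path(tree, p)]
    if missing:
        # Same validation as project(): fail fast on unknown paths.
        raise ValueError("{} Path(s) missing in project: {}".format(
            len(missing), ", ".join(".".join(p) for p in missing)))
    result: Tree = {}
    for p in paths:
        node = result
        for step in p:
            # Copy each step along the path; unselected siblings are never copied.
            node = node.setdefault(step, {})
    return result


schema = {"foo": {"bar": {}, "baz": {}}, "x": {"y": {}, "z": {}}, "w": {}}
print(project_tree(schema, [("foo", "bar"), ("x", "y")]))
# {'foo': {'bar': {}}, 'x': {'y': {}}}
```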

FUNCTION	DESCRIPTION
`project`	Select a subtree.

Functions¶

project ¶

project(
    expr: Expression, paths: Sequence[Path]
) -> Expression

Select a subtree.

Paths that are not selected are removed. Selected paths become "known": if calculate_prensors is called, they are included in the result.

PARAMETER	DESCRIPTION
`expr`	the original expression. TYPE: `Expression`
`paths`	the paths to include. TYPE: `Sequence[Path]`

RETURNS	DESCRIPTION
`Expression`	A projected expression.

Source code in struct2tensor/expression_impl/project.py

def project(expr: expression.Expression,
            paths: Sequence[path.Path]) -> expression.Expression:
  """select a subtree.

  Paths not selected are removed.
  Paths that are selected are "known", such that if calculate_prensors is
  called, they will be in the result.

  Args:
    expr: the original expression.
    paths: the paths to include.

  Returns:
    A projected expression.
  """
  missing_paths = [p for p in paths if expr.get_descendant(p) is None]
  if missing_paths:
    raise ValueError("{} Path(s) missing in project: {}".format(
        len(missing_paths), ", ".join([str(x) for x in missing_paths])))
  return _ProjectExpression(expr, paths)

Modules¶

promote ¶

Promote an expression to be a child of its grandparent.

Promote is one of the building blocks of promote_and_broadcast, the standard way of flattening structured data. Calling promote directly allows simpler operations.

For example, suppose an expr represents:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
         |
         +-val*-int64

session: {
  event: {
    val: 111
  }
  event: {
    val: 121
    val: 122
  }
}

session: {
  event: {
    val: 10
    val: 7
  }
  event: {
    val: 1
  }
}

promote.promote(expr, path.Path(["session", "event", "val"]), "nval")

produces:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
     |    |
     |    +-val*-int64
     |
     +-nval*-int64

session: {
  event: {
    val: 111
  }
  event: {
    val: 121
    val: 122
  }
  nval: 111
  nval: 121
  nval: 122
}

session: {
  event: {
    val: 10
    val: 7
  }
  event: {
    val: 1
  }
  nval: 10
  nval: 7
  nval: 1
}
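The parent-index rewrite behind promote can be reproduced in plain Python. PromoteChildExpression.calculate (listed further down this page) computes the new parent index with tf.gather(origin_parent_value.parent_index, origin_value.parent_index). The sketch below applies the same gather to the val field from the example above, assuming that a promoted leaf's parent index is remapped the same way. `promote_parent_index` is a hypothetical helper, and lists stand in for tensors.

```python
from typing import Dict, List, Sequence


def promote_parent_index(origin_parent_index: Sequence[int],
                         origin_index: Sequence[int]) -> List[int]:
    """Pure-Python analogue of tf.gather(origin_parent_index, origin_index).

    origin_index[i] is the event that value i belongs to; looking that event up
    in origin_parent_index yields its session, i.e. the value's grandparent.
    """
    return [origin_parent_index[i] for i in origin_index]


# Events 0 and 1 belong to session 0; events 2 and 3 belong to session 1.
event_parent_index = [0, 0, 1, 1]
# val values in order, and the event each one belongs to.
val_values = [111, 121, 122, 10, 7, 1]
val_parent_index = [0, 1, 1, 2, 2, 3]

nval_parent_index = promote_parent_index(event_parent_index, val_parent_index)
print(nval_parent_index)  # [0, 0, 0, 1, 1, 1]

# Group the promoted values by session to recover the nval lists shown above.
nval_by_session: Dict[int, List[int]] = {}
for session, value in zip(nval_parent_index, val_values):
    nval_by_session.setdefault(session, []).append(value)
print(nval_by_session)  # {0: [111, 121, 122], 1: [10, 7, 1]}
```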

CLASS	DESCRIPTION
`PromoteChildExpression`	The root of the promoted sub tree.
`PromoteExpression`	A promoted leaf.

FUNCTION	DESCRIPTION
`promote`	Promote a path to be a child of its grandparent, and give it a name.
`promote_anonymous`	Promote a path to be a new anonymous child of its grandparent.

Classes¶

PromoteChildExpression ¶

PromoteChildExpression(
    origin: Expression, origin_parent: Expression
)

Bases: Expression

The root of the promoted subtree.

Initialize an expression.

PARAMETER	DESCRIPTION
`is_repeated`	if the expression is repeated. TYPE: `bool`
`my_type`	the DType of a field, or None for an internal node. TYPE: `Optional[DType]`
`schema_feature`	the local schema (StructDomain information should not be present). TYPE: `Optional[Feature]` DEFAULT: `None`
`validate_step_format`	If True, validates that steps do not contain any characters that could be ambiguously interpreted as structure delimiters (e.g. "."). If False, such characters are allowed, and the client is responsible for not relying on any auto-coercion of strings to paths. TYPE: `bool` DEFAULT: `True`

METHOD	DESCRIPTION
`apply`
`apply_schema`
`broadcast`	Broadcasts the existing field at source_path to the sibling_field.
`calculate`	Calculates the node tensor of the expression.
`calculation_equal`	self.calculate is equal to another expression.calculate.
`calculation_is_identity`	True iff self.calculate is the identity.
`cogroup_by_index`	Creates a cogroup of left_name and right_name at new_field_name.
`create_has_field`	Creates a field that is the presence of the source path.
`create_proto_index`	Creates a proto index field as a direct child of the current root.
`create_size_field`	Creates a field that is the size of the source path.
`get_child`	Gets a named child.
`get_child_or_error`	Gets a named child.
`get_descendant`	Finds the descendant at the path.
`get_descendant_or_error`	Finds the descendant at the path.
`get_known_children`
`get_known_descendants`	Gets a mapping from known paths to subexpressions.
`get_paths_with_schema`	Extract only paths that contain schema information.
`get_schema`	Returns a schema for the entire tree.
`get_source_expressions`	Gets the sources of this expression.
`known_field_names`	Returns known field names of the expression.
`map_field_values`	Map a primitive field to create a new primitive field.
`map_ragged_tensors`	Maps a set of primitive fields of a message to a new field.
`map_sparse_tensors`	Maps a set of primitive fields of a message to a new field.
`project`	Constrains the paths to those listed.
`promote`	Promotes source_path to be a field new_field_name in its grandparent.
`promote_and_broadcast`
`reroot`	Returns a new list of protocol buffers available at new_root.
`schema_string`	Returns a schema for the expression.
`slice`	Creates a slice copy of source_path at new_field_path.
`truncate`	Creates a truncated copy of source_path at new_field_path.

ATTRIBUTE	DESCRIPTION
`is_leaf`	True iff the node tensor is a LeafNodeTensor. TYPE: `bool`
`is_repeated`	True iff the same parent value can have multiple children values. TYPE: `bool`
`schema_feature`	Return the schema of the field. TYPE: `Optional[Feature]`
`type`	dtype of the expression, or None if not a leaf expression. TYPE: `Optional[DType]`
`validate_step_format`	TYPE: `bool`

Source code in struct2tensor/expression_impl/promote.py

def __init__(self, origin: expression.Expression,
             origin_parent: expression.Expression):

  super().__init__(
      origin.is_repeated or origin_parent.is_repeated,
      origin.type,
      schema_feature=_get_promote_schema_feature(
          origin.schema_feature, origin_parent.schema_feature
      ),
      validate_step_format=origin.validate_step_format,
  )
  self._origin = origin
  self._origin_parent = origin_parent
  if self._origin_parent.type is not None:
    raise ValueError("origin_parent cannot be a field")

Attributes¶

is_leaf property ¶

is_leaf: bool

True iff the node tensor is a LeafNodeTensor.

is_repeated property ¶

is_repeated: bool

True iff the same parent value can have multiple children values.

schema_feature property ¶

schema_feature: Optional[Feature]

Return the schema of the field.

type property ¶

type: Optional[DType]

dtype of the expression, or None if not a leaf expression.

validate_step_format property ¶

validate_step_format: bool

Functions¶

apply ¶

apply(
    transform: Callable[[Expression], Expression],
) -> Expression

Source code in struct2tensor/expression.py

def apply(self,
          transform: Callable[["Expression"], "Expression"]) -> "Expression":
  return transform(self)

apply_schema ¶

apply_schema(schema: Schema) -> Expression

Source code in struct2tensor/expression.py

def apply_schema(self, schema: schema_pb2.Schema) -> "Expression":
  return apply_schema.apply_schema(self, schema)

broadcast ¶

broadcast(
    source_path: CoercableToPath,
    sibling_field: Step,
    new_field_name: Step,
) -> Expression

Broadcasts the existing field at source_path to the sibling_field.

Source code in struct2tensor/expression.py

def broadcast(self, source_path: CoercableToPath, sibling_field: path.Step,
              new_field_name: path.Step) -> "Expression":
  """Broadcasts the existing field at source_path to the sibling_field."""
  return broadcast.broadcast(self, path.create_path(source_path),
                             sibling_field, new_field_name)

calculate ¶

calculate(
    sources: Sequence[NodeTensor],
    destinations: Sequence[Expression],
    options: Options,
    side_info: Optional[Prensor] = None,
) -> NodeTensor

Calculates the node tensor of the expression.

The node tensor must be a function of the properties of the expression and the node tensors of the expressions from get_source_expressions().

If is_leaf, then calculate must return a LeafNodeTensor. Otherwise, it must return a ChildNodeTensor or RootNodeTensor.

If calculation_is_identity() is True, then this must return sources[0].

Sometimes, for operations such as parsing the proto, calculate will return additional information. For example, calculate() for the root of the proto expression also parses out the tensors required to calculate the tensors of the children. This is why destinations are required.

For a reference use, see calculate_value_slowly(...) below.

PARAMETER	DESCRIPTION
`sources`	The node tensors of the expressions in get_source_expressions(). TYPE: `Sequence[NodeTensor]`
`destinations`	The expressions that will use the output of this method. TYPE: `Sequence[Expression]`
`options`	Options for the calculation. TYPE: `Options`
`side_info`	An optional prensor that is used to bind to a placeholder expression. TYPE: `Optional[Prensor]` DEFAULT: `None`

RETURNS	DESCRIPTION
`NodeTensor`	A NodeTensor representing the output of this expression.

Source code in struct2tensor/expression_impl/promote.py

def calculate(
    self,
    sources: Sequence[prensor.NodeTensor],
    destinations: Sequence[expression.Expression],
    options: calculate_options.Options,
    side_info: Optional[prensor.Prensor] = None) -> prensor.NodeTensor:
  [origin_value, origin_parent_value] = sources
  if not isinstance(origin_value, prensor.ChildNodeTensor):
    raise ValueError("origin_value must be a child")
  if not isinstance(origin_parent_value, prensor.ChildNodeTensor):
    raise ValueError("origin_parent_value must be a child node")
  new_parent_index = tf.gather(origin_parent_value.parent_index,
                               origin_value.parent_index)
  return prensor.ChildNodeTensor(new_parent_index, self.is_repeated)

calculation_equal ¶

calculation_equal(expr: Expression) -> bool

self.calculate is equal to another expression.calculate.

Given the same source node tensors, self.calculate(...) and expr.calculate(...) produce the same result.

Note that this does not check that the source expressions of the two expressions are the same. Therefore, two operations can have the same calculation, but not the same output, because their sources are different. For example, if a.calculation_is_identity() is True and b.calculation_is_identity() is True, then a.calculation_equal(b) is True. However, unless a and b have the same source, the expressions themselves are not equal.

PARAMETER	DESCRIPTION
`expr`	The expression to compare to. TYPE: `Expression`

Source code in struct2tensor/expression_impl/promote.py

def calculation_equal(self, expr: expression.Expression) -> bool:
  return isinstance(expr, PromoteChildExpression)

calculation_is_identity ¶

calculation_is_identity() -> bool

True iff self.calculate is the identity.

There is exactly one source, and the output of self.calculate(...) is the node tensor of this source.

Source code in struct2tensor/expression_impl/promote.py

def calculation_is_identity(self) -> bool:
  return False

cogroup_by_index ¶

cogroup_by_index(
    source_path: CoercableToPath,
    left_name: Step,
    right_name: Step,
    new_field_name: Step,
) -> Expression

Creates a cogroup of left_name and right_name at new_field_name.

Source code in struct2tensor/expression.py

def cogroup_by_index(self, source_path: CoercableToPath, left_name: path.Step,
                     right_name: path.Step,
                     new_field_name: path.Step) -> "Expression":
  """Creates a cogroup of left_name and right_name at new_field_name."""
  raise NotImplementedError("cogroup_by_index is not implemented")

create_has_field ¶

create_has_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the presence of the source path.

Source code in struct2tensor/expression.py

def create_has_field(self, source_path: CoercableToPath,
                     new_field_name: path.Step) -> "Expression":
  """Creates a field that is the presence of the source path."""
  return size.has(self, path.create_path(source_path), new_field_name)

create_proto_index ¶

create_proto_index(field_name: Step) -> Expression

Creates a proto index field as a direct child of the current root.

The proto index maps each root element to the original batch index. For example: [0, 2] means the first element came from the first proto in the original input tensor and the second element came from the third proto. The created field is always "dense" -- it has the same valency as the current root.
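create_proto_index delegates to reroot.create_proto_index_field, and the sketch below does not reproduce that code. It only illustrates, assuming the current root was reached by rerooting down a chain of repeated fields, how such a batch index can be obtained by composing parent indices level by level. `compose_to_batch_index` is a hypothetical helper, and lists stand in for tensors.

```python
from typing import List, Sequence


def compose_to_batch_index(parent_indices: Sequence[Sequence[int]]) -> List[int]:
    """Maps each element at the deepest level back to its index in the batch.

    parent_indices[0] holds, for each level-1 node, the batch proto it came from;
    parent_indices[k] holds, for each level-(k+1) node, its parent at level k.
    """
    if not parent_indices:
        raise ValueError("need at least one level of parent indices")
    index = list(parent_indices[-1])
    # Walk upward, replacing each parent position by that parent's own parent.
    for upper in reversed(parent_indices[:-1]):
        index = [upper[i] for i in index]
    return index


# Three input protos: proto 0 has one event, proto 1 none, proto 2 one event.
print(compose_to_batch_index([[0, 2]]))  # [0, 2]
# Two levels: sessions -> protos [0, 2, 2], events -> sessions [0, 0, 2].
print(compose_to_batch_index([[0, 2, 2], [0, 0, 2]]))  # [0, 0, 2]
```

The output has exactly one entry per element of the new root, which matches the "dense" property described above.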

PARAMETER	DESCRIPTION
`field_name`	the name of the field to be created. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py

def create_proto_index(self, field_name: path.Step) -> "Expression":
  """Creates a proto index field as a direct child of the current root.

  The proto index maps each root element to the original batch index.
  For example: [0, 2] means the first element came from the first proto
  in the original input tensor and the second element came from the third
  proto. The created field is always "dense" -- it has the same valency as
  the current root.

  Args:
    field_name: the name of the field to be created.

  Returns:
    An Expression object representing the result of the operation.
  """

  return reroot.create_proto_index_field(self, field_name)
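The pure-Python sketch below illustrates how a proto index relates to parent indices. It walks a chain of `parent_index` lists, one per nesting level in the style of struct2tensor node tensors, back to each element's position in the original batch. The function name `compose_parent_indices` is hypothetical, and this is not how `reroot.create_proto_index_field` is implemented; it only models the mapping the text describes.

```python
from typing import List, Sequence


def compose_parent_indices(chain: Sequence[Sequence[int]]) -> List[int]:
    """Maps each element at the deepest level back to its batch (proto) index.

    chain[0] is the parent_index of the first level below the original root,
    chain[1] is the parent_index of the level below that, and so on.
    """
    if not chain:
        raise ValueError("chain must be non-empty")
    result = list(chain[-1])  # parents of the deepest elements
    for parent_index in reversed(chain[:-1]):
        # Replace each parent position with that parent's own parent.
        result = [parent_index[i] for i in result]
    return result


# Three protos: proto 0 has one "doc", proto 2 has two; the two "section"
# elements belong to doc 0 and doc 2, i.e. to protos 0 and 2.
print(compose_parent_indices([[0, 2, 2], [0, 2]]))  # [0, 2]
```

For a single level, the proto index is simply that level's `parent_index`; for example, `compose_parent_indices([[0, 2]])` also yields `[0, 2]`, matching the example in the description.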

create_size_field ¶

create_size_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the size of the source path.

Source code in struct2tensor/expression.py

def create_size_field(self, source_path: CoercableToPath,
                      new_field_name: path.Step) -> "Expression":
  """Creates a field that is the size of the source path."""
  return size.size(self, path.create_path(source_path), new_field_name)
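As a rough model of what size and has fields contain, the sketch below assumes the source field is represented by a `parent_index` list, where entry i names the parent of the i-th value. The function names are hypothetical; the real `size.size` and `size.has` operate on TensorFlow tensors.

```python
from typing import List, Sequence


def size_per_parent(parent_index: Sequence[int], num_parents: int) -> List[int]:
    """Counts the child values attached to each parent (a 'size' field)."""
    sizes = [0] * num_parents
    for parent in parent_index:
        sizes[parent] += 1
    return sizes


def has_per_parent(parent_index: Sequence[int], num_parents: int) -> List[bool]:
    """True for each parent with at least one child value (a 'has' field)."""
    return [count > 0 for count in size_per_parent(parent_index, num_parents)]


# Three parents: parent 0 has two values, parent 1 none, parent 2 one.
print(size_per_parent([0, 0, 2], 3))  # [2, 0, 1]
print(has_per_parent([0, 0, 2], 3))   # [True, False, True]
```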

get_child ¶

get_child(field_name: Step) -> Optional[Expression]

Gets a named child.

Source code in struct2tensor/expression.py

def get_child(self, field_name: path.Step) -> Optional["Expression"]:
  """Gets a named child."""
  if field_name in self._child_cache:
    return self._child_cache[field_name]
  result = self._get_child_impl(field_name)
  self._child_cache[field_name] = result
  return result
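One consequence of the cache in get_child is that repeated lookups return the identical child object, and a missing field (None) is cached as well. The toy class below reproduces only that caching logic; `CachingNode` and `make_child` are hypothetical stand-ins for an Expression and its `_get_child_impl`.

```python
from typing import Callable, Dict, Optional


class CachingNode:
    """Toy node that memoizes child lookups the way get_child does."""

    def __init__(self, make_child: Callable[[str], Optional["CachingNode"]]):
        self._make_child = make_child  # stands in for _get_child_impl
        self._child_cache: Dict[str, Optional["CachingNode"]] = {}
        self.impl_calls = 0  # how many times the underlying lookup ran

    def get_child(self, field_name: str) -> Optional["CachingNode"]:
        if field_name in self._child_cache:
            return self._child_cache[field_name]
        self.impl_calls += 1
        result = self._make_child(field_name)
        # None is cached too, so a missing field is not looked up again.
        self._child_cache[field_name] = result
        return result


leaf = CachingNode(lambda name: None)
root = CachingNode(lambda name: leaf if name == "foo" else None)
assert root.get_child("foo") is root.get_child("foo")
assert root.get_child("bar") is None and root.get_child("bar") is None
print(root.impl_calls)  # 2
```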

get_child_or_error ¶

get_child_or_error(field_name: Step) -> Expression

Gets a named child.

Source code in struct2tensor/expression.py

def get_child_or_error(self, field_name: path.Step) -> "Expression":
  """Gets a named child."""
  result = self.get_child(field_name)
  if result is None:
    raise KeyError("No such field: {}".format(field_name))
  return result

get_descendant ¶

get_descendant(p: Path) -> Optional[Expression]

Finds the descendant at the path.

Source code in struct2tensor/expression.py

def get_descendant(self, p: path.Path) -> Optional["Expression"]:
  """Finds the descendant at the path."""
  result = self
  for field_name in p.field_list:
    result = result.get_child(field_name)
    if result is None:
      return None
  return result

get_descendant_or_error ¶

get_descendant_or_error(p: Path) -> Expression

Finds the descendant at the path.

Source code in struct2tensor/expression.py

def get_descendant_or_error(self, p: path.Path) -> "Expression":
  """Finds the descendant at the path."""
  result = self.get_descendant(p)
  if result is None:
    raise ValueError("Missing path: {} in {}".format(
        str(p), self.schema_string(limit=20)))
  return result
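The traversal in get_descendant stops at the first missing step, and get_descendant_or_error turns that None into a ValueError. The sketch below models the same control flow over plain nested dictionaries; the dict-based tree and the dotted path in the error message are assumptions made for illustration, not struct2tensor types.

```python
from typing import Any, Dict, Optional, Sequence

Tree = Dict[str, Any]  # each node is a dict of child name -> child node


def get_descendant(tree: Tree, path: Sequence[str]) -> Optional[Tree]:
    node: Optional[Tree] = tree
    for step in path:
        node = node.get(step)
        if node is None:
            return None  # stop at the first missing step
    return node


def get_descendant_or_error(tree: Tree, path: Sequence[str]) -> Tree:
    node = get_descendant(tree, path)
    if node is None:
        raise ValueError("Missing path: {}".format(".".join(path)))
    return node


tree = {"foo": {}, "bar": {"baz": {}}}
print(get_descendant(tree, ["bar", "baz"]))       # {}
print(get_descendant(tree, ["bar", "nope", "x"]))  # None
```

Note that the empty dict returned for a leaf is falsy, which is why the sketch, like the source, tests `is None` rather than truthiness.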

get_known_children ¶

get_known_children() -> Mapping[Step, Expression]

Gets a mapping from each name in known_field_names() to the corresponding child expression.

Source code in struct2tensor/expression.py

def get_known_children(self) -> Mapping[path.Step, "Expression"]:
  known_field_names = self.known_field_names()
  result = {}
  for name in known_field_names:
    result[name] = self.get_child_or_error(name)
  return result

get_known_descendants ¶

get_known_descendants() -> Mapping[Path, Expression]

Gets a mapping from known paths to subexpressions.

The difference between this and get_descendants in Prensor is that all paths in a Prensor are realized, thus all known. But an Expression's descendants might not all be known at the point this method is called, because an expression may have an infinite number of children.

RETURNS	DESCRIPTION
`Mapping[Path, Expression]`	A mapping from paths (relative to the root of the subexpression) to expressions.

Source code in struct2tensor/expression.py

def get_known_descendants(self) -> Mapping[path.Path, "Expression"]:
  # Rename get_known_descendants
  """Gets a mapping from known paths to subexpressions.

  The difference between this and get_descendants in Prensor is that
  all paths in a Prensor are realized, thus all known. But an Expression's
  descendants might not all be known at the point this method is called,
  because an expression may have an infinite number of children.

  Returns:
    A mapping from paths (relative to the root of the subexpression) to
      expressions.
  """
  known_subexpressions = {
      k: v.get_known_descendants()
      for k, v in self.get_known_children().items()
  }
  result = {}
  for field_name, subexpression in known_subexpressions.items():
    subexpression_path = path.Path(
        [field_name], validate_step_format=self.validate_step_format
    )
    for p, expr in subexpression.items():
      result[subexpression_path.concat(p)] = expr
  result[path.Path([], validate_step_format=self.validate_step_format)] = self
  return result
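The recursion in get_known_descendants can be traced on a finite tree: it collects each known child's descendants, prefixes them with the child's name, then adds the empty path for the node itself. The sketch uses nested dicts and tuples of field names in place of Expression and path.Path; those representations and the name `known_descendants` are assumptions. Because it only follows keys that already exist, it mirrors the restriction to known children.

```python
from typing import Any, Dict, Tuple

Tree = Dict[str, Any]  # child name -> child node; an empty dict is a leaf


def known_descendants(tree: Tree) -> Dict[Tuple[str, ...], Tree]:
    """Maps every path (relative to tree) to the node at that path."""
    result: Dict[Tuple[str, ...], Tree] = {}
    for name, child in tree.items():
        for sub_path, node in known_descendants(child).items():
            result[(name,) + sub_path] = node  # prefix the child's paths
    result[()] = tree  # the empty path refers to the node itself
    return result


tree = {"foo": {}, "bar": {"baz": {}}}
print(sorted(known_descendants(tree)))
# [(), ('bar',), ('bar', 'baz'), ('foo',)]
```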

get_paths_with_schema ¶

get_paths_with_schema() -> List[Path]

Extract only paths that contain schema information.

Source code in struct2tensor/expression.py

def get_paths_with_schema(self) -> List[path.Path]:
  """Extract only paths that contain schema information."""
  result = []
  for name, child in self.get_known_children().items():
    if child.schema_feature is None:
      continue
    result.extend(
        [
            path.Path(
                [name], validate_step_format=self.validate_step_format
            ).concat(x)
            for x in child.get_paths_with_schema()
        ]
    )
  # Note: We always take the root path and so will return an empty schema
  # if there is no schema information on any nodes, including the root.
  if not result:
    result.append(
        path.Path([], validate_step_format=self.validate_step_format)
    )
  return result
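The pruning in get_paths_with_schema is easy to miss: a child without schema information is skipped together with its whole subtree, and the empty root path is returned when nothing qualifies. The sketch below reproduces that rule with a hypothetical `Node` dataclass, where `has_schema` stands in for `schema_feature is not None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    has_schema: bool = False  # stands in for schema_feature is not None
    children: Dict[str, "Node"] = field(default_factory=dict)


def paths_with_schema(node: Node) -> List[Tuple[str, ...]]:
    result: List[Tuple[str, ...]] = []
    for name, child in node.children.items():
        if not child.has_schema:
            continue  # the child's subtree is not visited at all
        result.extend((name,) + p for p in paths_with_schema(child))
    if not result:
        result.append(())  # fall back to the node's own (empty) path
    return result


root = Node(children={
    "foo": Node(has_schema=True),
    "bar": Node(children={"baz": Node(has_schema=True)}),
    "qux": Node(has_schema=True, children={"quux": Node(has_schema=True)}),
})
print(paths_with_schema(root))    # [('foo',), ('qux', 'quux')]
print(paths_with_schema(Node()))  # [()]
```

Here `baz` is not reported even though it has schema information, because its parent `bar` has none.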

get_schema ¶

get_schema(create_schema_features=True) -> Schema

Returns a schema for the entire tree.

PARAMETER	DESCRIPTION
`create_schema_features`	If True, schema features are added for all children and a schema entry is created if not available on the child. If False, features are left off of the returned schema if there is no schema_feature on the child. DEFAULT: `True`

Source code in struct2tensor/expression.py

def get_schema(self, create_schema_features=True) -> schema_pb2.Schema:
  """Returns a schema for the entire tree.

  Args:
    create_schema_features: If True, schema features are added for all
      children and a schema entry is created if not available on the child. If
      False, features are left off of the returned schema if there is no
      schema_feature on the child.
  """
  if not create_schema_features:
    return self.project(self.get_paths_with_schema()).get_schema()
  result = schema_pb2.Schema()
  self._populate_schema_feature_children(result.feature)
  return result

get_source_expressions ¶

get_source_expressions() -> Sequence[Expression]

Gets the sources of this expression.

The node tensors of the source expressions must be sufficient to calculate the node tensor of this expression (see calculate and calculate_value_slowly).

RETURNS	DESCRIPTION
`Sequence[Expression]`	The sources of this expression.

Source code in struct2tensor/expression_impl/promote.py

def get_source_expressions(self) -> Sequence[expression.Expression]:
  return [self._origin, self._origin_parent]

known_field_names ¶

known_field_names() -> FrozenSet[Step]

Returns known field names of the expression.

TODO(martinz): implement set_field and project. Known field names of a parsed proto correspond to the fields declared in the message. Examples of "unknown" fields are extensions and explicit casts in an any field. The only way to know if an unknown field "(foo.bar)" is present in an expression expr is to call (expr["(foo.bar)"] is not None).

Notice that simply accessing a field does not make it "known". However, setting a field (or setting a descendant of a field) will make it known.

project(...) returns an expression whose known field names are the only field names. In general, if you want to depend on known_field_names (e.g., to compile an expression), project() the expression first.

RETURNS	DESCRIPTION
`FrozenSet[Step]`	An immutable set of field names.

Source code in struct2tensor/expression_impl/promote.py

def known_field_names(self) -> FrozenSet[path.Step]:
  return self._origin.known_field_names()

map_field_values ¶

map_field_values(
    source_path: CoercableToPath,
    operator: Callable[[Tensor], Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map a primitive field to create a new primitive field.

Note

The dtype argument was added after the v1 API.

PARAMETER	DESCRIPTION
`source_path`	the origin path. TYPE: `CoercableToPath`
`operator`	an element-wise operator that takes a 1-dimensional vector. TYPE: `Callable[[Tensor], Tensor]`
`dtype`	the type of the output. TYPE: `DType`
`new_field_name`	the name of a new sibling of source_path. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	the resulting root expression.

Source code in struct2tensor/expression.py

def map_field_values(self, source_path: CoercableToPath,
                     operator: Callable[[tf.Tensor], tf.Tensor],
                     dtype: tf.DType,
                     new_field_name: path.Step) -> "Expression":
  """Map a primitive field to create a new primitive field.

  !!! Note
      The dtype argument is added since the v1 API.

  Args:
    source_path: the origin path.
    operator: an element-wise operator that takes a 1-dimensional vector.
    dtype: the type of the output.
    new_field_name: the name of a new sibling of source_path.

  Returns:
    the resulting root expression.
  """
  return map_values.map_values(self, path.create_path(source_path), operator,
                               dtype, new_field_name)
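The sketch below models a leaf as a `(parent_index, values)` pair of Python lists. Because the operator is element-wise, the new sibling field keeps the source field's `parent_index` and only its values change. The library passes the values as a 1-D `tf.Tensor` and takes `dtype` explicitly, which this sketch omits; `map_leaf_values` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple


def map_leaf_values(
    parent_index: Sequence[int],
    values: Sequence[float],
    operator: Callable[[List[float]], List[float]],
) -> Tuple[List[int], List[float]]:
    """Applies an element-wise operator to a leaf's values.

    The parent_index is copied unchanged, so the new field has the same
    structure as the source field.
    """
    new_values = operator(list(values))
    if len(new_values) != len(values):
        raise ValueError("operator must be element-wise (same length out as in)")
    return list(parent_index), new_values


print(map_leaf_values([0, 0, 2], [1.0, 2.0, 3.0], lambda v: [x * 10 for x in v]))
# ([0, 0, 2], [10.0, 20.0, 30.0])
```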

map_ragged_tensors ¶

map_ragged_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you to reshape the field to some degree. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is True, or a 1D sparse tensor if is_repeated is False. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensor.

PARAMETER	DESCRIPTION
`parent_path`	the parent of the input and output fields. TYPE: `CoercableToPath`
`source_fields`	the nonempty list of names of the source fields. TYPE: `Sequence[Step]`
`operator`	an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: `Callable[..., SparseTensor]`
`is_repeated`	whether the output is repeated. TYPE: `bool`
`dtype`	the dtype of the result. TYPE: `DType`
`new_field_name`	the name of the resulting field. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	A new query.

Source code in struct2tensor/expression.py

def map_ragged_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_ragged_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
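Both reshaping examples from the description can be modeled with ragged Python lists, one (possibly empty) list of values per parent, standing in for ragged tensors. The sketch also enforces the stated constraint that the parent (first) dimension is preserved. `map_ragged`, `combine_optionals`, and `sum_last_dim` are hypothetical names, not struct2tensor functions.

```python
from typing import Callable, List, Sequence

Ragged = List[List[int]]  # one (possibly empty) list of values per parent


def map_ragged(sources: Sequence[Ragged], operator: Callable[..., Ragged]) -> Ragged:
    """Applies operator to the source fields and checks the parent dimension."""
    num_parents = len(sources[0])
    if any(len(s) != num_parents for s in sources):
        raise ValueError("all source fields must share the same parents")
    result = operator(*sources)
    if len(result) != num_parents:
        raise ValueError("operator must preserve the first (parent) dimension")
    return result


def combine_optionals(a: Ragged, b: Ragged) -> Ragged:
    """Two optional fields (0 or 1 value per parent) -> one repeated field."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def sum_last_dim(rows: Ragged) -> Ragged:
    """Repeated field -> optional field holding the per-parent sum."""
    return [[sum(row)] for row in rows]  # an empty row sums to 0


print(map_ragged([[[1], [], [3]], [[10], [20], []]], combine_optionals))
# [[1, 10], [20], [3]]
print(map_ragged([[[1, 2], [], [4]]], sum_last_dim))  # [[3], [0], [4]]
```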

map_sparse_tensors ¶

map_sparse_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you to reshape the field to some degree. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is True, or a 1D sparse tensor if is_repeated is False. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensor.

PARAMETER	DESCRIPTION
`parent_path`	the parent of the input and output fields. TYPE: `CoercableToPath`
`source_fields`	the nonempty list of names of the source fields. TYPE: `Sequence[Step]`
`operator`	an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: `Callable[..., SparseTensor]`
`is_repeated`	whether the output is repeated. TYPE: `bool`
`dtype`	the dtype of the result. TYPE: `DType`
`new_field_name`	the name of the resulting field. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	A new query.

Source code in struct2tensor/expression.py

def map_sparse_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_sparse_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
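With sparse tensors, a parent that has no values simply has no entry. A repeated-to-optional reduction can therefore leave the optional field absent instead of producing 0. The sketch models a `tf.SparseTensor` as a NamedTuple of Python lists (`indices`, `values`, `dense_shape`) and keeps `dense_shape[0]` so the first dimension matches the input. `Sparse` and `row_sums` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Sparse(NamedTuple):
    """A minimal COO sparse tensor: indices, values and dense_shape."""
    indices: List[Tuple[int, ...]]
    values: List[int]
    dense_shape: Tuple[int, ...]


def row_sums(x: Sparse) -> Sparse:
    """2-D sparse (a repeated field) -> 1-D sparse (an optional field)."""
    totals: Dict[int, int] = defaultdict(int)
    for (row, _col), value in zip(x.indices, x.values):
        totals[row] += value
    rows = sorted(totals)  # parents with no values stay absent in the output
    # Keep dense_shape[0]: the first dimension must match the input's.
    return Sparse([(r,) for r in rows], [totals[r] for r in rows], (x.dense_shape[0],))


repeated = Sparse([(0, 0), (0, 1), (2, 0)], [1, 2, 5], (3, 2))
print(row_sums(repeated))
# Sparse(indices=[(0,), (2,)], values=[3, 5], dense_shape=(3,))
```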

project ¶

project(path_list: Sequence[CoercableToPath]) -> Expression

Constrains the paths to those listed.

Source code in struct2tensor/expression.py

def project(self, path_list: Sequence[CoercableToPath]) -> "Expression":
  """Constrains the paths to those listed."""
  return project.project(self, [path.create_path(x) for x in path_list])
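A minimal sketch of projection over a nested-dict tree, assuming that a listed path keeps its ancestors and that unlisted descendants of a listed path are dropped. The dict representation and the name `project` here are illustrative only. Raising KeyError for an unknown step is a choice of this sketch, not necessarily the library's behavior.

```python
from typing import Any, Dict, Sequence

Tree = Dict[str, Any]  # child name -> child node; an empty dict is a leaf


def project(tree: Tree, paths: Sequence[Sequence[str]]) -> Tree:
    """Keeps only the listed paths and the ancestors needed to reach them."""
    result: Tree = {}
    for p in paths:
        source, dest = tree, result
        for step in p:
            if step not in source:
                raise KeyError("No such field: {}".format(step))
            source = source[step]
            dest = dest.setdefault(step, {})  # create the ancestor if needed
    return result


tree = {"foo": {}, "bar": {"baz": {}, "bak": {}}, "qux": {}}
print(project(tree, [("bar", "baz"), ("foo",)]))  # {'bar': {'baz': {}}, 'foo': {}}
print(project(tree, [("bar",)]))                   # {'bar': {}}
```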

promote ¶

promote(source_path: CoercableToPath, new_field_name: Step)

Promotes source_path to be a field new_field_name in its grandparent.

Source code in struct2tensor/expression.py

def promote(self, source_path: CoercableToPath, new_field_name: path.Step):
  """Promotes source_path to be a field new_field_name in its grandparent."""
  return promote.promote(self, path.create_path(source_path), new_field_name)

promote_and_broadcast ¶

promote_and_broadcast(
    path_dictionary: Mapping[Step, CoercableToPath],
    dest_path_parent: CoercableToPath,
) -> Expression

Source code in struct2tensor/expression.py

def promote_and_broadcast(
    self, path_dictionary: Mapping[path.Step, CoercableToPath],
    dest_path_parent: CoercableToPath) -> "Expression":
  return promote_and_broadcast.promote_and_broadcast(
      self, {k: path.create_path(v) for k, v in path_dictionary.items()},
      path.create_path(dest_path_parent))

reroot ¶

reroot(new_root: CoercableToPath) -> Expression

Returns a new list of protocol buffers available at new_root.

Source code in struct2tensor/expression.py

def reroot(self, new_root: CoercableToPath) -> "Expression":
  """Returns a new list of protocol buffers available at new_root."""
  return reroot.reroot(self, path.create_path(new_root))

schema_string ¶

schema_string(limit: Optional[int] = None) -> str

Returns a schema for the expression.

For example,

repeated root:
  optional int32 foo
  optional bar:
    optional string baz
  optional int64 bak

Note that unknown fields and subexpressions are not displayed.

PARAMETER	DESCRIPTION
`limit`	if present, limit the recursion. TYPE: `Optional[int]` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	A string, describing (a part of) the schema.

Source code in struct2tensor/expression.py

def schema_string(self, limit: Optional[int] = None) -> str:
  """Returns a schema for the expression.

  For examle,
  ```
  repeated root:
    optional int32 foo
    optional bar:
      optional string baz
    optional int64 bak
  ```

  Note that unknown fields and subexpressions are not displayed.

  Args:
    limit: if present, limit the recursion.

  Returns:
    A string, describing (a part of) the schema.
  """
  return "\n".join(self._schema_string_helper("root", limit))
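The sketch below renders a hypothetical `Field` tree in the format of the example output: one field per line, two spaces of indentation per level, and a trailing colon on message fields. It is not `_schema_string_helper`, which this page does not show, and it does not model `limit` or unknown fields.

```python
from typing import List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    name: str
    repeated: bool
    dtype: Optional[str] = None  # None marks a message (non-leaf) field
    children: Tuple["Field", ...] = ()


def schema_lines(f: Field, depth: int = 0) -> List[str]:
    """Renders one field per line, indenting children by two spaces."""
    occurrence = "repeated" if f.repeated else "optional"
    pad = "  " * depth
    if f.dtype is not None:
        return [f"{pad}{occurrence} {f.dtype} {f.name}"]
    lines = [f"{pad}{occurrence} {f.name}:"]
    for child in f.children:
        lines.extend(schema_lines(child, depth + 1))
    return lines


root = Field("root", True, children=(
    Field("foo", False, "int32"),
    Field("bar", False, children=(Field("baz", False, "string"),)),
    Field("bak", False, "int64"),
))
print("\n".join(schema_lines(root)))
```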

slice ¶

slice(
    source_path: CoercableToPath,
    new_field_name: Step,
    begin: Optional[IndexValue] = None,
    end: Optional[IndexValue] = None,
) -> Expression

Creates a slice copy of source_path at new_field_name.

If begin or end is negative, it is counted from the end of each array; for example, slice(..., begin=-1) gets the last element of every array.

PARAMETER	DESCRIPTION
`source_path`	the source of the slice. TYPE: `CoercableToPath`
`new_field_name`	the new field that is generated. TYPE: `Step`
`begin`	the beginning of the slice (inclusive). TYPE: `Optional[IndexValue]` DEFAULT: `None`
`end`	the end of the slice (exclusive). TYPE: `Optional[IndexValue]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Expression`	An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py

def slice(self,
          source_path: CoercableToPath,
          new_field_name: path.Step,
          begin: Optional[IndexValue] = None,
          end: Optional[IndexValue] = None) -> "Expression":
  """Creates a slice copy of source_path at new_field_path.

  Note that if begin or end is negative, it is considered relative to
  the size of the array. e.g., slice(...,begin=-1) will get the last
  element of every array.

  Args:
    source_path: the source of the slice.
    new_field_name: the new field that is generated.
    begin: the beginning of the slice (inclusive).
    end: the end of the slice (exclusive).

  Returns:
    An Expression object representing the result of the operation.
  """
  return slice_expression.slice_expression(self,
                                           path.create_path(source_path),
                                           new_field_name, begin, end)
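Slicing applies independently to each parent's array. The sketch below models a repeated field as a list of per-parent lists and assumes the bounds behave like Python slice bounds, which matches the description of negative values above. `slice_rows` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def slice_rows(
    rows: Sequence[Sequence[int]],
    begin: Optional[int] = None,
    end: Optional[int] = None,
) -> List[List[int]]:
    """Slices every parent's list separately; negative bounds count from its end."""
    return [list(row[begin:end]) for row in rows]


rows = [[1, 2, 3], [4], []]
print(slice_rows(rows, begin=-1))  # [[3], [4], []]
print(slice_rows(rows, end=2))     # [[1, 2], [4], []]
print(slice_rows(rows, 1, 3))      # [[2, 3], [], []]
```

Under the same model, truncate(source_path, limit, new_field_name) corresponds to `slice_rows(rows, end=limit)`, since truncate calls slice with `end=limit`.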

truncate ¶

truncate(
    source_path: CoercableToPath,
    limit: Union[int, Tensor],
    new_field_name: Step,
) -> Expression

Creates a truncated copy of source_path at new_field_name that keeps at most the first limit elements of each array.

Source code in struct2tensor/expression.py

def truncate(self, source_path: CoercableToPath, limit: Union[int, tf.Tensor],
             new_field_name: path.Step) -> "Expression":
  """Creates a truncated copy of source_path at new_field_path."""
  return self.slice(source_path, new_field_name, end=limit)

PromoteExpression ¶

PromoteExpression(
    origin: Expression, origin_parent: Expression
)

Bases: Leaf

A promoted leaf.

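Initializes the promoted leaf from origin, the leaf field being promoted, and origin_parent, its parent. The result is repeated if either origin or origin_parent is repeated, and it has the same type as origin. A ValueError is raised if origin is not a leaf or if origin_parent is a leaf.

PARAMETER	DESCRIPTION
`origin`	the leaf expression to promote. TYPE: `Expression`
`origin_parent`	the parent of origin; must not be a leaf. TYPE: `Expression`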

METHOD	DESCRIPTION
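`apply`	Returns the result of calling transform on this expression.
`apply_schema`	Applies a tensorflow metadata schema to this expression.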
`broadcast`	Broadcasts the existing field at source_path to the sibling_field.
`calculate`	Calculates the node tensor of the expression.
`calculation_equal`	self.calculate is equal to another expression.calculate.
`calculation_is_identity`	True iff self.calculate is the identity.
`cogroup_by_index`	Creates a cogroup of left_name and right_name at new_field_name.
`create_has_field`	Creates a field that is the presence of the source path.
`create_proto_index`	Creates a proto index field as a direct child of the current root.
`create_size_field`	Creates a field that is the size of the source path.
`get_child`	Gets a named child.
`get_child_or_error`	Gets a named child.
`get_descendant`	Finds the descendant at the path.
`get_descendant_or_error`	Finds the descendant at the path.
`get_known_children`	Gets a mapping from known field names to child expressions.
`get_known_descendants`	Gets a mapping from known paths to subexpressions.
`get_paths_with_schema`	Extract only paths that contain schema information.
`get_schema`	Returns a schema for the entire tree.
`get_source_expressions`	Gets the sources of this expression.
`known_field_names`	Returns known field names of the expression.
`map_field_values`	Map a primitive field to create a new primitive field.
`map_ragged_tensors`	Maps a set of primitive fields of a message to a new field.
`map_sparse_tensors`	Maps a set of primitive fields of a message to a new field.
`project`	Constrains the paths to those listed.
`promote`	Promotes source_path to be a field new_field_name in its grandparent.
`promote_and_broadcast`
`reroot`	Returns a new list of protocol buffers available at new_root.
`schema_string`	Returns a schema for the expression.
`slice`	Creates a slice copy of source_path at new_field_name.
`truncate`	Creates a truncated copy of source_path at new_field_name.

ATTRIBUTE	DESCRIPTION
`is_leaf`	True iff the node tensor is a LeafNodeTensor. TYPE: `bool`
`is_repeated`	True iff the same parent value can have multiple children values. TYPE: `bool`
`schema_feature`	Return the schema of the field. TYPE: `Optional[Feature]`
`type`	dtype of the expression, or None if not a leaf expression. TYPE: `Optional[DType]`
`validate_step_format`	TYPE: `bool`

Source code in struct2tensor/expression_impl/promote.py

def __init__(self, origin: expression.Expression,
             origin_parent: expression.Expression):

  super().__init__(
      origin.is_repeated or origin_parent.is_repeated,
      origin.type,
      schema_feature=_get_promote_schema_feature(
          origin.schema_feature, origin_parent.schema_feature))
  self._origin = origin
  self._origin_parent = origin_parent
  if self.type is None:
    raise ValueError("Can only promote a field")
  if self._origin_parent.type is not None:
    raise ValueError("origin_parent cannot be a field")

Attributes¶

is_leaf property ¶

is_leaf: bool

True iff the node tensor is a LeafNodeTensor.

is_repeated property ¶

is_repeated: bool

True iff the same parent value can have multiple children values.

schema_feature property ¶

schema_feature: Optional[Feature]

Return the schema of the field.

type property ¶

type: Optional[DType]

dtype of the expression, or None if not a leaf expression.

validate_step_format property ¶

validate_step_format: bool

Functions¶

apply ¶

apply(
    transform: Callable[[Expression], Expression],
) -> Expression

Returns the result of calling transform on this expression.

Source code in struct2tensor/expression.py

def apply(self,
          transform: Callable[["Expression"], "Expression"]) -> "Expression":
  return transform(self)

apply_schema ¶

apply_schema(schema: Schema) -> Expression

Applies a tensorflow metadata schema to this expression.

Source code in struct2tensor/expression.py

def apply_schema(self, schema: schema_pb2.Schema) -> "Expression":
  return apply_schema.apply_schema(self, schema)

broadcast ¶

broadcast(
    source_path: CoercableToPath,
    sibling_field: Step,
    new_field_name: Step,
) -> Expression

Broadcasts the existing field at source_path to the sibling_field.

Source code in struct2tensor/expression.py

def broadcast(self, source_path: CoercableToPath, sibling_field: path.Step,
              new_field_name: path.Step) -> "Expression":
  """Broadcasts the existing field at source_path to the sibling_field."""
  return broadcast.broadcast(self, path.create_path(source_path),
                             sibling_field, new_field_name)
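To make the effect of broadcast concrete: when a parent has a source value and several sibling elements, every sibling element receives its own copy. The sketch models fields as `(parent_index, values)` lists. `broadcast_to_sibling` is a hypothetical name and not the broadcast module's implementation. Copying every value of a repeated source is a choice of this sketch that this page does not confirm.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def broadcast_to_sibling(
    source_parent_index: Sequence[int],
    source_values: Sequence[int],
    sibling_parent_index: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Copies each parent's source values under every sibling element of that parent.

    Returns (parent_index, values) for the new field, whose parent_index
    points at sibling elements rather than at the shared parents.
    """
    values_by_parent: Dict[int, List[int]] = defaultdict(list)
    for parent, value in zip(source_parent_index, source_values):
        values_by_parent[parent].append(value)
    new_parent_index: List[int] = []
    new_values: List[int] = []
    for sibling_pos, parent in enumerate(sibling_parent_index):
        for value in values_by_parent.get(parent, []):
            new_parent_index.append(sibling_pos)
            new_values.append(value)
    return new_parent_index, new_values


# Two parents with source values 7 and 9; parent 0 has two sibling elements.
print(broadcast_to_sibling([0, 1], [7, 9], [0, 0, 1]))  # ([0, 1, 2], [7, 7, 9])
```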

calculate ¶

calculate(
    sources: Sequence[NodeTensor],
    destinations: Sequence[Expression],
    options: Options,
    side_info: Optional[Prensor] = None,
) -> NodeTensor

Calculates the node tensor of the expression.

The node tensor must be a function of the properties of the expression and the node tensors of the expressions from get_source_expressions().

If is_leaf, then calculate must return a LeafNodeTensor. Otherwise, it must return a ChildNodeTensor or RootNodeTensor.

If calculation_is_identity() is True, this must return sources[0].

Sometimes, for operations such as parsing the proto, calculate will return additional information. For example, calculate() for the root of the proto expression also parses out the tensors required to calculate the tensors of the children. This is why destinations are required.

For a reference use, see calculate_value_slowly(...) below.

PARAMETER	DESCRIPTION
`sources`	The node tensors of the expressions in get_source_expressions(). TYPE: `Sequence[NodeTensor]`
`destinations`	The expressions that will use the output of this method. TYPE: `Sequence[Expression]`
`options`	Options for the calculation. TYPE: `Options`
`side_info`	An optional prensor that is used to bind to a placeholder expression. TYPE: `Optional[Prensor]` DEFAULT: `None`

RETURNS	DESCRIPTION
`NodeTensor`	A NodeTensor representing the output of this expression.

Source code in struct2tensor/expression_impl/promote.py

def calculate(
    self,
    sources: Sequence[prensor.NodeTensor],
    destinations: Sequence[expression.Expression],
    options: calculate_options.Options,
    side_info: Optional[prensor.Prensor] = None) -> prensor.NodeTensor:
  [origin_value, origin_parent_value] = sources
  if not isinstance(origin_value, prensor.LeafNodeTensor):
    raise ValueError("origin_value must be a leaf")
  if not isinstance(origin_parent_value, prensor.ChildNodeTensor):
    raise ValueError("origin_parent_value must be a child node")
  new_parent_index = tf.gather(origin_parent_value.parent_index,
                               origin_value.parent_index)
  return prensor.LeafNodeTensor(new_parent_index, origin_value.values,
                                self.is_repeated)
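The core of calculate is a single gather: the new parent of each promoted value is the parent of its old parent. Replacing `tf.gather` with a list comprehension gives the pure-Python sketch below. Plain lists stand in for LeafNodeTensor and ChildNodeTensor, and `promote_leaf` and `promoted_is_repeated` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def promote_leaf(
    leaf_parent_index: Sequence[int],
    leaf_values: Sequence[str],
    parent_parent_index: Sequence[int],
) -> Tuple[List[int], List[str]]:
    """Re-parents leaf values onto their grandparent.

    Mirrors tf.gather(origin_parent_value.parent_index, origin_value.parent_index).
    """
    new_parent_index = [parent_parent_index[p] for p in leaf_parent_index]
    return new_parent_index, list(leaf_values)  # values are unchanged


def promoted_is_repeated(origin_repeated: bool, origin_parent_repeated: bool) -> bool:
    # As in __init__: repeated if either the leaf or its parent is repeated.
    return origin_repeated or origin_parent_repeated


# Two protos; "doc" parent_index [0, 0, 1]; "doc.word" values under docs 0, 1, 1, 2.
print(promote_leaf([0, 1, 1, 2], ["a", "b", "c", "d"], [0, 0, 1]))
# ([0, 0, 0, 1], ['a', 'b', 'c', 'd'])
```

Promoting an optional leaf out of a repeated parent yields a repeated field, which is why `promoted_is_repeated(False, True)` is True.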

calculation_equal ¶

calculation_equal(expr: Expression) -> bool

self.calculate is equal to another expression.calculate.

Given the same source node tensors, self.calculate(...) and expression.calculate(...) will have the same result.

Note that this does not check that the source expressions of the two expressions are the same. Therefore, two operations can have the same calculation, but not the same output, because their sources are different. For example, if a.calculation_is_identity() is True and b.calculation_is_identity() is True, then a.calculation_equal(b) is True. However, unless a and b have the same source, the expressions themselves are not equal.

PARAMETER	DESCRIPTION
`expr`	The expression to compare to. TYPE: `Expression`

Source code in struct2tensor/expression_impl/promote.py

def calculation_equal(self, expr: expression.Expression) -> bool:
  return isinstance(expr, PromoteExpression)

calculation_is_identity ¶

calculation_is_identity() -> bool

True iff self.calculate is the identity.

There is exactly one source, and the output of self.calculate(...) is the node tensor of this source.

Source code in struct2tensor/expression_impl/promote.py

def calculation_is_identity(self) -> bool:
  return False

cogroup_by_index ¶

cogroup_by_index(
    source_path: CoercableToPath,
    left_name: Step,
    right_name: Step,
    new_field_name: Step,
) -> Expression

Creates a cogroup of left_name and right_name at new_field_name.

Source code in struct2tensor/expression.py

def cogroup_by_index(self, source_path: CoercableToPath, left_name: path.Step,
                     right_name: path.Step,
                     new_field_name: path.Step) -> "Expression":
  """Creates a cogroup of left_name and right_name at new_field_name."""
  raise NotImplementedError("cogroup_by_index is not implemented")

create_has_field ¶

create_has_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the presence of the source path.

Source code in struct2tensor/expression.py

def create_has_field(self, source_path: CoercableToPath,
                     new_field_name: path.Step) -> "Expression":
  """Creates a field that is the presence of the source path."""
  return size.has(self, path.create_path(source_path), new_field_name)

create_proto_index ¶

create_proto_index(field_name: Step) -> Expression

Creates a proto index field as a direct child of the current root.

The proto index maps each root element to the original batch index. For example: [0, 2] means the first element came from the first proto in the original input tensor and the second element came from the third proto. The created field is always "dense" -- it has the same valency as the current root.

PARAMETER	DESCRIPTION
`field_name`	the name of the field to be created. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py

def create_proto_index(self, field_name: path.Step) -> "Expression":
  """Creates a proto index field as a direct child of the current root.

  The proto index maps each root element to the original batch index.
  For example: [0, 2] means the first element came from the first proto
  in the original input tensor and the second element came from the third
  proto. The created field is always "dense" -- it has the same valency as
  the current root.

  Args:
    field_name: the name of the field to be created.

  Returns:
    An Expression object representing the result of the operation.
  """

  return reroot.create_proto_index_field(self, field_name)

create_size_field ¶

create_size_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the size of the source path.

Source code in struct2tensor/expression.py

def create_size_field(self, source_path: CoercableToPath,
                      new_field_name: path.Step) -> "Expression":
  """Creates a field that is the size of the source path."""
  return size.size(self, path.create_path(source_path), new_field_name)
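The following pure-Python sketch models what create_size_field and create_has_field (above) compute for each parent. It assumes the source field is stored as a parent index (one entry per value, naming the parent that owns it) and that the new field holds exactly one value for every parent, including parents with no values. The helpers size_per_parent and has_per_parent are hypothetical and not part of struct2tensor.

```python
from typing import List


def size_per_parent(parent_index: List[int], num_parents: int) -> List[int]:
    """Count the source-field values that belong to each parent."""
    sizes = [0] * num_parents
    for parent in parent_index:
        sizes[parent] += 1
    return sizes


def has_per_parent(parent_index: List[int], num_parents: int) -> List[bool]:
    """A parent "has" the field when it owns at least one value."""
    return [size > 0 for size in size_per_parent(parent_index, num_parents)]


# Three parents: parent 0 owns two values, parent 1 none, parent 2 one.
parent_index = [0, 0, 2]
print(size_per_parent(parent_index, 3))  # [2, 0, 1]
print(has_per_parent(parent_index, 3))   # [True, False, True]
```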

get_child ¶

get_child(field_name: Step) -> Optional[Expression]

Gets a named child.

Source code in struct2tensor/expression.py

def get_child(self, field_name: path.Step) -> Optional["Expression"]:
  """Gets a named child."""
  if field_name in self._child_cache:
    return self._child_cache[field_name]
  result = self._get_child_impl(field_name)
  self._child_cache[field_name] = result
  return result

get_child_or_error ¶

get_child_or_error(field_name: Step) -> Expression

Gets a named child.

Source code in struct2tensor/expression.py

def get_child_or_error(self, field_name: path.Step) -> "Expression":
  """Gets a named child."""
  result = self.get_child(field_name)
  if result is None:
    raise KeyError("No such field: {}".format(field_name))
  return result

get_descendant ¶

get_descendant(p: Path) -> Optional[Expression]

Finds the descendant at the path.

Source code in struct2tensor/expression.py

def get_descendant(self, p: path.Path) -> Optional["Expression"]:
  """Finds the descendant at the path."""
  result = self
  for field_name in p.field_list:
    result = result.get_child(field_name)
    if result is None:
      return None
  return result

get_descendant_or_error ¶

get_descendant_or_error(p: Path) -> Expression

Finds the descendant at the path.

Source code in struct2tensor/expression.py

def get_descendant_or_error(self, p: path.Path) -> "Expression":
  """Finds the descendant at the path."""
  result = self.get_descendant(p)
  if result is None:
    raise ValueError("Missing path: {} in {}".format(
        str(p), self.schema_string(limit=20)))
  return result

get_known_children ¶

get_known_children() -> Mapping[Step, Expression]

Source code in struct2tensor/expression.py

def get_known_children(self) -> Mapping[path.Step, "Expression"]:
  known_field_names = self.known_field_names()
  result = {}
  for name in known_field_names:
    result[name] = self.get_child_or_error(name)
  return result

get_known_descendants ¶

get_known_descendants() -> Mapping[Path, Expression]

Gets a mapping from known paths to subexpressions.

The difference between this and get_descendants in Prensor is that all paths in a Prensor are realized, thus all known. But an Expression's descendants might not all be known at the point this method is called, because an expression may have an infinite number of children.
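The recursion in get_known_descendants can be sketched on a toy tree. In this hypothetical model a nested dict stands in for an expression (each key is a known child) and a tuple of steps stands in for path.Path. Like the real method, it maps the empty path to the root itself and prefixes each child's paths with the child's name.

```python
from typing import Dict, Tuple

# Hypothetical stand-ins: each dict key is a known child; tuples are paths.
Tree = Dict[str, "Tree"]
PathTuple = Tuple[str, ...]


def known_descendants(tree: Tree, prefix: PathTuple = ()) -> Dict[PathTuple, Tree]:
    """Map every known path (relative to tree) to its subtree."""
    result: Dict[PathTuple, Tree] = {prefix: tree}  # the empty path maps to the root
    for name, child in tree.items():
        result.update(known_descendants(child, prefix + (name,)))
    return result


tree = {"session": {"event": {"val": {}}, "user_info": {"age": {}}}}
for p in sorted(known_descendants(tree)):
    print(p)
```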

RETURNS	DESCRIPTION
`Mapping[Path, Expression]`	A mapping from paths (relative to the root of the subexpression) to expressions.

Source code in struct2tensor/expression.py

def get_known_descendants(self) -> Mapping[path.Path, "Expression"]:
  # Rename get_known_descendants
  """Gets a mapping from known paths to subexpressions.

  The difference between this and get_descendants in Prensor is that
  all paths in a Prensor are realized, thus all known. But an Expression's
  descendants might not all be known at the point this method is called,
  because an expression may have an infinite number of children.

  Returns:
    A mapping from paths (relative to the root of the subexpression) to
      expressions.
  """
  known_subexpressions = {
      k: v.get_known_descendants()
      for k, v in self.get_known_children().items()
  }
  result = {}
  for field_name, subexpression in known_subexpressions.items():
    subexpression_path = path.Path(
        [field_name], validate_step_format=self.validate_step_format
    )
    for p, expr in subexpression.items():
      result[subexpression_path.concat(p)] = expr
  result[path.Path([], validate_step_format=self.validate_step_format)] = self
  return result

get_paths_with_schema ¶

get_paths_with_schema() -> List[Path]

Extract only paths that contain schema information.

Source code in struct2tensor/expression.py

def get_paths_with_schema(self) -> List[path.Path]:
  """Extract only paths that contain schema information."""
  result = []
  for name, child in self.get_known_children().items():
    if child.schema_feature is None:
      continue
    result.extend(
        [
            path.Path(
                [name], validate_step_format=self.validate_step_format
            ).concat(x)
            for x in child.get_paths_with_schema()
        ]
    )
  # Note: We always take the root path and so will return an empty schema
  # if there is no schema information on any nodes, including the root.
  if not result:
    result.append(
        path.Path([], validate_step_format=self.validate_step_format)
    )
  return result
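The pruning rule above is easy to miss: a child without a schema_feature hides its whole subtree, even descendants that do carry schema, and the empty root path is returned when nothing qualifies. This is the path list that get_schema(create_schema_features=False) projects onto. The sketch below mirrors that recursion on a hypothetical Node class, not on real expressions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an expression node."""
    has_schema: bool
    children: Dict[str, "Node"] = field(default_factory=dict)


def paths_with_schema(node: Node) -> List[Tuple[str, ...]]:
    result: List[Tuple[str, ...]] = []
    for name, child in node.children.items():
        if not child.has_schema:
            continue  # a child without schema prunes its whole subtree
        result.extend((name,) + p for p in paths_with_schema(child))
    if not result:
        result.append(())  # fall back to the node's own (empty) path
    return result


root = Node(False, {
    "a": Node(True, {"b": Node(True), "c": Node(False)}),
    "d": Node(False, {"e": Node(True)}),  # "e" is hidden because "d" has no schema
})
print(paths_with_schema(root))  # [('a', 'b')]
```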

get_schema ¶

get_schema(create_schema_features=True) -> Schema

Returns a schema for the entire tree.

PARAMETER	DESCRIPTION
`create_schema_features`	If True, schema features are added for all children and a schema entry is created if not available on the child. If False, features are left off of the returned schema if there is no schema_feature on the child. DEFAULT: `True`

Source code in struct2tensor/expression.py

def get_schema(self, create_schema_features=True) -> schema_pb2.Schema:
  """Returns a schema for the entire tree.

  Args:
    create_schema_features: If True, schema features are added for all
      children and a schema entry is created if not available on the child. If
      False, features are left off of the returned schema if there is no
      schema_feature on the child.
  """
  if not create_schema_features:
    return self.project(self.get_paths_with_schema()).get_schema()
  result = schema_pb2.Schema()
  self._populate_schema_feature_children(result.feature)
  return result

get_source_expressions ¶

get_source_expressions() -> Sequence[Expression]

Gets the sources of this expression.

The node tensors of the source expressions must be sufficient to calculate the node tensor of this expression (see calculate and calculate_value_slowly).

RETURNS	DESCRIPTION
`Sequence[Expression]`	The sources of this expression.

Source code in struct2tensor/expression_impl/promote.py

def get_source_expressions(self) -> Sequence[expression.Expression]:
  return [self._origin, self._origin_parent]

known_field_names ¶

known_field_names() -> FrozenSet[Step]

Returns known field names of the expression.

TODO(martinz): implement set_field and project. Known field names of a parsed proto correspond to the fields declared in the message. Examples of "unknown" fields are extensions and explicit casts in an any field. The only way to know if an unknown field "(foo.bar)" is present in an expression expr is to call (expr["(foo.bar)"] is not None).

Notice that simply accessing a field does not make it "known". However, setting a field (or setting a descendant of a field) will make it known.

project(...) returns an expression where the known field names are the only field names. In general, if you want to depend upon known_field_names (e.g., if you want to compile an expression), then the best approach is to project() the expression first.

RETURNS	DESCRIPTION
`FrozenSet[Step]`	An immutable set of field names.

Source code in struct2tensor/expression.py

def known_field_names(self) -> FrozenSet[path.Step]:
  return frozenset()

map_field_values ¶

map_field_values(
    source_path: CoercableToPath,
    operator: Callable[[Tensor], Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map a primitive field to create a new primitive field.

Note

The dtype argument has been added since the v1 API.

PARAMETER	DESCRIPTION
`source_path`	the origin path. TYPE: `CoercableToPath`
`operator`	an element-wise operator that takes a 1-dimensional vector. TYPE: `Callable[[Tensor], Tensor]`
`dtype`	the type of the output. TYPE: `DType`
`new_field_name`	the name of a new sibling of source_path. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	the resulting root expression.
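As a rough model of map_field_values, the sketch below uses plain lists in place of tensors. A leaf is a hypothetical (parent_index, values) pair. In struct2tensor the operator is a TensorFlow function on a 1-dimensional tf.Tensor and dtype declares its output type. The model assumes "element-wise" means the output has the same length as the input, so the new sibling field can reuse the source field's parent index unchanged.

```python
from typing import Callable, List, Tuple

# Hypothetical leaf representation: (parent_index, values).
Leaf = Tuple[List[int], List[int]]


def map_field_values_model(leaf: Leaf,
                           operator: Callable[[List[int]], List[int]]) -> Leaf:
    """Apply an element-wise operator; the new sibling keeps the parents."""
    parent_index, values = leaf
    new_values = operator(values)
    if len(new_values) != len(values):
        raise ValueError("operator must be element-wise (same length output)")
    return list(parent_index), new_values


val = ([0, 0, 2], [1, 4, 5])
print(map_field_values_model(val, lambda v: [x * 10 for x in v]))
# ([0, 0, 2], [10, 40, 50])
```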

Source code in struct2tensor/expression.py

def map_field_values(self, source_path: CoercableToPath,
                     operator: Callable[[tf.Tensor], tf.Tensor],
                     dtype: tf.DType,
                     new_field_name: path.Step) -> "Expression":
  """Map a primitive field to create a new primitive field.

  !!! Note
      The dtype argument is added since the v1 API.

  Args:
    source_path: the origin path.
    operator: an element-wise operator that takes a 1-dimensional vector.
    dtype: the type of the output.
    new_field_name: the name of a new sibling of source_path.

  Returns:
    the resulting root expression.
  """
  return map_values.map_values(self, path.create_path(source_path), operator,
                               dtype, new_field_name)

map_ragged_tensors ¶

map_ragged_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you to reshape the field to some degree. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor with the correct number of dimensions: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the returned sparse tensor must equal the first dimension of the input tensors.
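The two reshapes mentioned above can be modeled in pure Python, with nested lists standing in for the sparse tensors and each row standing for one parent message. The helpers are hypothetical. The invariant to notice is that the output always has one row per input row, which is the "same first dimension" constraint.

```python
from typing import List, Optional


def reduce_sum_last_dim(rows: List[List[int]]) -> List[int]:
    """Repeated field -> optional field: one sum per parent (is_repeated=False)."""
    return [sum(row) for row in rows]


def combine_optionals(a: List[Optional[int]],
                      b: List[Optional[int]]) -> List[List[int]]:
    """Two optional fields -> one repeated field (is_repeated=True)."""
    if len(a) != len(b):
        raise ValueError("inputs must share the parent dimension")
    return [[v for v in (x, y) if v is not None] for x, y in zip(a, b)]


rows = [[1, 2], [], [3]]
assert len(reduce_sum_last_dim(rows)) == len(rows)  # first dimension preserved
print(reduce_sum_last_dim(rows))                    # [3, 0, 3]
print(combine_optionals([1, None, 4], [2, 3, None]))  # [[1, 2], [3], [4]]
```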

PARAMETER	DESCRIPTION
`parent_path`	the parent of the input and output fields. TYPE: `CoercableToPath`
`source_fields`	the nonempty list of names of the source fields. TYPE: `Sequence[Step]`
`operator`	an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: `Callable[..., SparseTensor]`
`is_repeated`	whether the output is repeated. TYPE: `bool`
`dtype`	the dtype of the result. TYPE: `DType`
`new_field_name`	the name of the resulting field. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	A new query.

Source code in struct2tensor/expression.py

def map_ragged_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_ragged_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )

map_sparse_tensors ¶

map_sparse_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you to reshape the field to some degree. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor with the correct number of dimensions: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the returned sparse tensor must equal the first dimension of the input tensors.
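The shape requirement is easiest to see in the coordinate (COO) layout that tf.SparseTensor uses: indices, values, and dense_shape. The hypothetical helper rows_to_coo encodes per-parent rows in that layout in pure Python. For is_repeated=True, the operator's output is expected to look like this, with dense_shape[0] equal to the number of parents. A parent with no values still counts as a row.

```python
from typing import List, Tuple


def rows_to_coo(rows: List[List[int]]) -> Tuple[List[List[int]], List[int], List[int]]:
    """Encode per-parent rows as (indices, values, dense_shape), COO style."""
    indices: List[List[int]] = []
    values: List[int] = []
    width = 0
    for r, row in enumerate(rows):
        width = max(width, len(row))
        for c, v in enumerate(row):
            indices.append([r, c])  # [parent position, position within parent]
            values.append(v)
    # dense_shape[0] counts every parent, including parents with no values.
    return indices, values, [len(rows), width]


print(rows_to_coo([[5, 6], [], [7]]))
# ([[0, 0], [0, 1], [2, 0]], [5, 6, 7], [3, 2])
```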

PARAMETER	DESCRIPTION
`parent_path`	the parent of the input and output fields. TYPE: `CoercableToPath`
`source_fields`	the nonempty list of names of the source fields. TYPE: `Sequence[Step]`
`operator`	an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: `Callable[..., SparseTensor]`
`is_repeated`	whether the output is repeated. TYPE: `bool`
`dtype`	the dtype of the result. TYPE: `DType`
`new_field_name`	the name of the resulting field. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	A new query.

Source code in struct2tensor/expression.py

def map_sparse_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_sparse_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )

project ¶

project(path_list: Sequence[CoercableToPath]) -> Expression

Constrains the paths to those listed.

Source code in struct2tensor/expression.py

def project(self, path_list: Sequence[CoercableToPath]) -> "Expression":
  """Constrains the paths to those listed."""
  return project.project(self, [path.create_path(x) for x in path_list])
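A toy model of project: keep each listed path plus the ancestors needed to reach it, and drop everything else. Nested dicts stand in for expressions, and project_tree is a hypothetical helper. It only models leaf paths. Whether the real project keeps the descendants of a listed message path is not modeled here.

```python
from typing import Dict, Sequence, Tuple

Tree = Dict[str, "Tree"]  # hypothetical stand-in: each key is a child field


def project_tree(tree: Tree, paths: Sequence[Tuple[str, ...]]) -> Tree:
    """Keep only the listed paths and the ancestors needed to reach them."""
    result: Tree = {}
    for p in paths:
        src, dst = tree, result
        for step in p:
            if step not in src:
                raise KeyError("no such field: " + step)
            src = src[step]
            dst = dst.setdefault(step, {})
    return result


tree = {"session": {"event": {"val": {}}, "user_info": {"age": {}}}}
print(project_tree(tree, [("session", "event", "val")]))
# {'session': {'event': {'val': {}}}}
```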

promote ¶

promote(source_path: CoercableToPath, new_field_name: Step)

Promotes source_path to be a field new_field_name in its grandparent.

Source code in struct2tensor/expression.py

def promote(self, source_path: CoercableToPath, new_field_name: path.Step):
  """Promotes source_path to be a field new_field_name in its grandparent."""
  return promote.promote(self, path.create_path(source_path), new_field_name)

promote_and_broadcast ¶

promote_and_broadcast(
    path_dictionary: Mapping[Step, CoercableToPath],
    dest_path_parent: CoercableToPath,
) -> Expression

Source code in struct2tensor/expression.py

def promote_and_broadcast(
    self, path_dictionary: Mapping[path.Step, CoercableToPath],
    dest_path_parent: CoercableToPath) -> "Expression":
  return promote_and_broadcast.promote_and_broadcast(
      self, {k: path.create_path(v) for k, v in path_dictionary.items()},
      path.create_path(dest_path_parent))

reroot ¶

reroot(new_root: CoercableToPath) -> Expression

Returns a new list of protocol buffers available at new_root.

Source code in struct2tensor/expression.py

def reroot(self, new_root: CoercableToPath) -> "Expression":
  """Returns a new list of protocol buffers available at new_root."""
  return reroot.reroot(self, path.create_path(new_root))

schema_string ¶

schema_string(limit: Optional[int] = None) -> str

Returns a schema for the expression.

For example:

repeated root:
  optional int32 foo
  optional bar:
    optional string baz
  optional int64 bak

Note that unknown fields and subexpressions are not displayed.

PARAMETER	DESCRIPTION
`limit`	if present, limit the recursion. TYPE: `Optional[int]` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	A string, describing (a part of) the schema.

Source code in struct2tensor/expression.py

def schema_string(self, limit: Optional[int] = None) -> str:
  """Returns a schema for the expression.

  For example,
  ```
  repeated root:
    optional int32 foo
    optional bar:
      optional string baz
    optional int64 bak
  ```

  Note that unknown fields and subexpressions are not displayed.

  Args:
    limit: if present, limit the recursion.

  Returns:
    A string, describing (a part of) the schema.
  """
  return "\n".join(self._schema_string_helper("root", limit))

slice ¶

slice(
    source_path: CoercableToPath,
    new_field_name: Step,
    begin: Optional[IndexValue] = None,
    end: Optional[IndexValue] = None,
) -> Expression

Creates a sliced copy of source_path as the new field new_field_name.

If begin or end is negative, it is interpreted relative to the size of each array. For example, slice(..., begin=-1) keeps the last element of every array.
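A minimal sketch of the per-array semantics, assuming the negative-index and out-of-range behavior matches Python list slicing. Each inner list stands for the values under one parent, and slice_rows and truncate_rows are hypothetical helpers. truncate_rows mirrors the fact that truncate (below) simply calls slice with end=limit.

```python
from typing import List, Optional


def slice_rows(rows: List[List[int]], begin: Optional[int] = None,
               end: Optional[int] = None) -> List[List[int]]:
    """Slice every parent's array independently (Python slice semantics)."""
    return [row[begin:end] for row in rows]


def truncate_rows(rows: List[List[int]], limit: int) -> List[List[int]]:
    """Models truncate(...), which is slice(..., end=limit)."""
    return slice_rows(rows, end=limit)


rows = [[1], [4, 5], [8, 9, 10]]
print(slice_rows(rows, begin=-1))  # [[1], [5], [10]]
print(truncate_rows(rows, 2))      # [[1], [4, 5], [8, 9]]
```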

PARAMETER	DESCRIPTION
`source_path`	the source of the slice. TYPE: `CoercableToPath`
`new_field_name`	the new field that is generated. TYPE: `Step`
`begin`	the beginning of the slice (inclusive). TYPE: `Optional[IndexValue]` DEFAULT: `None`
`end`	the end of the slice (exclusive). TYPE: `Optional[IndexValue]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Expression`	An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py

def slice(self,
          source_path: CoercableToPath,
          new_field_name: path.Step,
          begin: Optional[IndexValue] = None,
          end: Optional[IndexValue] = None) -> "Expression":
  """Creates a slice copy of source_path at new_field_path.

  Note that if begin or end is negative, it is considered relative to
  the size of the array. e.g., slice(...,begin=-1) will get the last
  element of every array.

  Args:
    source_path: the source of the slice.
    new_field_name: the new field that is generated.
    begin: the beginning of the slice (inclusive).
    end: the end of the slice (exclusive).

  Returns:
    An Expression object representing the result of the operation.
  """
  return slice_expression.slice_expression(self,
                                           path.create_path(source_path),
                                           new_field_name, begin, end)

truncate ¶

truncate(
    source_path: CoercableToPath,
    limit: Union[int, Tensor],
    new_field_name: Step,
) -> Expression

Creates a copy of source_path, truncated to at most limit elements per parent, as the new field new_field_name.

Source code in struct2tensor/expression.py

def truncate(self, source_path: CoercableToPath, limit: Union[int, tf.Tensor],
             new_field_name: path.Step) -> "Expression":
  """Creates a truncated copy of source_path at new_field_path."""
  return self.slice(source_path, new_field_name, end=limit)

Functions¶

promote ¶

promote(
    root: Expression, p: Path, new_field_name: Step
) -> Expression

Promote a path to be a child of its grandparent, and give it a name.
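One way to picture promotion: if each value stores the index of its parent, promoting it means replacing that parent index with the parent's own parent index. The sketch uses the session/event/val data from the promote_and_broadcast example in this page. The list-based representation and the helper promote_parent_index are hypothetical. The values themselves are carried over unchanged.

```python
from typing import List


def promote_parent_index(child_parent: List[int],
                         grandchild_parent: List[int]) -> List[int]:
    """Re-point each grandchild value at its grandparent."""
    return [child_parent[i] for i in grandchild_parent]


# session -> event -> val
event_parent = [0, 0, 1, 1]      # events 0-1 in session 0, events 2-3 in session 1
val_parent = [0, 1, 1, 2, 3, 3]  # parent event of each val
val_values = [1, 4, 5, 7, 8, 9]  # unchanged by promotion
print(list(zip(promote_parent_index(event_parent, val_parent), val_values)))
# [(0, 1), (0, 4), (0, 5), (1, 7), (1, 8), (1, 9)]
```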

Source code in struct2tensor/expression_impl/promote.py

def promote(root: expression.Expression, p: path.Path,
            new_field_name: path.Step) -> expression.Expression:
  """Promote a path to be a child of its grandparent, and give it a name."""
  return _promote_impl(root, p, new_field_name)[0]

promote_anonymous ¶

promote_anonymous(
    root: Expression, p: Path
) -> Tuple[Expression, Path]

Promote a path to be a new anonymous child of its grandparent.

Source code in struct2tensor/expression_impl/promote.py

def promote_anonymous(root: expression.Expression,
                      p: path.Path) -> Tuple[expression.Expression, path.Path]:
  """Promote a path to be a new anonymous child of its grandparent."""
  return _promote_impl(root, p, path.get_anonymous_field())

Modules¶

promote_and_broadcast ¶

promote_and_broadcast a set of nodes.

For example, suppose an expr represents:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
     |   |
     |   +-val*-int64
     |
     +-user_info? (question mark indicates optional)
           |
           +-age? int64

session: {
  event: {
    val: 1
  }
  event: {
    val: 4
    val: 5
  }
  user_info: {
    age: 25
  }
}

session: {
  event: {
    val: 7
  }
  event: {
    val: 8
    val: 9
  }
  user_info: {
    age: 20
  }
}

promote_and_broadcast.promote_and_broadcast(
    expr, {"nage": path.Path(["session", "user_info", "age"])},
    path.Path(["session", "event"]))

creates:

+
|
+-session*   (stars indicate repeated)
     |
     +-event*
     |   |
     |   +-val*-int64
     |   |
     |   +-nage*-int64
     |
     +-user_info? (question mark indicates optional)
           |
           +-age? int64

session: {
  event: {
    nage: 25
    val: 1
  }
  event: {
    nage: 25
    val: 4
    val: 5
  }
  user_info: {
    age: 25
  }
}

session: {
  event: {
    nage: 20
    val: 7
  }
  event: {
    nage: 20
    val: 8
    val: 9
  }
  user_info: {
    age: 20
  }
}
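The example above can be traced with parent indices. First, age is promoted so that it hangs directly off session. Then it is broadcast, so each event receives a copy of its session's age. This is a pure-Python model, and the list representation and the broadcast helper are hypothetical rather than struct2tensor internals.

```python
from typing import List, Tuple


def broadcast(value_parent: List[int], values: List[int],
              sibling_parent: List[int]) -> Tuple[List[int], List[int]]:
    """Copy each value to every sibling element that shares its parent."""
    out_parent: List[int] = []
    out_values: List[int] = []
    for sibling, parent in enumerate(sibling_parent):
        for vp, v in zip(value_parent, values):
            if vp == parent:
                out_parent.append(sibling)
                out_values.append(v)
    return out_parent, out_values


user_info_parent = [0, 1]    # one user_info per session
age_parent = [0, 1]          # one age per user_info
age_values = [25, 20]
event_parent = [0, 0, 1, 1]  # two events per session

# Step 1 (promote): age now hangs directly off session.
age_session = [user_info_parent[i] for i in age_parent]
# Step 2 (broadcast): copy the session's age onto each of its events.
print(broadcast(age_session, age_values, event_parent))
# ([0, 1, 2, 3], [25, 25, 20, 20])
```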

FUNCTION	DESCRIPTION
`promote_and_broadcast`	Promote and broadcast a set of paths to a particular location.
`promote_and_broadcast_anonymous`	Promotes then broadcasts the origin until its parent is new_parent.

Functions¶

promote_and_broadcast ¶

promote_and_broadcast(
    root: Expression,
    path_dictionary: Mapping[Step, Path],
    dest_path_parent: Path,
) -> Expression

Promote and broadcast a set of paths to a particular location.

PARAMETER	DESCRIPTION
`root`	the original expression. TYPE: `Expression`
`path_dictionary`	a map from destination fields to origin paths. TYPE: `Mapping[Step, Path]`
`dest_path_parent`	the path of the parent under which the promoted and broadcast fields are created. TYPE: `Path`

RETURNS	DESCRIPTION
`Expression`	A new expression, where all the origin paths are promoted and broadcast until they are children of dest_path_parent.

Source code in struct2tensor/expression_impl/promote_and_broadcast.py

def promote_and_broadcast(root: expression.Expression,
                          path_dictionary: Mapping[path.Step, path.Path],
                          dest_path_parent: path.Path) -> expression.Expression:
  """Promote and broadcast a set of paths to a particular location.

  Args:
    root: the original expression.
    path_dictionary: a map from destination fields to origin paths.
    dest_path_parent: a map from destination strings.

  Returns:
    A new expression, where all the origin paths are promoted and broadcast
      until they are children of dest_path_parent.
  """

  result_paths = {}
  # Here, we branch out and create a different tree for each field that is
  # promoted and broadcast.
  for field_name, origin_path in path_dictionary.items():
    result_path = dest_path_parent.get_child(field_name)
    new_root = _promote_and_broadcast_name(root, origin_path, dest_path_parent,
                                           field_name)
    result_paths[result_path] = new_root
  # We create a new tree that has all of the generated fields from the older
  # trees.
  return expression_add.add_to(root, result_paths)

promote_and_broadcast_anonymous ¶

promote_and_broadcast_anonymous(
    root: Expression, origin: Path, new_parent: Path
) -> Tuple[Expression, Path]

Promotes then broadcasts the origin until its parent is new_parent.

Source code in struct2tensor/expression_impl/promote_and_broadcast.py

def promote_and_broadcast_anonymous(
    root: expression.Expression, origin: path.Path,
    new_parent: path.Path) -> Tuple[expression.Expression, path.Path]:
  """Promotes then broadcasts the origin until its parent is new_parent."""
  least_common_ancestor = origin.get_least_common_ancestor(new_parent)

  new_expr, new_path = root, origin
  while new_path.get_parent() != least_common_ancestor:
    new_expr, new_path = promote.promote_anonymous(new_expr, new_path)

  while new_path.get_parent() != new_parent:
    new_parent_step = new_parent.field_list[len(new_path) - 1]
    new_expr, new_path = broadcast.broadcast_anonymous(new_expr, new_path,
                                                       new_parent_step)

  return new_expr, new_path
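The two loops above only do path arithmetic, so they can be traced without any tensors. The sketch below uses tuples as paths, treats the longest common prefix as the least common ancestor, and uses "<anon>" for the anonymous field names. All of these are assumptions of this model. It returns the sequence of promote and broadcast steps that the function would perform.

```python
from typing import List, Tuple

PathT = Tuple[str, ...]


def common_prefix(a: PathT, b: PathT) -> PathT:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def plan(origin: PathT, new_parent: PathT) -> List[str]:
    """List the promote/broadcast steps that move origin under new_parent."""
    ancestor = common_prefix(origin, new_parent)
    current, steps = origin, []
    while current[:-1] != ancestor:
        if len(current) < 2:
            raise ValueError("cannot promote a child of the root")
        current = current[:-2] + ("<anon>",)  # promote: move up one level
        steps.append("promote")
    while current[:-1] != new_parent:
        step = new_parent[len(current) - 1]  # next field on the way down
        current = current[:-1] + (step, "<anon>")  # broadcast into sibling `step`
        steps.append("broadcast into " + step)
    return steps


print(plan(("session", "user_info", "age"), ("session", "event")))
# ['promote', 'broadcast into event']
```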

Modules¶

proto ¶

Expressions to parse a proto.

These expressions return values with more information than standard node values. Specifically, each node calculates additional tensors that are used as inputs for its children.

FUNCTION	DESCRIPTION
`create_expression_from_file_descriptor_set`	Create an expression from a 1D tensor of serialized protos.
`create_expression_from_proto`	Create an expression from a 1D tensor of serialized protos.
`create_transformed_field`	Create an expression that transforms serialized proto tensors.
`is_proto_expression`	Returns true if an expression is a ProtoExpression.

ATTRIBUTE	DESCRIPTION
`ProtoExpression`
`ProtoFieldName`
`ProtoFullName`
`StrStep`
`TransformFn`

Attributes¶

ProtoExpression `module-attribute` ¶

ProtoExpression = Union[
    _ProtoRootExpression,
    _ProtoChildExpression,
    _ProtoLeafExpression,
]

ProtoFieldName `module-attribute` ¶

ProtoFieldName = str

ProtoFullName `module-attribute` ¶

ProtoFullName = str

StrStep `module-attribute` ¶

StrStep = str

TransformFn `module-attribute` ¶

TransformFn = Callable[
    [Tensor, Tensor], Tuple[Tensor, Tensor]
]

Functions¶

create_expression_from_file_descriptor_set ¶

create_expression_from_file_descriptor_set(
    tensor_of_protos: Tensor,
    proto_name: ProtoFullName,
    file_descriptor_set: FileDescriptorSet,
    message_format: str = "binary",
) -> Expression

Create an expression from a 1D tensor of serialized protos.

PARAMETER	DESCRIPTION
`tensor_of_protos`	1D tensor of serialized protos. TYPE: `Tensor`
`proto_name`	fully qualified name (e.g. "some.package.SomeProto") of the proto in `tensor_of_protos`. TYPE: `ProtoFullName`
`file_descriptor_set`	The FileDescriptorSet proto containing `proto_name`'s and all its dependencies' FileDescriptorProto. Note that if file1 imports file2, then file2's FileDescriptorProto must precede file1's in file_descriptor_set.file. TYPE: `FileDescriptorSet`
`message_format`	The format of the serialized protos in `tensor_of_protos`: either 'text' or 'binary'. TYPE: `str` DEFAULT: `'binary'`

RETURNS	DESCRIPTION
`Expression`	An expression.
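The ordering rule for file_descriptor_set.file (every file after all the files it imports) is a topological sort. In real code the names and imports would come from each FileDescriptorProto's name and dependency fields. This hypothetical helper works on a plain dict so that it runs without protobuf installed.

```python
from typing import Dict, List, Set


def dependency_order(imports: Dict[str, List[str]]) -> List[str]:
    """Order file names so that every file comes after all files it imports."""
    ordered: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(name: str) -> None:
        if name in done:
            return
        if name in in_progress:
            raise ValueError("import cycle involving " + name)
        in_progress.add(name)
        for dep in imports.get(name, []):
            visit(dep)  # dependencies are emitted first
        in_progress.discard(name)
        done.add(name)
        ordered.append(name)

    for name in imports:
        visit(name)
    return ordered


print(dependency_order({"file1.proto": ["file2.proto"], "file2.proto": []}))
# ['file2.proto', 'file1.proto']
```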

Source code in struct2tensor/expression_impl/proto.py

def create_expression_from_file_descriptor_set(
    tensor_of_protos: tf.Tensor,
    proto_name: ProtoFullName,
    file_descriptor_set: descriptor_pb2.FileDescriptorSet,
    message_format: str = "binary") -> expression.Expression:
  """Create an expression from a 1D tensor of serialized protos.

  Args:
    tensor_of_protos: 1D tensor of serialized protos.
    proto_name: fully qualified name (e.g. "some.package.SomeProto") of the
      proto in `tensor_of_protos`.
    file_descriptor_set: The FileDescriptorSet proto containing `proto_name`'s
      and all its dependencies' FileDescriptorProto. Note that if file1 imports
      file2, then file2's FileDescriptorProto must precede file1's in
      file_descriptor_set.file.
    message_format: Indicates the format of the protocol buffer: is one of
       'text' or 'binary'.

  Returns:
    An expression.
  """

  pool = DescriptorPool()
  for f in file_descriptor_set.file:
    # This method raises if f's dependencies have not been added.
    pool.Add(f)

  # This method raises if proto not found.
  desc = pool.FindMessageTypeByName(proto_name)

  return create_expression_from_proto(tensor_of_protos, desc, message_format)

create_expression_from_proto ¶

create_expression_from_proto(
    tensor_of_protos: Tensor,
    desc: Descriptor,
    message_format: str = "binary",
) -> Expression

Create an expression from a 1D tensor of serialized protos.

PARAMETER	DESCRIPTION
`tensor_of_protos`	1D tensor of serialized protos. TYPE: `Tensor`
`desc`	a descriptor of protos in tensor of protos. TYPE: `Descriptor`
`message_format`	The format of the serialized protos in `tensor_of_protos`: either 'text' or 'binary'. TYPE: `str` DEFAULT: `'binary'`

RETURNS	DESCRIPTION
`Expression`	An expression.

Source code in struct2tensor/expression_impl/proto.py

def create_expression_from_proto(
    tensor_of_protos: tf.Tensor,
    desc: descriptor.Descriptor,
    message_format: str = "binary") -> expression.Expression:
  """Create an expression from a 1D tensor of serialized protos.

  Args:
    tensor_of_protos: 1D tensor of serialized protos.
    desc: a descriptor of protos in tensor of protos.
    message_format: Indicates the format of the protocol buffer: is one of
      'text' or 'binary'.

  Returns:
    An expression.
  """
  return _ProtoRootExpression(desc, tensor_of_protos, message_format)

create_transformed_field ¶

create_transformed_field(
    expr: Expression,
    source_path: CoercableToPath,
    dest_field: StrStep,
    transform_fn: TransformFn,
) -> Expression

Create an expression that transforms serialized proto tensors.

The transform_fn argument should take the form:

def transform_fn(parent_indices, values):
  ...
  return (transformed_parent_indices, transformed_values)

Given:

- parent_indices: an int64 vector of non-decreasing parent message indices.
- values: a string vector of serialized protos having the same shape as parent_indices.

transform_fn must return new parent indices and serialized values encoding the same proto message as the passed-in values. The two returned vectors must have the same size as each other, but that size need not match the size of the input arguments.

Note

If CalculateOptions.use_string_view is True (it is set at calculate time, so this Expression cannot know it beforehand), the values passed to transform_fn are string views pointing all the way back to the original input tensor of serialized root protos. transform_fn must preserve such views: it must not create new values that are self-owned strings rather than string views into the root protos. This is because downstream decoding ops still produce string views into their inputs (which are themselves views into the root protos), and they hold a reference only to the original root proto tensor, keeping that tensor alive. Any new tensor created by transform_fn may therefore be destroyed after the decoding op runs.

In short, you can do element-wise transforms to values, but can't mutate the contents of elements in values or create new elements.

To lift this restriction, a decoding op must be told to hold a reference of the input tensors of all its upstream decoding ops.
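The sketch below is one transform_fn shape that stays within this restriction. It reorders the existing serialized values within each parent and does not create or modify any strings. Plain Python lists stand in for the int64 and string tensors, and a real transform_fn would use TensorFlow ops. The assumption that this kind of reordering keeps string views valid is our reading of "element-wise transforms", not something stated above.

```python
from itertools import groupby
from typing import List, Tuple


def reverse_within_parent(parent_indices: List[int],
                          values: List[bytes]) -> Tuple[List[int], List[bytes]]:
    """Reverse the serialized values of each parent, reusing the same objects."""
    new_parents: List[int] = []
    new_values: List[bytes] = []
    # parent_indices is non-decreasing, so each parent's values are contiguous.
    for parent, group in groupby(zip(parent_indices, values), key=lambda pv: pv[0]):
        group_values = [v for _, v in group]
        new_parents.extend([parent] * len(group_values))
        new_values.extend(reversed(group_values))  # reorder only; no new strings
    return new_parents, new_values


print(reverse_within_parent([0, 0, 1, 2, 2], [b"a", b"b", b"c", b"d", b"e"]))
# ([0, 0, 1, 2, 2], [b'b', b'a', b'c', b'e', b'd'])
```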

PARAMETER	DESCRIPTION
`expr`	a source expression containing `source_path`. TYPE: `Expression`
`source_path`	the path to the field to transform. TYPE: `CoercableToPath`
`dest_field`	the name of the newly created field. This field will be a sibling of the field identified by `source_path`. TYPE: `StrStep`
`transform_fn`	a callable that accepts parent_indices and serialized proto values and returns possibly modified parent_indices and values. Note that when CalculateOptions.use_string_view is set, transform_fn should not have any stateful, side-effecting uses of the serialized proto inputs. Doing so could cause segfaults, because the lifetime of the backing string tensor is not guaranteed when the side-effecting operations run. TYPE: `TransformFn`

RETURNS	DESCRIPTION
`Expression`	A new root expression in which dest_field, a sibling of source_path, holds the transformed protos.

RAISES	DESCRIPTION
`ValueError`	if the source path is not a proto message field.

Source code in struct2tensor/expression_impl/proto.py

def create_transformed_field(
    expr: expression.Expression, source_path: path.CoercableToPath,
    dest_field: StrStep, transform_fn: TransformFn) -> expression.Expression:
  """Create an expression that transforms serialized proto tensors.

  The transform_fn argument should take the form:

  def transform_fn(parent_indices, values):
    ...
    return (transformed_parent_indices, transformed_values)

  Given:

  - parent_indices: an int64 vector of non-decreasing parent message indices.
  - values: a string vector of serialized protos having the same shape as
    `parent_indices`.

  `transform_fn` must return new parent indices and serialized values encoding
  the same proto message as the passed in `values`.  These two vectors must
  have the same size, but it need not be the same as the input arguments.

  !!! Note
      If CalculateOptions.use_string_view (set at calculate time, thus this
      Expression cannot know beforehand) is True, `values` passed to
      `transform_fn` are string views pointing all the way back to the original
      input tensor (of serialized root protos). And `transform_fn` must maintain
      such views and avoid creating new values that are either not string views
      into the root protos or self-owned strings. This is because downstream
      decoding ops will still produce string views referring into its input
      (which are string views into the root proto) and they will only hold a
      reference to the original, root proto tensor, keeping it alive. So the input
      tensor may get destroyed after the decoding op.

      In short, you can do element-wise transforms to `values`, but can't mutate
      the contents of elements in `values` or create new elements.

      To lift this restriction, a decoding op must be told to hold a reference
      of the input tensors of all its upstream decoding ops.


  Args:
    expr: a source expression containing `source_path`.
    source_path: the path to the field to transform.
    dest_field: the name of the newly created field. This field will be a
      sibling of the field identified by `source_path`.
    transform_fn: a callable that accepts parent_indices and serialized proto
      values and returns a possibly modified parent_indices and values. Note that
      when CalculateOptions.use_string_view is set, transform_fn should not have
      any stateful side effecting uses of serialized proto inputs. Doing so
      could cause segfaults as the backing string tensor lifetime is not
      guaranteed when the side effecting operations are run.

  Returns:
    An expression.

  Raises:
    ValueError: if the source path is not a proto message field.
  """
  source_path = path.create_path(source_path)
  source_expr = expr.get_descendant_or_error(source_path)
  if not isinstance(source_expr, _ProtoChildExpression):
    raise ValueError(
        "Expected _ProtoChildExpression for field {}, but found {}.".format(
            str(source_path), source_expr))

  if isinstance(source_expr, _TransformProtoChildExpression):
    # In order to be able to propagate fields needed for parsing, the source
    # expression of _TransformProtoChildExpression must always be the original
    # _ProtoChildExpression before any transformation. This means that two
    # sequentially applied _TransformProtoChildExpression would have the same
    # source and would apply the transformation to the source directly, instead
    # of one transform operating on the output of the other.
    # To work around this, the user supplied transform function is wrapped to
    # first call the source's transform function.
    # The downside of this approach is that the initial transform may be
    # applied redundantly if there are other expressions derived directly
    # from it.
    def final_transform(parent_indices: tf.Tensor,
                        values: tf.Tensor) -> Tuple[tf.Tensor, tf.Tensor]:
      parent_indices, values = source_expr.transform_fn(parent_indices, values)
      return transform_fn(parent_indices, values)
  else:
    final_transform = transform_fn

  transformed_expr = _TransformProtoChildExpression(
      parent=source_expr._parent,  # pylint: disable=protected-access
      desc=source_expr._desc,  # pylint: disable=protected-access
      is_repeated=source_expr.is_repeated,
      name_as_field=source_expr.name_as_field,
      transform_fn=final_transform,
      backing_str_tensor=source_expr._backing_str_tensor)  # pylint: disable=protected-access
  dest_path = source_path.get_parent().get_child(dest_field)
  return expression_add.add_paths(expr, {dest_path: transformed_expr})
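The transform_fn contract and the wrapping performed by final_transform can be illustrated with a small sketch. It is a plain-Python model, not struct2tensor code: Python lists stand in for the int64 parent-index tensor and the string tensor of serialized protos, and the helper names (drop_empty, keep_first_per_parent, compose) are hypothetical. A real transform_fn receives tf.Tensor arguments and would express the same selections with TensorFlow ops such as tf.boolean_mask. Both transforms only select existing elements, which is the kind of element-wise transform the use_string_view note allows.

```python
from typing import Callable, List, Tuple

# Hypothetical model: Python lists stand in for the int64 parent-index tensor
# and the string tensor of serialized protos that transform_fn receives.
Indices = List[int]
Values = List[bytes]
TransformFn = Callable[[Indices, Values], Tuple[Indices, Values]]


def drop_empty(parent_indices: Indices,
               values: Values) -> Tuple[Indices, Values]:
  """Drops empty serialized protos, i.e. messages with no fields set."""
  kept = [(p, v) for p, v in zip(parent_indices, values) if v]
  return [p for p, _ in kept], [v for _, v in kept]


def keep_first_per_parent(parent_indices: Indices,
                          values: Values) -> Tuple[Indices, Values]:
  """Keeps only the first serialized proto under each parent."""
  out_indices: Indices = []
  out_values: Values = []
  previous = None
  for parent, value in zip(parent_indices, values):
    # parent_indices is non-decreasing, so a new parent starts a new run.
    if parent != previous:
      out_indices.append(parent)
      out_values.append(value)
      previous = parent
  return out_indices, out_values


def compose(first: TransformFn, second: TransformFn) -> TransformFn:
  """Runs `first`, then `second` on its output, like final_transform above."""
  def final_transform(parent_indices: Indices,
                      values: Values) -> Tuple[Indices, Values]:
    parent_indices, values = first(parent_indices, values)
    return second(parent_indices, values)
  return final_transform


# Opaque placeholder bytes stand in for serialized messages.
transform = compose(drop_empty, keep_first_per_parent)
print(transform([0, 0, 1, 2, 2], [b"", b"m1", b"m2", b"m3", b"m4"]))
```

Because each transform only drops elements and never reorders them, the returned parent indices stay non-decreasing and the two returned vectors always have equal length.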

is_proto_expression ¶

is_proto_expression(expr: Expression) -> bool

Returns True if expr is a proto expression, that is, a root, child, or leaf expression created by proto parsing.

Source code in struct2tensor/expression_impl/proto.py

def is_proto_expression(expr: expression.Expression) -> bool:
  """Returns true if an expression is a ProtoExpression."""
  return isinstance(
      expr, (_ProtoRootExpression, _ProtoChildExpression, _ProtoLeafExpression))

Modules¶

reroot ¶

Reroot to a subtree, maintaining an input proto index.

reroot is similar to get_descendant_or_error. However, this method allows you to call create_proto_index(...) later on, which gives you a reference to the original proto.

FUNCTION	DESCRIPTION
`create_proto_index_field`	Adds a field to root that maps each root element to the index of its original input proto.
`reroot`	Reroot to a new path, maintaining an input proto index.

Functions¶

create_proto_index_field ¶

create_proto_index_field(
    root: Expression, new_field_name: Step
) -> Expression

Source code in struct2tensor/expression_impl/reroot.py

def create_proto_index_field(root: expression.Expression,
                             new_field_name: path.Step
                            ) -> expression.Expression:
  return expression_add.add_paths(
      root, {path.Path([new_field_name]): _InputProtoIndexExpression(root)})

reroot ¶

reroot(root: Expression, source_path: Path) -> Expression

Reroot to a new path, maintaining an input proto index.

Similar to root.get_descendant_or_error(source_path); however, this method retains the ability to map each element of the new root back to its index in the original input.

PARAMETER	DESCRIPTION
`root`	the original root. TYPE: `Expression`
`source_path`	the path to the new root. TYPE: `Path`

RETURNS	DESCRIPTION
`Expression`	the new root.

Source code in struct2tensor/expression_impl/reroot.py

def reroot(root: expression.Expression,
           source_path: path.Path) -> expression.Expression:
  """Reroot to a new path, maintaining an input proto index.

  Similar to root.get_descendant_or_error(source_path): however, this
  method retains the ability to get a map to the original index.

  Args:
    root: the original root.
    source_path: the path to the new root.

  Returns:
    the new root.
  """

  new_root = root
  for step in source_path.field_list:
    new_root = _RerootExpression(new_root, step)
  return new_root
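The bookkeeping behind reroot and create_proto_index can be sketched as index composition. This is a hypothetical model, not the _RerootExpression implementation: each reroot step is represented only by the parent_index of the field it descends into, and reroot_proto_index is an illustrative name.

```python
from typing import List, Sequence


def reroot_proto_index(num_root_protos: int,
                       parent_indices_per_step: Sequence[Sequence[int]]
                       ) -> List[int]:
  """Maps each element of the new root to the index of its original proto.

  parent_indices_per_step[k] gives, for each element reached by step k, the
  index of its parent among the elements of the previous level.
  """
  # Before any reroot step, root element i is input proto i.
  proto_index = list(range(num_root_protos))
  for parent_index in parent_indices_per_step:
    # Each child inherits the proto index of its parent element.
    proto_index = [proto_index[p] for p in parent_index]
  return proto_index


# Three input protos: foo has parent_index [0, 2, 2], and
# foo.bar has parent_index [0, 0, 2] (relative to the foo elements).
print(reroot_proto_index(3, [[0, 2, 2], [0, 0, 2]]))  # [0, 0, 2]
```

The result has the same form as the [0, 2] example given for create_proto_index: entry i is the position, in the original input tensor, of the proto that element i came from.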

Modules¶

size ¶

Functions for creating new size or has expressions.

Given a field "foo.bar",

root = size(expr, path.Path(["foo","bar"]), "bar_size")

creates a new expression root that has an optional field "foo.bar_size", which is always present and contains the number of bar values in a particular foo.

root_2 = has(expr, path.Path(["foo","bar"]), "bar_has")

creates a new expression root that has an optional field "foo.bar_has", which is always present and is True if foo contains one or more bar values.
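The size and has semantics described above can be sketched with nested Python lists, assuming each inner list holds the bar values of one foo. The function names are hypothetical; the real size and has functions build expressions rather than computing lists.

```python
from typing import List, Sequence


def size_field(foos: Sequence[Sequence[object]]) -> List[int]:
  """One value per foo: how many bar values it contains ("foo.bar_size")."""
  return [len(bars) for bars in foos]


def has_field(foos: Sequence[Sequence[object]]) -> List[bool]:
  """One value per foo: whether it contains any bar ("foo.bar_has")."""
  return [count > 0 for count in size_field(foos)]


# Three foo messages holding 2, 0, and 1 bar values respectively.
foos = [["a", "b"], [], ["c"]]
print(size_field(foos))  # [2, 0, 1]
print(has_field(foos))   # [True, False, True]
```

Both outputs have exactly one entry per foo, which is why the new fields are always present even when bar is empty.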

CLASS	DESCRIPTION
`SizeExpression`	Size of the given expression.

FUNCTION	DESCRIPTION
`has`	Get the presence ("has") of a field as a new sibling field.
`size`	Get the size of a field as a new sibling field.
`size_anonymous`	Calculate the size of a field, and store it as an anonymous sibling.

Classes¶

SizeExpression ¶

SizeExpression(
    origin: Expression, origin_parent: Expression
)

Bases: Leaf

Size of the given expression.

SizeExpression is intended to be a sibling of origin. origin_parent should be the parent of origin.

Initialize a SizeExpression. It is always a non-repeated leaf of type tf.int64: the constructor calls Leaf.__init__(False, tf.int64).

PARAMETER	DESCRIPTION
`origin`	the expression whose size is computed. TYPE: `Expression`
`origin_parent`	the parent of origin. TYPE: `Expression`

METHOD	DESCRIPTION
`apply`
`apply_schema`
`broadcast`	Broadcasts the existing field at source_path to the sibling_field.
`calculate`	Calculates the node tensor of the expression.
`calculation_equal`	self.calculate is equal to another expression.calculate.
`calculation_is_identity`	True iff the self.calculate is the identity.
`cogroup_by_index`	Creates a cogroup of left_name and right_name at new_field_name.
`create_has_field`	Creates a field that is the presence of the source path.
`create_proto_index`	Creates a proto index field as a direct child of the current root.
`create_size_field`	Creates a field that is the size of the source path.
`get_child`	Gets a named child.
`get_child_or_error`	Gets a named child.
`get_descendant`	Finds the descendant at the path.
`get_descendant_or_error`	Finds the descendant at the path.
`get_known_children`
`get_known_descendants`	Gets a mapping from known paths to subexpressions.
`get_paths_with_schema`	Extract only paths that contain schema information.
`get_schema`	Returns a schema for the entire tree.
`get_source_expressions`	Gets the sources of this expression.
`known_field_names`	Returns known field names of the expression.
`map_field_values`	Map a primitive field to create a new primitive field.
`map_ragged_tensors`	Maps a set of primitive fields of a message to a new field.
`map_sparse_tensors`	Maps a set of primitive fields of a message to a new field.
`project`	Constrains the paths to those listed.
`promote`	Promotes source_path to be a field new_field_name in its grandparent.
`promote_and_broadcast`
`reroot`	Returns a new list of protocol buffers available at new_root.
`schema_string`	Returns a schema for the expression.
`slice`	Creates a slice copy of source_path at new_field_path.
`truncate`	Creates a truncated copy of source_path at new_field_path.

ATTRIBUTE	DESCRIPTION
`is_leaf`	True iff the node tensor is a LeafNodeTensor. TYPE: `bool`
`is_repeated`	True iff the same parent value can have multiple children values. TYPE: `bool`
`schema_feature`	Return the schema of the field. TYPE: `Optional[Feature]`
`type`	dtype of the expression, or None if not a leaf expression. TYPE: `Optional[DType]`
`validate_step_format`	TYPE: `bool`

Source code in struct2tensor/expression_impl/size.py

def __init__(self, origin: expression.Expression,
             origin_parent: expression.Expression):
  super().__init__(False, tf.int64)
  self._origin = origin
  self._origin_parent = origin_parent

Attributes¶

is_leaf property ¶

is_leaf: bool

True iff the node tensor is a LeafNodeTensor.

is_repeated property ¶

is_repeated: bool

True iff the same parent value can have multiple children values.

schema_feature property ¶

schema_feature: Optional[Feature]

Return the schema of the field.

type property ¶

type: Optional[DType]

dtype of the expression, or None if not a leaf expression.

validate_step_format property ¶

validate_step_format: bool

Functions¶

apply ¶

apply(
    transform: Callable[[Expression], Expression],
) -> Expression

Source code in struct2tensor/expression.py

def apply(self,
          transform: Callable[["Expression"], "Expression"]) -> "Expression":
  return transform(self)

apply_schema ¶

apply_schema(schema: Schema) -> Expression

Source code in struct2tensor/expression.py

def apply_schema(self, schema: schema_pb2.Schema) -> "Expression":
  return apply_schema.apply_schema(self, schema)

broadcast ¶

broadcast(
    source_path: CoercableToPath,
    sibling_field: Step,
    new_field_name: Step,
) -> Expression

Broadcasts the existing field at source_path to the sibling_field.

Source code in struct2tensor/expression.py

def broadcast(self, source_path: CoercableToPath, sibling_field: path.Step,
              new_field_name: path.Step) -> "Expression":
  """Broadcasts the existing field at source_path to the sibling_field."""
  return broadcast.broadcast(self, path.create_path(source_path),
                             sibling_field, new_field_name)

calculate ¶

calculate(
    sources: Sequence[NodeTensor],
    destinations: Sequence[Expression],
    options: Options,
    side_info: Optional[Prensor] = None,
) -> NodeTensor

Calculates the node tensor of the expression.

The node tensor must be a function of the properties of the expression and the node tensors of the expressions from get_source_expressions().

If is_leaf, then calculate must return a LeafNodeTensor. Otherwise, it must return a ChildNodeTensor or RootNodeTensor.

If calculation_is_identity() is True, then this must return sources[0].

Sometimes, for operations such as parsing the proto, calculate will return additional information. For example, calculate() for the root of the proto expression also parses out the tensors required to calculate the tensors of the children. This is why destinations are required.

For a reference use, see calculate_value_slowly(...) below.

PARAMETER	DESCRIPTION
`sources`	The node tensors of the expressions in get_source_expressions(). TYPE: `Sequence[NodeTensor]`
`destinations`	The expressions that will use the output of this method. TYPE: `Sequence[Expression]`
`options`	Options for the calculation. TYPE: `Options`
`side_info`	An optional prensor that is used to bind to a placeholder expression. TYPE: `Optional[Prensor]` DEFAULT: `None`

RETURNS	DESCRIPTION
`NodeTensor`	A NodeTensor representing the output of this expression.

Source code in struct2tensor/expression_impl/size.py

def calculate(
    self,
    sources: Sequence[prensor.NodeTensor],
    destinations: Sequence[expression.Expression],
    options: calculate_options.Options,
    side_info: Optional[prensor.Prensor] = None) -> prensor.NodeTensor:

  [origin_value, origin_parent_value] = sources
  if not isinstance(origin_value,
                    (prensor.LeafNodeTensor, prensor.ChildNodeTensor)):
    raise ValueError(
        "origin_value must be a LeafNodeTensor or a ChildNodeTensor, "
        "but was a " + str(type(origin_value)))

  if not isinstance(origin_parent_value,
                    (prensor.ChildNodeTensor, prensor.RootNodeTensor)):
    raise ValueError("origin_parent_value must be a ChildNodeTensor "
                     "or a RootNodeTensor, but was a " +
                     str(type(origin_parent_value)))

  parent_index = origin_value.parent_index
  num_parent_protos = origin_parent_value.size
  # A vector of 1s of the same size as the parent_index.
  updates = tf.ones(tf.shape(parent_index), dtype=tf.int64)
  indices = tf.expand_dims(parent_index, 1)
  # This is incrementing the size by 1 for each element.
  # Obviously, not the fastest way to do this.
  values = tf.scatter_nd(indices, updates, tf.reshape(num_parent_protos, [1]))

  # Need to create a new_parent_index = 0,1,2,3,4...n.
  new_parent_index = tf.range(num_parent_protos, dtype=tf.int64)
  return prensor.LeafNodeTensor(new_parent_index, values, False)
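The scatter_nd call in calculate counts, for every parent, how many child entries point at it. The following plain-Python model (size_node is a hypothetical name) reproduces that arithmetic; it relies on the documented behavior that tf.scatter_nd sums updates that share an index.

```python
from typing import List, Sequence, Tuple


def size_node(parent_index: Sequence[int],
              num_parent_protos: int) -> Tuple[List[int], List[int]]:
  """Returns (new_parent_index, values) for a size leaf."""
  values = [0] * num_parent_protos
  for parent in parent_index:
    # Equivalent to scatter_nd with an update of 1 per child: duplicates add.
    values[parent] += 1
  # Every parent gets exactly one size value: 0, 1, ..., n - 1.
  new_parent_index = list(range(num_parent_protos))
  return new_parent_index, values


# Four parents: parent 0 has two children, parent 2 has one.
print(size_node([0, 0, 2], 4))  # ([0, 1, 2, 3], [2, 0, 1, 0])
```

Parents without children keep a count of 0, which is why the size field is dense even when the origin field is sparse.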

calculation_equal ¶

calculation_equal(expr: Expression) -> bool

self.calculate is equal to another expression.calculate.

Given the same source node tensors, self.calculate(...) and expr.calculate(...) will have the same result.

Note that this does not check that the source expressions of the two expressions are the same. Therefore, two operations can have the same calculation, but not the same output, because their sources are different. For example, if a.calculation_is_identity() is True and b.calculation_is_identity() is True, then a.calculation_equal(b) is True. However, unless a and b have the same source, the expressions themselves are not equal.

PARAMETER	DESCRIPTION
`expr`	The expression to compare to. TYPE: `Expression`

Source code in struct2tensor/expression_impl/size.py

def calculation_equal(self, expr: expression.Expression) -> bool:
  return isinstance(expr, SizeExpression)

calculation_is_identity ¶

calculation_is_identity() -> bool

True iff the self.calculate is the identity.

There is exactly one source, and the output of self.calculate(...) is the node tensor of this source.

Source code in struct2tensor/expression_impl/size.py

def calculation_is_identity(self) -> bool:
  return False

cogroup_by_index ¶

cogroup_by_index(
    source_path: CoercableToPath,
    left_name: Step,
    right_name: Step,
    new_field_name: Step,
) -> Expression

Creates a cogroup of left_name and right_name at new_field_name. Not yet implemented: calling this method raises NotImplementedError.

Source code in struct2tensor/expression.py

def cogroup_by_index(self, source_path: CoercableToPath, left_name: path.Step,
                     right_name: path.Step,
                     new_field_name: path.Step) -> "Expression":
  """Creates a cogroup of left_name and right_name at new_field_name."""
  raise NotImplementedError("cogroup_by_index is not implemented")

create_has_field ¶

create_has_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the presence of the source path.

Source code in struct2tensor/expression.py

def create_has_field(self, source_path: CoercableToPath,
                     new_field_name: path.Step) -> "Expression":
  """Creates a field that is the presence of the source path."""
  return size.has(self, path.create_path(source_path), new_field_name)

create_proto_index ¶

create_proto_index(field_name: Step) -> Expression

Creates a proto index field as a direct child of the current root.

The proto index maps each root element to the original batch index. For example: [0, 2] means the first element came from the first proto in the original input tensor and the second element came from the third proto. The created field is always "dense" -- it has the same valency as the current root.

PARAMETER	DESCRIPTION
`field_name`	the name of the field to be created. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py

def create_proto_index(self, field_name: path.Step) -> "Expression":
  """Creates a proto index field as a direct child of the current root.

  The proto index maps each root element to the original batch index.
  For example: [0, 2] means the first element came from the first proto
  in the original input tensor and the second element came from the third
  proto. The created field is always "dense" -- it has the same valency as
  the current root.

  Args:
    field_name: the name of the field to be created.

  Returns:
    An Expression object representing the result of the operation.
  """

  return reroot.create_proto_index_field(self, field_name)

create_size_field ¶

create_size_field(
    source_path: CoercableToPath, new_field_name: Step
) -> Expression

Creates a field that is the size of the source path.

Source code in struct2tensor/expression.py

def create_size_field(self, source_path: CoercableToPath,
                      new_field_name: path.Step) -> "Expression":
  """Creates a field that is the size of the source path."""
  return size.size(self, path.create_path(source_path), new_field_name)

get_child ¶

get_child(field_name: Step) -> Optional[Expression]

Gets a named child, or None if no such child exists. Results, including None, are cached per field name.

Source code in struct2tensor/expression.py

def get_child(self, field_name: path.Step) -> Optional["Expression"]:
  """Gets a named child."""
  if field_name in self._child_cache:
    return self._child_cache[field_name]
  result = self._get_child_impl(field_name)
  self._child_cache[field_name] = result
  return result

get_child_or_error ¶

get_child_or_error(field_name: Step) -> Expression

Gets a named child, raising KeyError if no such child exists.

Source code in struct2tensor/expression.py

def get_child_or_error(self, field_name: path.Step) -> "Expression":
  """Gets a named child."""
  result = self.get_child(field_name)
  if result is None:
    raise KeyError("No such field: {}".format(field_name))
  return result

get_descendant ¶

get_descendant(p: Path) -> Optional[Expression]

Finds the descendant at the path, or returns None if any step along the path is missing.

Source code in struct2tensor/expression.py

def get_descendant(self, p: path.Path) -> Optional["Expression"]:
  """Finds the descendant at the path."""
  result = self
  for field_name in p.field_list:
    result = result.get_child(field_name)
    if result is None:
      return None
  return result

get_descendant_or_error ¶

get_descendant_or_error(p: Path) -> Expression

Finds the descendant at the path, raising ValueError if it is missing.

Source code in struct2tensor/expression.py

def get_descendant_or_error(self, p: path.Path) -> "Expression":
  """Finds the descendant at the path."""
  result = self.get_descendant(p)
  if result is None:
    raise ValueError("Missing path: {} in {}".format(
        str(p), self.schema_string(limit=20)))
  return result
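The lookup rules of get_child, get_descendant, and get_descendant_or_error can be modeled with a toy tree. Node is a hypothetical stand-in for Expression, and its lookups counter exists only to show that get_child caches misses as well as hits.

```python
from typing import Dict, Optional, Sequence


class Node:
  """Hypothetical stand-in for an Expression with a fixed set of children."""

  def __init__(self, children: Optional[Dict[str, "Node"]] = None):
    self._children = children or {}
    self._child_cache: Dict[str, Optional["Node"]] = {}
    self.lookups = 0  # counts calls to the uncached implementation

  def _get_child_impl(self, field_name: str) -> Optional["Node"]:
    self.lookups += 1
    return self._children.get(field_name)

  def get_child(self, field_name: str) -> Optional["Node"]:
    # Misses (None) are cached too, as in Expression.get_child.
    if field_name in self._child_cache:
      return self._child_cache[field_name]
    result = self._get_child_impl(field_name)
    self._child_cache[field_name] = result
    return result

  def get_descendant(self, steps: Sequence[str]) -> Optional["Node"]:
    result = self
    for step in steps:
      result = result.get_child(step)
      if result is None:
        return None  # stop at the first missing step
    return result

  def get_descendant_or_error(self, steps: Sequence[str]) -> "Node":
    result = self.get_descendant(steps)
    if result is None:
      raise ValueError("Missing path: {}".format(".".join(steps)))
    return result
```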

get_known_children ¶

get_known_children() -> Mapping[Step, Expression]

Returns a mapping from each known field name (see known_field_names) to its child expression.

Source code in struct2tensor/expression.py

def get_known_children(self) -> Mapping[path.Step, "Expression"]:
  known_field_names = self.known_field_names()
  result = {}
  for name in known_field_names:
    result[name] = self.get_child_or_error(name)
  return result

get_known_descendants ¶

get_known_descendants() -> Mapping[Path, Expression]

Gets a mapping from known paths to subexpressions.

The difference between this and get_descendants in Prensor is that all paths in a Prensor are realized, thus all known. But an Expression's descendants might not all be known at the point this method is called, because an expression may have an infinite number of children.

RETURNS	DESCRIPTION
`Mapping[Path, Expression]`	A mapping from paths (relative to the root of the subexpression) to expressions.

Source code in struct2tensor/expression.py

def get_known_descendants(self) -> Mapping[path.Path, "Expression"]:
  # Rename get_known_descendants
  """Gets a mapping from known paths to subexpressions.

  The difference between this and get_descendants in Prensor is that
  all paths in a Prensor are realized, thus all known. But an Expression's
  descendants might not all be known at the point this method is called,
  because an expression may have an infinite number of children.

  Returns:
    A mapping from paths (relative to the root of the subexpression) to
      expressions.
  """
  known_subexpressions = {
      k: v.get_known_descendants()
      for k, v in self.get_known_children().items()
  }
  result = {}
  for field_name, subexpression in known_subexpressions.items():
    subexpression_path = path.Path(
        [field_name], validate_step_format=self.validate_step_format
    )
    for p, expr in subexpression.items():
      result[subexpression_path.concat(p)] = expr
  result[path.Path([], validate_step_format=self.validate_step_format)] = self
  return result
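The recursion in get_known_descendants can be sketched with nested dicts standing in for known children. This is a hypothetical model that uses tuples of field names instead of path.Path.

```python
from typing import Any, Dict, Mapping, Tuple

# Hypothetical model: an expression's known children as nested dicts keyed
# by field name. A leaf has no known children, i.e. an empty dict.
Tree = Mapping[str, Any]
Path = Tuple[str, ...]


def known_descendants(tree: Tree) -> Dict[Path, Tree]:
  """Maps every known path (relative to `tree`) to its subtree."""
  result: Dict[Path, Tree] = {}
  for name, child in tree.items():
    # Prefix each of the child's own paths with the child's field name.
    for sub_path, sub_tree in known_descendants(child).items():
      result[(name,) + sub_path] = sub_tree
  # The empty path refers to the expression itself.
  result[()] = tree
  return result


tree = {"foo": {"bar": {}}, "baz": {}}
print(sorted(known_descendants(tree)))
```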

get_paths_with_schema ¶

get_paths_with_schema() -> List[Path]

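Extracts only the paths that contain schema information: a path is followed only through children whose schema_feature is set. If no child has schema information, only the root path is returned.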

Source code in struct2tensor/expression.py

def get_paths_with_schema(self) -> List[path.Path]:
  """Extract only paths that contain schema information."""
  result = []
  for name, child in self.get_known_children().items():
    if child.schema_feature is None:
      continue
    result.extend(
        [
            path.Path(
                [name], validate_step_format=self.validate_step_format
            ).concat(x)
            for x in child.get_paths_with_schema()
        ]
    )
  # Note: We always take the root path and so will return an empty schema
  # if there is no schema information on any nodes, including the root.
  if not result:
    result.append(
        path.Path([], validate_step_format=self.validate_step_format)
    )
  return result
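The pruning rule in get_paths_with_schema can be sketched with a hypothetical SchemaNode type whose has_schema flag plays the role of `schema_feature is not None`. Paths are tuples of field names rather than path.Path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Path = Tuple[str, ...]


@dataclass
class SchemaNode:
  """Hypothetical stand-in: does this node carry a schema_feature?"""
  has_schema: bool
  children: Dict[str, "SchemaNode"] = field(default_factory=dict)


def paths_with_schema(node: SchemaNode) -> List[Path]:
  result: List[Path] = []
  for name, child in node.children.items():
    if not child.has_schema:
      continue  # prune this child and everything below it
    result.extend((name,) + p for p in paths_with_schema(child))
  # Fall back to the root path when no child qualifies.
  if not result:
    result.append(())
  return result
```

In a tree where baz has no schema, a child such as baz.deep is never reached even if it has one, because the walk stops at baz.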

get_schema ¶

get_schema(create_schema_features=True) -> Schema

Returns a schema for the entire tree.

PARAMETER	DESCRIPTION
`create_schema_features`	If True, schema features are added for all children and a schema entry is created if not available on the child. If False, features are left off of the returned schema if there is no schema_feature on the child. DEFAULT: `True`

Source code in struct2tensor/expression.py

def get_schema(self, create_schema_features=True) -> schema_pb2.Schema:
  """Returns a schema for the entire tree.

  Args:
    create_schema_features: If True, schema features are added for all
      children and a schema entry is created if not available on the child. If
      False, features are left off of the returned schema if there is no
      schema_feature on the child.
  """
  if not create_schema_features:
    return self.project(self.get_paths_with_schema()).get_schema()
  result = schema_pb2.Schema()
  self._populate_schema_feature_children(result.feature)
  return result

get_source_expressions ¶

get_source_expressions() -> Sequence[Expression]

Gets the sources of this expression.

The node tensors of the source expressions must be sufficient to calculate the node tensor of this expression (see calculate and calculate_value_slowly).

RETURNS	DESCRIPTION
`Sequence[Expression]`	The sources of this expression.

Source code in struct2tensor/expression_impl/size.py

def get_source_expressions(self) -> Sequence[expression.Expression]:
  return [self._origin, self._origin_parent]

known_field_names ¶

known_field_names() -> FrozenSet[Step]

Returns known field names of the expression.

TODO(martinz): implement set_field and project. Known field names of a parsed proto correspond to the fields declared in the message. Examples of "unknown" fields are extensions and explicit casts in an any field. The only way to know if an unknown field "(foo.bar)" is present in an expression expr is to call (expr["(foo.bar)"] is not None).

Notice that simply accessing a field does not make it "known". However, setting a field (or setting a descendant of a field) will make it known.

project(...) returns an expression where the known field names are the only field names. In general, if you want to depend upon known_field_names (e.g., if you want to compile an expression), then the best approach is to project() the expression first.

RETURNS	DESCRIPTION
`FrozenSet[Step]`	An immutable set of field names.

Source code in struct2tensor/expression.py

def known_field_names(self) -> FrozenSet[path.Step]:
  return frozenset()

map_field_values ¶

map_field_values(
    source_path: CoercableToPath,
    operator: Callable[[Tensor], Tensor],
    dtype: DType,
    new_field_name: Step,
) -> Expression

Map a primitive field to create a new primitive field.

Note

The dtype argument is added since the v1 API.

PARAMETER	DESCRIPTION
`source_path`	the origin path. TYPE: `CoercableToPath`
`operator`	an element-wise operator that takes a 1-dimensional vector. TYPE: `Callable[[Tensor], Tensor]`
`dtype`	the type of the output. TYPE: `DType`
`new_field_name`	the name of a new sibling of source_path. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	the resulting root expression.

Source code in struct2tensor/expression.py

def map_field_values(self, source_path: CoercableToPath,
                     operator: Callable[[tf.Tensor], tf.Tensor],
                     dtype: tf.DType,
                     new_field_name: path.Step) -> "Expression":
  """Map a primitive field to create a new primitive field.

  !!! Note
      The dtype argument is added since the v1 API.

  Args:
    source_path: the origin path.
    operator: an element-wise operator that takes a 1-dimensional vector.
    dtype: the type of the output.
    new_field_name: the name of a new sibling of source_path.

  Returns:
    the resulting root expression.
  """
  return map_values.map_values(self, path.create_path(source_path), operator,
                               dtype, new_field_name)
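The element-wise contract of map_field_values can be modeled on a single leaf, assuming the new sibling reuses the source field's parent_index and replaces only its values. map_leaf_values is a hypothetical helper operating on Python lists, not struct2tensor's map_values.

```python
from typing import Callable, List, Sequence, Tuple


def map_leaf_values(parent_index: Sequence[int], values: Sequence[float],
                    operator: Callable[[List[float]], List[float]]
                    ) -> Tuple[List[int], List[float]]:
  """Applies an element-wise operator to a leaf's flat value vector."""
  new_values = operator(list(values))
  if len(new_values) != len(values):
    raise ValueError("operator must be element-wise (same-length output)")
  # The new sibling keeps the source field's parent_index unchanged.
  return list(parent_index), new_values


print(map_leaf_values([0, 0, 2], [1.0, 2.0, 3.0],
                      lambda v: [x * 10 for x in v]))
```

With struct2tensor itself, the corresponding call would follow the signature above, for example root.map_field_values("foo.bar", lambda t: t * 2, tf.int64, "bar_doubled") for an int64 field; the field names here are hypothetical.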

map_ragged_tensors ¶

map_ragged_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you to reshape the field to some degree. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensor.

PARAMETER	DESCRIPTION
`parent_path`	the parent of the input and output fields. TYPE: `CoercableToPath`
`source_fields`	the nonempty list of names of the source fields. TYPE: `Sequence[Step]`
`operator`	an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: `Callable[..., SparseTensor]`
`is_repeated`	whether the output is repeated. TYPE: `bool`
`dtype`	the dtype of the result. TYPE: `DType`
`new_field_name`	the name of the resulting field. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	A new query.

Source code in struct2tensor/expression.py

def map_ragged_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_ragged_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
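The reduce_sum example from the description can be sketched for map_ragged_tensors by modeling a ragged input as one row of values per parent message. This modeling is an assumption made for illustration (the signature above is annotated with tf.SparseTensor), and the helper names are hypothetical. A sum over the last axis yields 0 for a parent with no values, so the output keeps one entry per parent.

```python
from typing import List, Sequence

# Hypothetical model of a ragged input: one row of values per parent message.
Rows = Sequence[Sequence[int]]


def reduce_sum_rows(rows: Rows) -> List[int]:
  """Repeated field -> optional field: one sum per parent (0 if empty)."""
  return [sum(row) for row in rows]


def check_first_dimension(inputs: Rows, output: Sequence[object]) -> None:
  """The output must have one entry per parent, like the inputs."""
  if len(output) != len(inputs):
    raise ValueError("first dimension changed: {} -> {}".format(
        len(inputs), len(output)))


rows = [[1, 2], [], [5]]
summed = reduce_sum_rows(rows)
check_first_dimension(rows, summed)
print(summed)  # [3, 0, 5]
```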

map_sparse_tensors ¶

map_sparse_tensors(
    parent_path: CoercableToPath,
    source_fields: Sequence[Step],
    operator: Callable[..., SparseTensor],
    is_repeated: bool,
    dtype: DType,
    new_field_name: Step,
) -> Expression

Maps a set of primitive fields of a message to a new field.

Unlike map_field_values, this operation allows you to reshape the field to some degree. For instance, you can take two optional fields and create a repeated field, or perform a reduce_sum on the last dimension of a repeated field and create an optional field. The key constraint is that the operator must return a sparse tensor of the correct dimension: a 2D sparse tensor if is_repeated is true, or a 1D sparse tensor if is_repeated is false. Moreover, the first dimension of the sparse tensor must equal the first dimension of the input tensor.

PARAMETER	DESCRIPTION
`parent_path`	the parent of the input and output fields. TYPE: `CoercableToPath`
`source_fields`	the nonempty list of names of the source fields. TYPE: `Sequence[Step]`
`operator`	an operator that takes len(source_fields) sparse tensors and returns a sparse tensor of the appropriate shape. TYPE: `Callable[..., SparseTensor]`
`is_repeated`	whether the output is repeated. TYPE: `bool`
`dtype`	the dtype of the result. TYPE: `DType`
`new_field_name`	the name of the resulting field. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	A new query.

Source code in struct2tensor/expression.py

def map_sparse_tensors(self, parent_path: CoercableToPath,
                       source_fields: Sequence[path.Step],
                       operator: Callable[..., tf.SparseTensor],
                       is_repeated: bool, dtype: tf.DType,
                       new_field_name: path.Step) -> "Expression":
  """Maps a set of primitive fields of a message to a new field.

  Unlike map_field_values, this operation allows you to some degree reshape
  the field. For instance, you can take two optional fields and create a
  repeated field, or perform a reduce_sum on the last dimension of a repeated
  field and create an optional field. The key constraint is that the operator
  must return a sparse tensor of the correct dimension: i.e., a
  2D sparse tensor if is_repeated is true, or a 1D sparse tensor if
  is_repeated is false. Moreover, the first dimension of the sparse tensor
  must be equal to the first dimension of the input tensor.

  Args:
    parent_path: the parent of the input and output fields.
    source_fields: the nonempty list of names of the source fields.
    operator: an operator that takes len(source_fields) sparse tensors and
      returns a sparse tensor of the appropriate shape.
    is_repeated: whether the output is repeated.
    dtype: the dtype of the result.
    new_field_name: the name of the resulting field.

  Returns:
    A new query.
  """
  return map_prensor.map_sparse_tensor(
      self,
      path.create_path(parent_path),
      [
          path.Path([f], validate_step_format=self.validate_step_format)
          for f in source_fields
      ],
      operator,
      is_repeated,
      dtype,
      new_field_name,
  )
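The shape contract above can be modeled without TensorFlow. The sketch below represents a 2D sparse tensor as a hypothetical `(indices, values, dense_shape)` triple of plain Python lists (a stand-in, not the `tf.SparseTensor` API) and implements a reduce-sum-style operator that turns a repeated field into an optional one while keeping the first dimension unchanged.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a tf.SparseTensor: (indices, values, dense_shape).
Sparse = Tuple[List[List[int]], List[float], List[int]]


def reduce_sum_last_dim(repeated: Sparse) -> Sparse:
    """Sums each row of a 2D sparse tensor into a 1D sparse tensor.

    Row r holds the values of the repeated field under parent r. The output
    keeps dense_shape[0] (the number of parents), which is the constraint the
    operator must satisfy. Parents with no values get no output entry, so the
    result behaves like an optional field.
    """
    indices, values, dense_shape = repeated
    sums: Dict[int, float] = {}
    for (row, _col), value in zip(indices, values):
        sums[row] = sums.get(row, 0.0) + value
    rows = sorted(sums)
    return [[r] for r in rows], [sums[r] for r in rows], [dense_shape[0]]


# Three parents: parent 0 has [1.0, 2.0], parent 1 has nothing, parent 2 has [5.0].
repeated_field: Sparse = ([[0, 0], [0, 1], [2, 0]], [1.0, 2.0, 5.0], [3, 2])
optional_field = reduce_sum_last_dim(repeated_field)
```

Dropping empty parents is one possible convention; an operator could instead emit an explicit 0.0 for them, in which case the field is present for every parent.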

project ¶

project(path_list: Sequence[CoercableToPath]) -> Expression

Constrains the paths to those listed.

Source code in struct2tensor/expression.py

def project(self, path_list: Sequence[CoercableToPath]) -> "Expression":
  """Constrains the paths to those listed."""
  return project.project(self, [path.create_path(x) for x in path_list])
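As an illustration of what constraining the paths means, the following pure-Python model treats a batch of protos as a list of dicts whose fields are lists (repeated values or submessages). The function name `project_paths` and the dotted-string paths are hypothetical, and the model assumes every listed path ends at a leaf field; it is not the struct2tensor implementation.

```python
from typing import Any, Dict, List, Sequence

Message = Dict[str, List[Any]]


def project_paths(protos: List[Message], paths: Sequence[str]) -> List[Message]:
    """Keeps only the listed dotted paths and the messages on the way to them."""
    # Group the remaining steps of each path by its first step.
    remaining: Dict[str, List[str]] = {}
    for p in paths:
        head, _, rest = p.partition(".")
        remaining.setdefault(head, [])
        if rest:
            remaining[head].append(rest)
    projected = []
    for proto in protos:
        kept: Message = {}
        for field, sub_paths in remaining.items():
            if field in proto:
                # A path that ends here keeps the leaf values; otherwise recurse.
                kept[field] = (project_paths(proto[field], sub_paths)
                               if sub_paths else proto[field])
        projected.append(kept)
    return projected


batch = [{"foo": [{"bar": [1], "baz": [2]}], "qux": ["x"]}]
only_bar = project_paths(batch, ["foo.bar"])
```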

promote ¶

promote(source_path: CoercableToPath, new_field_name: Step)

Promotes source_path to be a field new_field_name in its grandparent.

Source code in struct2tensor/expression.py

def promote(self, source_path: CoercableToPath, new_field_name: path.Step):
  """Promotes source_path to be a field new_field_name in its grandparent."""
  return promote.promote(self, path.create_path(source_path), new_field_name)
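The following pure-Python model shows the effect of promoting a two-step path such as `foo.bar` to the root: all `bar` values under one root are concatenated, in order, into a new field of that root (the grandparent of `bar`). The dict-of-lists data model and the function name `promote_two_steps` are assumptions for illustration only.

```python
from typing import Any, Dict, List

Message = Dict[str, List[Any]]


def promote_two_steps(protos: List[Message], parent: str, child: str,
                      new_field_name: str) -> List[Message]:
    """Adds parent.child values to each root message as new_field_name."""
    result = []
    for proto in protos:
        # Flatten the child values of every `parent` submessage of this root.
        promoted = [value
                    for submessage in proto.get(parent, [])
                    for value in submessage.get(child, [])]
        new_proto = dict(proto)  # existing fields are kept unchanged
        new_proto[new_field_name] = promoted
        result.append(new_proto)
    return result


batch = [{"foo": [{"bar": [1, 2]}, {"bar": [3]}]}, {"foo": []}]
promoted = promote_two_steps(batch, "foo", "bar", "new_bar")
```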

promote_and_broadcast ¶

promote_and_broadcast(
    path_dictionary: Mapping[Step, CoercableToPath],
    dest_path_parent: CoercableToPath,
) -> Expression

Source code in struct2tensor/expression.py

def promote_and_broadcast(
    self, path_dictionary: Mapping[path.Step, CoercableToPath],
    dest_path_parent: CoercableToPath) -> "Expression":
  return promote_and_broadcast.promote_and_broadcast(
      self, {k: path.create_path(v) for k, v in path_dictionary.items()},
      path.create_path(dest_path_parent))

reroot ¶

reroot(new_root: CoercableToPath) -> Expression

Returns a new list of protocol buffers available at new_root.

Source code in struct2tensor/expression.py

def reroot(self, new_root: CoercableToPath) -> "Expression":
  """Returns a new list of protocol buffers available at new_root."""
  return reroot.reroot(self, path.create_path(new_root))
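A pure-Python model of rerooting, under the same hypothetical dict-of-lists data model: the messages at `new_root` from every input proto become the new top-level list. The module summary says reroot maintains an input proto index; here that is modeled by pairing each message with the index of the root it came from (the pairing is an illustration, not struct2tensor's representation).

```python
from typing import Any, Dict, List, Sequence, Tuple

Message = Dict[str, List[Any]]


def reroot_model(protos: List[Message],
                 steps: Sequence[str]) -> List[Tuple[int, Message]]:
    """Returns (input proto index, message) pairs for messages at `steps`."""
    current = list(enumerate(protos))
    for step in steps:
        # Descend one level; each message may have zero or more children.
        current = [(index, child)
                   for index, message in current
                   for child in message.get(step, [])]
    return current


batch = [{"foo": [{"x": [1]}, {"x": [2]}]}, {"foo": [{"x": [3]}]}]
rerooted = reroot_model(batch, ["foo"])
```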

schema_string ¶

schema_string(limit: Optional[int] = None) -> str

Returns a schema for the expression.

For example:

repeated root:
  optional int32 foo
  optional bar:
    optional string baz
  optional int64 bak

Note that unknown fields and subexpressions are not displayed.

PARAMETER	DESCRIPTION
`limit`	if present, limit the recursion. TYPE: `Optional[int]` DEFAULT: `None`

RETURNS	DESCRIPTION
`str`	A string, describing (a part of) the schema.

Source code in struct2tensor/expression.py

def schema_string(self, limit: Optional[int] = None) -> str:
  """Returns a schema for the expression.

  For example,
  ```
  repeated root:
    optional int32 foo
    optional bar:
      optional string baz
    optional int64 bak
  ```

  Note that unknown fields and subexpressions are not displayed.

  Args:
    limit: if present, limit the recursion.

  Returns:
    A string, describing (a part of) the schema.
  """
  return "\n".join(self._schema_string_helper("root", limit))
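To make the output format concrete, here is a toy renderer for a hypothetical schema tree (the `Node` type and `render_schema` function are not part of struct2tensor). It reproduces the example above; treating `limit` as a maximum depth below which children are omitted is an assumption about what "limit the recursion" means.

```python
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    """Hypothetical schema node."""
    occurrence: str          # "optional" or "repeated"
    dtype: Optional[str]     # e.g. "int32" for a leaf; None for a message
    name: str
    children: Tuple["Node", ...] = ()


def _lines(node: Node, limit: Optional[int], depth: int) -> List[str]:
    indent = "  " * depth
    if node.dtype is not None:
        # Leaf fields print as "<occurrence> <dtype> <name>".
        return [f"{indent}{node.occurrence} {node.dtype} {node.name}"]
    lines = [f"{indent}{node.occurrence} {node.name}:"]
    if limit is not None and depth >= limit:
        return lines  # stop descending once the depth limit is reached
    for child in node.children:
        lines.extend(_lines(child, limit, depth + 1))
    return lines


def render_schema(root: Node, limit: Optional[int] = None) -> str:
    return "\n".join(_lines(root, limit, 0))


schema = Node("repeated", None, "root", (
    Node("optional", "int32", "foo"),
    Node("optional", None, "bar", (Node("optional", "string", "baz"),)),
    Node("optional", "int64", "bak"),
))
```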

slice ¶

slice(
    source_path: CoercableToPath,
    new_field_name: Step,
    begin: Optional[IndexValue] = None,
    end: Optional[IndexValue] = None,
) -> Expression

Creates a slice copy of source_path at new_field_name.

Note that if begin or end is negative, it is interpreted relative to the size of each array; e.g., slice(..., begin=-1) gets the last element of every array.

PARAMETER	DESCRIPTION
`source_path`	the source of the slice. TYPE: `CoercableToPath`
`new_field_name`	the new field that is generated. TYPE: `Step`
`begin`	the beginning of the slice (inclusive). TYPE: `Optional[IndexValue]` DEFAULT: `None`
`end`	the end of the slice (exclusive). TYPE: `Optional[IndexValue]` DEFAULT: `None`

RETURNS	DESCRIPTION
`Expression`	An Expression object representing the result of the operation.

Source code in struct2tensor/expression.py

def slice(self,
          source_path: CoercableToPath,
          new_field_name: path.Step,
          begin: Optional[IndexValue] = None,
          end: Optional[IndexValue] = None) -> "Expression":
  """Creates a slice copy of source_path at new_field_path.

  Note that if begin or end is negative, it is considered relative to
  the size of the array. e.g., slice(...,begin=-1) will get the last
  element of every array.

  Args:
    source_path: the source of the slice.
    new_field_name: the new field that is generated.
    begin: the beginning of the slice (inclusive).
    end: the end of the slice (exclusive).

  Returns:
    An Expression object representing the result of the operation.
  """
  return slice_expression.slice_expression(self,
                                           path.create_path(source_path),
                                           new_field_name, begin, end)
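Per array, the begin/end semantics described above match Python list slicing, including negative bounds. The sketch below applies a slice to every array of a hypothetical batch of per-parent lists; `slice_each` and `truncate_each` are illustrative names, not struct2tensor functions. The second function models `truncate`, whose source delegates to `slice` with `end=limit`.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def slice_each(arrays: List[List[T]], begin: Optional[int] = None,
               end: Optional[int] = None) -> List[List[T]]:
    """Slices every per-parent array independently."""
    # Negative bounds are resolved against the length of each array separately.
    return [array[begin:end] for array in arrays]


def truncate_each(arrays: List[List[T]], limit: int) -> List[List[T]]:
    # Mirrors truncate(): a slice with no begin and end=limit.
    return slice_each(arrays, end=limit)


arrays = [[1, 2, 3], [4], []]
last_elements = slice_each(arrays, begin=-1)
```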

truncate ¶

truncate(
    source_path: CoercableToPath,
    limit: Union[int, Tensor],
    new_field_name: Step,
) -> Expression

Creates a truncated copy of source_path at new_field_name, keeping at most limit elements of each array.

Source code in struct2tensor/expression.py

def truncate(self, source_path: CoercableToPath, limit: Union[int, tf.Tensor],
             new_field_name: path.Step) -> "Expression":
  """Creates a truncated copy of source_path at new_field_path."""
  return self.slice(source_path, new_field_name, end=limit)

Functions¶

has ¶

has(
    root: Expression,
    source_path: Path,
    new_field_name: Step,
) -> Expression

Gets whether a field is present (that is, whether its size is greater than zero) as a new boolean sibling field.

PARAMETER	DESCRIPTION
`root`	the original expression. TYPE: `Expression`
`source_path`	the source path to measure. Cannot be root. TYPE: `Path`
`new_field_name`	the name of the sibling field. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	The new expression.

Source code in struct2tensor/expression_impl/size.py

def has(root: expression.Expression, source_path: path.Path,
        new_field_name: path.Step) -> expression.Expression:
  """Get the has of a field as a new sibling field.

  Args:
    root: the original expression.
    source_path: the source path to measure. Cannot be root.
    new_field_name: the name of the sibling field.

  Returns:
    The new expression.
  """
  new_root, size_p = size_anonymous(root, source_path)
  # TODO(martinz): consider using copy_over to "remove" the size field
  # from the result.
  return map_values.map_values(
      new_root, size_p, lambda x: tf.greater(x, tf.constant(0, dtype=tf.int64)),
      tf.bool, new_field_name)
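The source above computes `has` from the field's size: it builds an anonymous size field and maps it through `tf.greater(x, 0)`. A pure-Python model of the same two steps over a hypothetical list of parent messages (dicts of lists) might look like this; `size_of` and `has_of` are illustrative names.

```python
from typing import Any, Dict, List

Message = Dict[str, List[Any]]


def size_of(parents: List[Message], field: str) -> List[int]:
    # One size per parent; a missing field has size 0.
    return [len(parent.get(field, [])) for parent in parents]


def has_of(parents: List[Message], field: str) -> List[bool]:
    # Same comparison as tf.greater(size, 0) in the source above.
    return [size > 0 for size in size_of(parents, field)]


parents = [{"x": [1, 2]}, {}, {"x": []}]
```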

size ¶

size(
    root: Expression,
    source_path: Path,
    new_field_name: Step,
) -> Expression

Get the size of a field as a new sibling field.

PARAMETER	DESCRIPTION
`root`	the original expression. TYPE: `Expression`
`source_path`	the source path to measure. Cannot be root. TYPE: `Path`
`new_field_name`	the name of the sibling field. TYPE: `Step`

RETURNS	DESCRIPTION
`Expression`	The new expression.

Source code in struct2tensor/expression_impl/size.py

def size(root: expression.Expression, source_path: path.Path,
         new_field_name: path.Step) -> expression.Expression:
  """Get the size of a field as a new sibling field.

  Args:
    root: the original expression.
    source_path: the source path to measure. Cannot be root.
    new_field_name: the name of the sibling field.

  Returns:
    The new expression.
  """
  return _size_impl(root, source_path, new_field_name)[0]

size_anonymous ¶

size_anonymous(
    root: Expression, source_path: Path
) -> Tuple[Expression, Path]

Calculate the size of a field, and store it as an anonymous sibling.

PARAMETER	DESCRIPTION
`root`	the original expression. TYPE: `Expression`
`source_path`	the source path to measure. Cannot be root. TYPE: `Path`

RETURNS	DESCRIPTION
`Tuple[Expression, Path]`	The new expression and the new field as a pair.

Source code in struct2tensor/expression_impl/size.py

def size_anonymous(root: expression.Expression, source_path: path.Path
                  ) -> Tuple[expression.Expression, path.Path]:
  """Calculate the size of a field, and store it as an anonymous sibling.

  Args:
    root: the original expression.
    source_path: the source path to measure. Cannot be root.

  Returns:
    The new expression and the new field as a pair.
  """
  return _size_impl(root, source_path, path.get_anonymous_field())

Modules¶

slice_expression ¶

Implementation of slice.

The slice operation is meant to replicate the slicing of a list in Python.

Slicing a list in Python is done by specifying a beginning and an end index. The resulting list consists of all elements from the beginning (inclusive) up to the end (exclusive).

For example:

>>> x = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> print(x[2:5]) # all elements between index 2 inclusive and index 5 exclusive
['c', 'd', 'e']
>>> print(x[2:]) # all elements between index 2 and the end.
['c', 'd', 'e', 'f', 'g']
>>> print(x[:4]) # all elements between the beginning and index 4 (exclusive).
['a', 'b', 'c', 'd']
>>> print(x[-3:-1]) # from three from the end until one from the end (exclusive).
['e', 'f']
>>> print(x[-3:6]) # from three from the end until index 6 (exclusive).
['e', 'f']

TODO(martinz): there is a third argument to slice, which allows one to step over the elements (e.g., x[2:6:2] = ['c', 'e'], giving every other element). This is not implemented.

A prensor can be considered to be interleaved lists and dictionaries. E.g.:

my_expression = [{
  "foo":[
    {"bar":[
      {"baz":["a","b","c","d"]},
      {"baz":["d","e","f"]}
      ]
    },
    {"bar":[
      {"baz":["g","h","i"]},
      {"baz":["j","k","l"]},
      {"baz":["m"]}
      ]
    }]
}]

result_1 = slice_expression.slice_expression(
  my_expression, "foo.bar", "new_bar", begin=1, end=3)

result_1 = [{
  "foo":[
    {"bar":[
      {"baz":["a","b","c","d"]},
      {"baz":["d","e","f"]}
     ],
     "new_bar":[
      {"baz":["d","e","f"]}
     ]
    },
    {"bar":[
      {"baz":["g","h","i"]},
      {"baz":["j","k","l"]},
      {"baz":["m"]}
     ],
     "new_bar":[
      {"baz":["j","k","l"]},
      {"baz":["m"]}
     ]
    }]
}]

result_2 = slice_expression.slice_expression(
  my_expression, "foo.bar.baz", "new_baz", begin=1, end=3)

result_2 = [{
  "foo":[
    {"bar":[
      {"baz":["a","b","c", "d"],
       "new_baz":["b","c"],
      },
      {"baz":["d","e","f"], "new_baz":["e","f"]}
      ]
    },
    {"bar":[
      {"baz":["g","h","i"], "new_baz":["h","i"]},
      {"baz":["j","k","l"], "new_baz":["k","l"]},
      {"baz":["m", ]}
      ]
    }]
}]
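The examples above can be checked with a small pure-Python model over the same list-of-dicts notation. `slice_nested` is a hypothetical helper: it takes the path as a list of field names rather than a `path.Path`, and it omits the new field when a slice is empty, matching how `result_2` shows the `"m"` entry with no `new_baz`.

```python
import copy
from typing import Any, Dict, List, Optional, Sequence

Message = Dict[str, List[Any]]


def slice_nested(protos: List[Message], steps: Sequence[str],
                 new_field_name: str, begin: Optional[int] = None,
                 end: Optional[int] = None) -> List[Message]:
    """Adds a sliced copy of the field at `steps` as a sibling field."""
    result = copy.deepcopy(protos)  # leave the input untouched

    def visit(messages: List[Message], remaining: Sequence[str]) -> None:
        field = remaining[0]
        for message in messages:
            if len(remaining) == 1:
                # Slice this parent's copy of the field, Python-style.
                sliced = copy.deepcopy(message.get(field, [])[begin:end])
                if sliced:
                    message[new_field_name] = sliced
            else:
                visit(message.get(field, []), remaining[1:])

    visit(result, steps)
    return result


my_expression = [{"foo": [
    {"bar": [{"baz": ["a", "b", "c", "d"]}, {"baz": ["d", "e", "f"]}]},
    {"bar": [{"baz": ["g", "h", "i"]}, {"baz": ["j", "k", "l"]},
             {"baz": ["m"]}]},
]}]

result_1 = slice_nested(my_expression, ["foo", "bar"], "new_bar", begin=1, end=3)
result_2 = slice_nested(my_expression, ["foo", "bar", "baz"], "new_baz",
                        begin=1, end=3)
```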

FUNCTION	DESCRIPTION
`slice_expression`	Creates a new subtree with a sliced expression.

ATTRIBUTE	DESCRIPTION
`IndexValue`

Attributes¶

IndexValue `module-attribute` ¶

IndexValue = IndexValue

Functions¶

slice_expression ¶

slice_expression(
    expr: Expression,
    p: Path,
    new_field_name: Step,
    begin: Optional[IndexValue],
    end: Optional[IndexValue],
) -> Expression

Creates a new subtree with a sliced expression.

This follows the semantics of Python slicing. See the module-level comments for examples.

PARAMETER	DESCRIPTION
`expr`	the original root expression. TYPE: `Expression`
`p`	the path to the source to be sliced. TYPE: `Path`
`new_field_name`	the name of the new subtree. TYPE: `Step`
`begin`	the beginning index (inclusive). TYPE: `Optional[IndexValue]`
`end`	the end index (exclusive). TYPE: `Optional[IndexValue]`

RETURNS	DESCRIPTION
`Expression`	A new root expression.

Source code in struct2tensor/expression_impl/slice_expression.py

def slice_expression(expr: expression.Expression, p: path.Path,
                     new_field_name: path.Step, begin: Optional[IndexValue],
                     end: Optional[IndexValue]) -> expression.Expression:
  """Creates a new subtree with a sliced expression.

  This follows the pattern of python slice() method.
  See module-level comments for examples.

  Args:
    expr: the original root expression
    p: the path to the source to be sliced.
    new_field_name: the name of the new subtree.
    begin: beginning index
    end: end index.

  Returns:
    A new root expression.
  """
  work_expr, mask_anonymous_path = _get_slice_mask(expr, p, begin, end)
  work_expr = filter_expression.filter_by_sibling(
      work_expr, p, mask_anonymous_path.field_list[-1], new_field_name)
  new_path = p.get_parent().get_child(new_field_name)
  # We created a lot of anonymous fields and intermediate expressions. Just grab
  # the final result (and its children).
  return expression_add.add_to(expr, {new_path: work_expr})
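`_get_slice_mask` is not shown on this page. Assuming it builds a per-element keep-mask from each element's position and its index from the end (the quantities the `index` module's `get_positional_index` and `get_index_from_end` describe), the filtering step can be modeled in pure Python as follows. `slice_mask` and `apply_mask` are hypothetical names; the final check confirms the mask reproduces Python slicing.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def slice_mask(n: int, begin: Optional[int], end: Optional[int]) -> List[bool]:
    """Keep-mask for a list of length n under slice semantics."""
    mask = []
    for pos in range(n):
        from_end = pos - n  # -1 for the last element, -n for the first
        keep = True
        if begin is not None:
            # Non-negative bounds compare positions; negative ones compare
            # indices from the end.
            keep = keep and (pos >= begin if begin >= 0 else from_end >= begin)
        if end is not None:
            keep = keep and (pos < end if end >= 0 else from_end < end)
        mask.append(keep)
    return mask


def apply_mask(values: Sequence[T], mask: List[bool]) -> List[T]:
    # Plays the role of filter_by_sibling: keep values whose mask is True.
    return [value for value, keep in zip(values, mask) if keep]


bounds = [None] + list(range(-8, 9))
matches_python_slicing = all(
    apply_mask(list(range(n)), slice_mask(n, b, e)) == list(range(n))[b:e]
    for n in range(7) for b in bounds for e in bounds)
```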

