diff options
author | Oliver Reiche <oliver.reiche@huawei.com> | 2023-06-01 13:36:32 +0200 |
---|---|---|
committer | Oliver Reiche <oliver.reiche@huawei.com> | 2023-06-12 16:29:05 +0200 |
commit | b66a7359fbbff35af630c88c56598bbc06b393e1 (patch) | |
tree | d866802c4b44c13cbd90f9919cc7fc472091be0c /doc/concepts/anonymous-targets.md | |
parent | 144b2c619f28c91663936cd445251ca28af45f88 (diff) | |
download | justbuild-b66a7359fbbff35af630c88c56598bbc06b393e1.tar.gz |
doc: Convert orgmode files to markdown
Diffstat (limited to 'doc/concepts/anonymous-targets.md')
-rw-r--r-- | doc/concepts/anonymous-targets.md | 345 |
1 files changed, 345 insertions, 0 deletions
diff --git a/doc/concepts/anonymous-targets.md b/doc/concepts/anonymous-targets.md new file mode 100644 index 00000000..6692d0ae --- /dev/null +++ b/doc/concepts/anonymous-targets.md @@ -0,0 +1,345 @@ +Anonymous targets +================= + +Motivation +---------- + +Using [Protocol buffers](https://github.com/protocolbuffers/protobuf) +allows to specify, in a language-independent way, a wire format for +structured data. This is done by using description files from which APIs +for various languages can be generated. As protocol buffers can contain +other protocol buffers, the description files themselves have a +dependency structure. + +From a software-engineering point of view, the challenge is to ensure +that the author of the description files does not have to be aware of +the languages for which APIs will be generated later. In fact, the main +benefit of the language-independent description is that clients in +various languages can be implemented using the same wire protocol (and +thus capable of communicating with the same server). + +For a build system that means that we have to expect that language +bindings at places far away from the protocol definition, and +potentially several times. Such a duplication can also occur implicitly +if two buffers, for which language bindings are generated both use a +common buffer for which bindings are never requested explicitly. Still, +we want to avoid duplicate work for common parts and we have to avoid +conflicts with duplicate symbols and staging conflicts for the libraries +for the common part. + +Our approach is that a "proto" target only provides the description +files together with their dependency structure. From those, a consuming +target generates "anonymous targets" as additional dependencies; as +those targets will have an appropriate notion of equality, no duplicate +work is done and hence, as a side effect, staging or symbol conflicts +are avoided as well. + +Preliminary remark: action identifiers +-------------------------------------- + +Actions are defined as Merkle-tree hash of the contents. As all +components (input tree, list of output strings, command vector, +environment, and cache pragma) are given by expressions, that can +quickly be computed. This identifier also defines the notion of equality +for actions, and hence action artifacts. Recall that equality of +artifacts is also (implicitly) used in our notion of disjoint map union +(where the set of keys does not have to be disjoint, as long as the +values for all duplicate keys are equal). + +When constructing the action graph for traversal, we can drop duplicates +(i.e., actions with the same identifier, and hence the same +description). For the serialization of the graph as part of the analyse +command, we can afford the preparatory step to compute a map from action +id to list of origins. + +Equality +-------- + +### Notions of equality + +In the context of builds, there are different concepts of equality to +consider. We recall the definitions, as well as their use in our build +tool. + +#### Locational equality ("Defined at the same place") + +Names (for targets and rules) are given by repository name, module +name, and target name (inside the module); additionally, for target +names, there's a bit specifying that we explicitly refer to a file. +Names are equal if and only if the respective strings (and the file +bit) are equal. + +For targets, we use locational equality, i.e., we consider targets +equal precisely if their names are equal; targets defined at +different places are considered different, even if they're defined +in the same way. The reason we use notion of equality is that we +have to refer to targets (and also check if we already have a +pending task to analyse them) before we have fully explored them +with all the targets referred to in their definition. + +#### Intensional equality ("Defined in the same way") + +In our expression language we handle definitions; in particular, we +treat artifacts by their definition: a particular source file, the +output of a particular action, etc. Hence we use intensional +equality in our expression language; two objects are equal precisely +if they are defined in the same way. This notion of equality is easy +to determine without the need of reading a source file or running an +action. We implement quick tests by keeping a Merkle-tree hash of +all expression values. + +#### Extensional equality ("Defining the same object") + +For built artifacts, we use extensional equality, i.e., we consider +two files equal, if they are bit-by-bit identical. +Implementation-wise, we compare an appropriate cryptographic hash. +Before running an action, we built its inputs. In particular (as +inputs are considered extensionally) an action might cause a cache +hit with an intensionally different one. + +#### Observable equality ("The defined objects behave in the same way") + +Finally, there is the notion of observable equality, i.e., the +property that two binaries behaving the same way in all situations. +As this notion is undecidable, it is never used directly by any +build tool. However, it is often the motivation for a build in the +first place: we want a binary that behaves in a particular way. + +### Relation between these notions + +The notions of equality were introduced in order from most fine grained +to most coarse. Targets defined at the same place are obviously defined +in the same way. Intensionally equal artifacts create equal action +graphs; here we can confidently say "equal" and not only isomorphic: +due to our preliminary clean up, even the node names are equal. Making +sure that equal actions produce bit-by-bit equal outputs is the realm of +[reproducibe builds](https://reproducible-builds.org/). The tool can +support this by appropriate sandboxing, etc, but the rules still have to +define actions that don't pick up non-input information like the +current time, user id, readdir order, etc. Files that are bit-by-bit +identical will behave in the same way. + +### Example + +Consider the following target file. + +```jsonc +{ "foo": + { "type": "generic" + , "outs": ["out.txt"] + , "cmds": ["echo Hello World > out.txt"] + } +, "bar": + { "type": "generic" + , "outs": ["out.txt"] + , "cmds": ["echo Hello World > out.txt"] + } +, "baz": + { "type": "generic" + , "outs": ["out.txt"] + , "cmds": ["echo -n Hello > out.txt && echo ' World' >> out.txt"] + } +, "foo upper": + { "type": "generic" + , "deps": ["foo"] + , "outs": ["upper.txt"] + , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"] + } +, "bar upper": + { "type": "generic" + , "deps": ["bar"] + , "outs": ["upper.txt"] + , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"] + } +, "baz upper": + { "type": "generic" + , "deps": ["baz"] + , "outs": ["upper.txt"] + , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"] + } +, "ALL": + { "type": "install" + , "files": + {"foo.txt": "foo upper", "bar.txt": "bar upper", "baz.txt": "baz upper"} + } +} +``` + +Assume we build the target `"ALL"`. Then we will analyse 7 targets, all +the locationally different ones (`"foo"`, `"bar"`, `"baz"`, +`"foo upper"`, `"bar upper"`, `"baz upper"`). For the targets `"foo"` +and `"bar"`, we immediately see that the definition is equal; their +intensional equality also renders `"foo upper"` and `"bar upper"` +intensionally equal. Our action graph will contain 4 actions: one with +origins `["foo", "bar"]`, one with origins `["baz"]`, one with origins +`["foo upper", "bar upper"]`, and one with origins `["baz +upper"]`. The `"install"` target will, of course, not create any +actions. Building sequentially (`-J 1`), we will get one cache hit. Even +though the artifacts of `"foo"` and `"bar"` and of `"baz"` are defined +differently, they are extensionally equal; both define a file with +contents `"Hello World\n"`. + +Anonymous targets +----------------- + +Besides named targets we also have additional targets (and hence also +configured targets) that are not associated with a location they are +defined at. Due to the absence of definition location, their notion of +equality will take care of the necessary deduplication (implicitly, by +the way our dependency exploration works). We will call them "anonymous +targets", even though, technically, they're not fully anonymous as the +rules that are part of their structure will be given by name, i.e., +defining rule location. + +### Value type: target graph node + +In order to allow targets to adequately describe a dependency structure, +we have a value type in our expression language, that of a (target) +graph node. As with all value types, equality is intensional, i.e., +nodes defined in the same way are equal even if defined at different +places. This can be achieved by our usual approach for expressions of +having cached Merkle-tree hashes and comparing them when an equality +test is required. This efficient test for equality also allows using +graph nodes as part of a map key, e.g., for our asynchronous map +consumers. + +As a graph node can only be defined with all data given, the defined +dependency structure is cycle-free by construction. However, the tree +unfolding will usually be exponentially larger. For internal handling, +this is not a problem: our shared-pointer implementation can efficiently +represent a directed acyclic graph and since we cache hashes in +expressions, we can compute the overall hash without folding the +structure to a tree. When presenting nodes to the user, we only show the +map of identifier to definition, to avoid that exponential unfolding. + +We have two kinds of nodes. + +#### Value nodes + +These represent a target that, in any configuration, returns a fixed +value. Source files would typically be represented this way. The +constructor function `"VALUE_NODE"` takes a single argument `"$1"` +that has to be a result value. + +#### Abstract nodes + +These represent internal nodes in the dag. Their constructor +`"ABSTRACT_NODE"` takes the following arguments (all evaluated). + + - `"node_type"`. An arbitrary string, not interpreted in any way, + to indicate the role that the node has in the dependency + structure. When we create an anonymous target from a node, this + will serve as the key into the rule mapping to be applied. + - `"string_fields"`. This has to be a map of strings. + - `"target_fields"`. These have to be a map of lists of graph + nodes. + +Moreover, we require that the keys for maps provided as +`"string_fields"` and `"target_fields"` be disjoint. + +### Graph nodes in `export` targets + +Graph nodes are completely free of names and hence are eligible for +exporting. As with other values, in the cache the intensional definition +of artifacts implicit in them will be replaced by the corresponding, +extensionally equal, known value. + +However, some care has to be taken in the serialisation that is part of +the caching, as we do not want to unfold the dag to a tree. Therefore, +we take as JSON serialisation a simple dict with `"type"` set to +`"NODE"`, and `"value"` set to the Merkle-tree hash. That serialisation +respects intensional equality. To allow deserialisation, we add an +additional map to the serialisation from node hash to its definition. + +### Dependings on anonymous targets + +#### Parts of an anonymous target + +An anonymous target is given by a pair of a node and a map mapping +the abstract node-type specifying strings to rule names. So, in the +implementation these are just two expression pointers (with their +defined notion of equality, i.e., equality of the respective +Merkle-tree hashes). Such a pair of pointers also forms an +additional variant of a name value, referring to such an anonymous +target. + +It should be noted that such an anonymous target contains all the +information needed to evaluate it in the same way as a regular +(named) target defined by a user-defined rule. It is an analysis +error analysing an anonymous target where there is no entry in the +rules map for the string given as `"node_type"` for the +corresponding node. + +#### Anonymous targets as additional dependencies + +We keep the property that a user can only request named targets. So +anonymous targets have to be requested by other targets. We also +keep the property that other targets are only requested at certain +fixed steps in the evaluation of a target. To still achieve a +meaningful use of anonymous targets our rule language handles +anonymous targets in the following way. + +##### Rules parameter `"anonymous"` + +In the rule definition a parameter `"anonymous"` (with empty map +as default) is allowed. It is used to define an additional +dependency on anonymous targets. The value has to be a map with +keys the additional implicitly defined field names. It is hence +a requirement that the set of keys be disjoint from all other +field names (the values of `"config_fields"`, `"string_fields"`, +and `"target_fields"`, as well as the keys of the `"implict"` +parameter). Another consequence is that `"config_transitions"` +map may now also have meaningful entries for the keys of the +`"anonymous"` map. Each value in the map has to be itself a map, +with entries `"target"`, `"provider"`, and `"rule_map"`. + +For `"target"`, a single string has to be specifed, and the +value has to be a member of the `"target_fields"` list. For +provider, a single string has to be specified as well. The idea +is that the nodes are collected from that provider of the +targets in the specified target field. For `"rule_map"` a map +has to be specified from strings to rule names; the latter are +evaluated in the context of the rule definition. + +###### Example + +For generating language bindings for protocol buffers, a +rule might look as follows. + +``` jsonc +{ "cc_proto_bindings": + { "target_fields": ["proto_deps"] + , "anonymous": + { "protos": + { "target": "proto_deps" + , "provider": "proto" + , "rule_map": {"proto_library": "cc_proto_library"} + } + } + , "expression": {...} + } +} +``` + +##### Evaluation mechanism + +The evaluation of a target defined by a user-defined rule is +handled as follows. After the target fields are evaluated as +usual, an additional step is carried out. + +For each anonymous-target field, i.e., for each key in the +`"anonymous"` map, a list of anonymous targets is generated from +the corresponding value: take all targets from the specified +`"target"` field in all their specified configuration +transitions (they have already been evaluated) and take the +values provided for the specified `"provider"` key (using the +empty list as default). That value has to be a list of nodes. +All the node lists obtained that way are concatenated. The +configuration transition for the respective field name is +evaluated. Those targets are then evaluated for all the +transitioned configurations requested. + +In the final evaluation of the defining expression, the +anonymous-target fields are available in the same way as any +other target field. Also, they contribute to the effective +configuration in the same way as regular target fields. |