summaryrefslogtreecommitdiff
path: root/doc/concepts/anonymous-targets.org
diff options
context:
space:
mode:
Diffstat (limited to 'doc/concepts/anonymous-targets.org')
-rw-r--r--doc/concepts/anonymous-targets.org336
1 files changed, 0 insertions, 336 deletions
diff --git a/doc/concepts/anonymous-targets.org b/doc/concepts/anonymous-targets.org
deleted file mode 100644
index 98d194c7..00000000
--- a/doc/concepts/anonymous-targets.org
+++ /dev/null
@@ -1,336 +0,0 @@
-* Anonymous targets
-** Motivation
-
-Using [[https://github.com/protocolbuffers/protobuf][Protocol
-buffers]] allows to specify, in a language-independent way, a wire
-format for structured data. This is done by using description files
-from which APIs for various languages can be generated. As protocol
-buffers can contain other protocol buffers, the description files
-themselves have a dependency structure.
-
-From a software-engineering point of view, the challenge is to
-ensure that the author of the description files does not have to
-be aware of the languages for which APIs will be generated later.
-In fact, the main benefit of the language-independent description
-is that clients in various languages can be implemented using the
-same wire protocol (and thus capable of communicating with the
-same server).
-
-For a build system that means that we have to expect that language
-bindings at places far away from the protocol definition, and
-potentially several times. Such a duplication can also occur
-implicitly if two buffers, for which language bindings are generated
-both use a common buffer for which bindings are never requested
-explicitly. Still, we want to avoid duplicate work for common parts
-and we have to avoid conflicts with duplicate symbols and staging
-conflicts for the libraries for the common part.
-
-Our approach is that a "proto" target only provides the description
-files together with their dependency structure. From those, a
-consuming target generates "anonymous targets" as additional
-dependencies; as those targets will have an appropriate notion of
-equality, no duplicate work is done and hence, as a side effect,
-staging or symbol conflicts are avoided as well.
-
-** Preliminary remark: action identifiers
-
-Actions are defined as Merkle-tree hash of the contents. As all
-components (input tree, list of output strings, command vector,
-environment, and cache pragma) are given by expressions, that can
-quickly be computed. This identifier also defines the notion of
-equality for actions, and hence action artifacts. Recall that equality
-of artifacts is also (implicitly) used in our notion of disjoint
-map union (where the set of keys does not have to be disjoint, as
-long as the values for all duplicate keys are equal).
-
-When constructing the action graph for traversal, we can drop
-duplicates (i.e., actions with the same identifier, and hence the
-same description). For the serialization of the graph as part of
-the analyse command, we can afford the preparatory step to compute
-a map from action id to list of origins.
-
-** Equality
-
-*** Notions of equality
-
-In the context of builds, there are different concepts of equality
-to consider. We recall the definitions, as well as their use in
-our build tool.
-
-**** Locational equality ("Defined at the same place")
-
-Names (for targets and rules) are given by repository name, module
-name, and target name (inside the module); additionally, for target
-names, there's a bit specifying that we explicitly refer to a file.
-Names are equal if and only if the respective strings (and the file
-bit) are equal.
-
-For targets, we use locational equality, i.e., we consider targets
-equal precisely if their names are equal; targets defined at different
-places are considered different, even if they're defined in the
-same way. The reason we use notion of equality is that we have to
-refer to targets (and also check if we already have a pending task
-to analyse them) before we have fully explored them with all the
-targets referred to in their definition.
-
-**** Intensional equality ("Defined in the same way")
-
-In our expression language we handle definitions; in particular,
-we treat artifacts by their definition: a particular source file,
-the output of a particular action, etc. Hence we use intensional
-equality in our expression language; two objects are equal precisely
-if they are defined in the same way. This notion of equality is easy
-to determine without the need of reading a source file or running
-an action. We implement quick tests by keeping a Merkle-tree hash
-of all expression values.
-
-**** Extensional equality ("Defining the same object")
-
-For built artifacts, we use extensional equality, i.e., we consider
-two files equal, if they are bit-by-bit identical. Implementation-wise,
-we compare an appropriate cryptographic hash. Before running an
-action, we built its inputs. In particular (as inputs are considered
-extensionally) an action might cause a cache hit with an intensionally
-different one.
-
-**** Observable equality ("The defined objects behave in the same way")
-
-Finally, there is the notion of observable equality, i.e., the
-property that two binaries behaving the same way in all situations.
-As this notion is undecidable, it is never used directly by any
-build tool. However, it is often the motivation for a build in the
-first place: we want a binary that behaves in a particular way.
-
-*** Relation between these notions
-
-The notions of equality were introduced in order from most fine grained
-to most coarse. Targets defined at the same place are obviously defined
-in the same way. Intensionally equal artifacts create equal action
-graphs; here we can confidently say "equal" and not only isomorphic:
-due to our preliminary clean up, even the node names are equal.
-Making sure that equal actions produce bit-by-bit equal outputs
-is the realm of [[https://reproducible-builds.org/][reproducibe
-builds]]. The tool can support this by appropriate sandboxing,
-etc, but the rules still have to define actions that don't pick
-up non-input information like the current time, user id, readdir
-order, etc. Files that are bit-by-bit identical will behave in
-the same way.
-
-*** Example
-
-Consider the following target file.
-
-#+BEGIN_SRC
-{ "foo":
- { "type": "generic"
- , "outs": ["out.txt"]
- , "cmds": ["echo Hello World > out.txt"]
- }
-, "bar":
- { "type": "generic"
- , "outs": ["out.txt"]
- , "cmds": ["echo Hello World > out.txt"]
- }
-, "baz":
- { "type": "generic"
- , "outs": ["out.txt"]
- , "cmds": ["echo -n Hello > out.txt && echo ' World' >> out.txt"]
- }
-, "foo upper":
- { "type": "generic"
- , "deps": ["foo"]
- , "outs": ["upper.txt"]
- , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
- }
-, "bar upper":
- { "type": "generic"
- , "deps": ["bar"]
- , "outs": ["upper.txt"]
- , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
- }
-, "baz upper":
- { "type": "generic"
- , "deps": ["baz"]
- , "outs": ["upper.txt"]
- , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
- }
-, "ALL":
- { "type": "install"
- , "files":
- {"foo.txt": "foo upper", "bar.txt": "bar upper", "baz.txt": "baz upper"}
- }
-}
-#+END_SRC
-
-Assume we build the target ~"ALL"~. Then we will analyse 7 targets,
-all the locationally different ones (~"foo"~, ~"bar"~, ~"baz"~,
-~"foo upper"~, ~"bar upper"~, ~"baz upper"~). For the targets ~"foo"~
-and ~"bar"~, we immediately see that the definition is equal; their
-intensional equality also renders ~"foo upper"~ and ~"bar upper"~
-intensionally equal. Our action graph will contain 4 actions: one
-with origins ~["foo", "bar"]~, one with origins ~["baz"]~, one with
-origins ~["foo upper", "bar upper"]~, and one with origins ~["baz
-upper"]~. The ~"install"~ target will, of course, not create any
-actions. Building sequentially (~-J 1~), we will get one cache hit.
-Even though the artifacts of ~"foo"~ and ~"bar"~ and of ~"baz~"
-are defined differently, they are extensionally equal; both define
-a file with contents ~"Hello World\n"~.
-
-** Anonymous targets
-
-Besides named targets we also have additional targets (and hence also
-configured targets) that are not associated with a location they are
-defined at. Due to the absence of definition location, their notion
-of equality will take care of the necessary deduplication (implicitly,
-by the way our dependency exploration works). We will call them
-"anonymous targets", even though, technically, they're not fully
-anonymous as the rules that are part of their structure will be
-given by name, i.e., defining rule location.
-
-*** Value type: target graph node
-
-In order to allow targets to adequately describe a dependency
-structure, we have a value type in our expression language, that
-of a (target) graph node. As with all value types, equality is
-intensional, i.e., nodes defined in the same way are equal even
-if defined at different places. This can be achieved by our usual
-approach for expressions of having cached Merkle-tree hashes and
-comparing them when an equality test is required. This efficient
-test for equality also allows using graph nodes as part of a map
-key, e.g., for our asynchronous map consumers.
-
-As a graph node can only be defined with all data given, the defined
-dependency structure is cycle-free by construction. However, the
-tree unfolding will usually be exponentially larger. For internal
-handling, this is not a problem: our shared-pointer implementation
-can efficiently represent a directed acyclic graph and since we
-cache hashes in expressions, we can compute the overall hash without
-folding the structure to a tree. When presenting nodes to the user,
-we only show the map of identifier to definition, to avoid that
-exponential unfolding.
-
-We have two kinds of nodes.
-
-**** Value nodes
-
-These represent a target that, in any configuration, returns a fixed
-value. Source files would typically be represented this way. The
-constructor function ~"VALUE_NODE"~ takes a single argument ~"$1"~
-that has to be a result value.
-
-**** Abstract nodes
-
-These represent internal nodes in the dag. Their constructor
-~"ABSTRACT_NODE"~ takes the following arguments (all evaluated).
-- ~"node_type"~. An arbitrary string, not interpreted in any way, to
- indicate the role that the node has in the dependency structure.
- When we create an anonymous target from a node, this will serve
- as the key into the rule mapping to be applied.
-- ~"string_fields"~. This has to be a map of strings.
-- ~"target_fields"~. These have to be a map of lists of graph nodes.
-Moreover, we require that the keys for maps provided as ~"string_fields"~
-and ~"target_fields"~ be disjoint.
-
-*** Graph nodes in ~export~ targets
-
-Graph nodes are completely free of names and hence are eligible
-for exporting. As with other values, in the cache the intensional
-definition of artifacts implicit in them will be replaced by the
-corresponding, extensionally equal, known value.
-
-However, some care has to be taken in the serialisation that is
-part of the caching, as we do not want to unfold the dag to
-a tree. Therefore, we take as JSON serialisation a simple dict
-with ~"type"~ set to ~"NODE"~, and ~"value"~ set to the Merkle-tree
-hash. That serialisation respects intensional equality. To allow
-deserialisation, we add an additional map to the serialisation from
-node hash to its definition.
-
-*** Dependings on anonymous targets
-
-**** Parts of an anonymous target
-
-An anonymous target is given by a pair of a node and a map mapping
-the abstract node-type specifying strings to rule names. So, in
-the implementation these are just two expression pointers (with
-their defined notion of equality, i.e., equality of the respective
-Merkle-tree hashes). Such a pair of pointers also forms an additional
-variant of a name value, referring to such an anonymous target.
-
-It should be noted that such an anonymous target contains all the
-information needed to evaluate it in the same way as a regular (named)
-target defined by a user-defined rule. It is an analysis error
-analysing an anonymous target where there is no entry in the rules
-map for the string given as ~"node_type"~ for the corresponding node.
-
-**** Anonymous targets as additional dependencies
-
-We keep the property that a user can only request named targets.
-So anonymous targets have to be requested by other targets. We
-also keep the property that other targets are only requested at
-certain fixed steps in the evaluation of a target. To still achieve
-a meaningful use of anonymous targets our rule language handles
-anonymous targets in the following way.
-
-***** Rules parameter ~"anonymous"~
-
-In the rule definition a parameter ~"anonymous"~ (with empty map as
-default) is allowed. It is used to define an additional dependency on
-anonymous targets. The value has to be a map with keys the additional
-implicitly defined field names. It is hence a requirement that the
-set of keys be disjoint from all other field names (the values of
-~"config_fields"~, ~"string_fields"~, and ~"target_fields"~, as well as
-the keys of the ~"implict"~ parameter). Another consequence is that
-~"config_transitions"~ map may now also have meaningful entries for
-the keys of the ~"anonymous"~ map. Each value in the map has to be
-itself a map, with entries ~"target"~, ~"provider"~, and ~"rule_map"~.
-
-For ~"target"~, a single string has to be specifed, and the value has
-to be a member of the ~"target_fields"~ list. For provider, a single
-string has to be specified as well. The idea is that the nodes are
-collected from that provider of the targets in the specified target
-field. For ~"rule_map"~ a map has to be specified from strings to
-rule names; the latter are evaluated in the context of the rule
-definition.
-
-****** Example
-
-For generating language bindings for protocol buffers, a rule might
-look as follows.
-
-#+BEGIN_SRC
-{ "cc_proto_bindings":
- { "target_fields": ["proto_deps"]
- , "anonymous":
- { "protos":
- { "target": "proto_deps"
- , "provider": "proto"
- , "rule_map": {"proto_library": "cc_proto_library"}
- }
- }
- , "expression": {...}
- }
-}
-#+END_SRC
-
-***** Evaluation mechanism
-
-The evaluation of a target defined by a user-defined rule is handled
-as follows. After the target fields are evaluated as usual, an
-additional step is carried out.
-
-For each anonymous-target field, i.e., for each key in the ~"anonymous"~
-map, a list of anonymous targets is generated from the corresponding
-value: take all targets from the specified ~"target"~ field in all
-their specified configuration transitions (they have already been
-evaluated) and take the values provided for the specified ~"provider"~
-key (using the empty list as default). That value has to be a list
-of nodes. All the node lists obtained that way are concatenated.
-The configuration transition for the respective field name is
-evaluated. Those targets are then evaluated for all the transitioned
-configurations requested.
-
-In the final evaluation of the defining expression, the anonymous-target
-fields are available in the same way as any other target field.
-Also, they contribute to the effective configuration in the same
-way as regular target fields.