author    | Oliver Reiche <oliver.reiche@huawei.com> | 2023-06-01 13:36:32 +0200
committer | Oliver Reiche <oliver.reiche@huawei.com> | 2023-06-12 16:29:05 +0200
commit    | b66a7359fbbff35af630c88c56598bbc06b393e1 (patch)
tree      | d866802c4b44c13cbd90f9919cc7fc472091be0c
parent    | 144b2c619f28c91663936cd445251ca28af45f88 (diff)
download  | justbuild-b66a7359fbbff35af630c88c56598bbc06b393e1.tar.gz
doc: Convert orgmode files to markdown
43 files changed, 4916 insertions, 4777 deletions
diff --git a/README.md b/README.md
@@ -15,25 +15,25 @@ taken from user-defined rules described by functional expressions.
   [installation guide](INSTALL.md).
 
 * Tutorial
-  - [Getting Started](doc/tutorial/getting-started.org)
-  - [Hello World](doc/tutorial/hello-world.org)
-  - [Third party dependencies](doc/tutorial/third-party-software.org)
-  - [Tests](doc/tutorial/tests.org)
-  - [Targets versus `FILE`, `GLOB`, and `TREE`](doc/tutorial/target-file-glob-tree.org)
-  - [Ensuring reproducibility](doc/tutorial/rebuild.org)
-  - [Using protobuf](doc/tutorial/proto.org)
+  - [Getting Started](doc/tutorial/getting-started.md)
+  - [Hello World](doc/tutorial/hello-world.md)
+  - [Third party dependencies](doc/tutorial/third-party-software.md)
+  - [Tests](doc/tutorial/tests.md)
+  - [Targets versus `FILE`, `GLOB`, and `TREE`](doc/tutorial/target-file-glob-tree.md)
+  - [Ensuring reproducibility](doc/tutorial/rebuild.md)
+  - [Using protobuf](doc/tutorial/proto.md)
   - [How to create a single-node remote execution service](doc/tutorial/just-execute.org)
 
 ## Documentation
 
-- [Overview](doc/concepts/overview.org)
-- [Build Configurations](doc/concepts/configuration.org)
-- [Multi-Repository Builds](doc/concepts/multi-repo.org)
-- [Expression Language](doc/concepts/expressions.org)
-- [Built-in Rules](doc/concepts/built-in-rules.org)
-- [User-Defined Rules](doc/concepts/rules.org)
-- [Documentation Strings](doc/concepts/doc-strings.org)
-- [Cache Pragma and Testing](doc/concepts/cache-pragma.org)
-- [Anonymous Targets](doc/concepts/anonymous-targets.org)
-- [Target-Level Caching](doc/concepts/target-cache.org)
-- [Garbage Collection](doc/concepts/garbage.org)
+- [Overview](doc/concepts/overview.md)
+- [Build Configurations](doc/concepts/configuration.md)
+- [Multi-Repository Builds](doc/concepts/multi-repo.md)
+- [Expression Language](doc/concepts/expressions.md)
+- [Built-in Rules](doc/concepts/built-in-rules.md)
+- [User-Defined Rules](doc/concepts/rules.md)
+- [Documentation Strings](doc/concepts/doc-strings.md)
+- [Cache Pragma and Testing](doc/concepts/cache-pragma.md)
+- [Anonymous Targets](doc/concepts/anonymous-targets.md)
+- [Target-Level Caching](doc/concepts/target-cache.md)
+- [Garbage Collection](doc/concepts/garbage.md)

diff --git a/doc/concepts/anonymous-targets.md b/doc/concepts/anonymous-targets.md
new file mode 100644
index 00000000..6692d0ae
--- /dev/null
+++ b/doc/concepts/anonymous-targets.md
@@ -0,0 +1,345 @@

Anonymous targets
=================

Motivation
----------

Using [Protocol buffers](https://github.com/protocolbuffers/protobuf)
allows one to specify, in a language-independent way, a wire format for
structured data. This is done by using description files from which APIs
for various languages can be generated. As protocol buffers can contain
other protocol buffers, the description files themselves have a
dependency structure.

From a software-engineering point of view, the challenge is to ensure
that the author of the description files does not have to be aware of
the languages for which APIs will be generated later. In fact, the main
benefit of the language-independent description is that clients in
various languages can be implemented using the same wire protocol (and
are thus capable of communicating with the same server).

For a build system that means that we have to expect that language
bindings are requested at places far away from the protocol definition,
and potentially several times. Such a duplication can also occur
implicitly if two buffers, for which language bindings are generated,
both use a common buffer for which bindings are never requested
explicitly. Still, we want to avoid duplicate work for common parts, and
we have to avoid conflicts with duplicate symbols and staging conflicts
for the libraries for the common part.

Our approach is that a "proto" target only provides the description
files together with their dependency structure. From those, a consuming
target generates "anonymous targets" as additional dependencies; as
those targets will have an appropriate notion of equality, no duplicate
work is done and hence, as a side effect, staging or symbol conflicts
are avoided as well.

Preliminary remark: action identifiers
--------------------------------------

Actions are identified by the Merkle-tree hash of their contents. As all
components (input tree, list of output strings, command vector,
environment, and cache pragma) are given by expressions, that hash can
quickly be computed. This identifier also defines the notion of equality
for actions, and hence for action artifacts. Recall that equality of
artifacts is also (implicitly) used in our notion of disjoint map union
(where the set of keys does not have to be disjoint, as long as the
values for all duplicate keys are equal).

When constructing the action graph for traversal, we can drop duplicates
(i.e., actions with the same identifier, and hence the same
description). For the serialization of the graph as part of the analyse
command, we can afford the preparatory step to compute a map from action
id to list of origins.
Equality
--------

### Notions of equality

In the context of builds, there are different concepts of equality to
consider. We recall the definitions, as well as their use in our build
tool.

#### Locational equality ("Defined at the same place")

Names (for targets and rules) are given by repository name, module
name, and target name (inside the module); additionally, for target
names, there's a bit specifying that we explicitly refer to a file.
Names are equal if and only if the respective strings (and the file
bit) are equal.

For targets, we use locational equality, i.e., we consider targets
equal precisely if their names are equal; targets defined at
different places are considered different, even if they're defined
in the same way. The reason we use this notion of equality is that we
have to refer to targets (and also check if we already have a
pending task to analyse them) before we have fully explored them
with all the targets referred to in their definition.

#### Intensional equality ("Defined in the same way")

In our expression language we handle definitions; in particular, we
treat artifacts by their definition: a particular source file, the
output of a particular action, etc. Hence we use intensional
equality in our expression language; two objects are equal precisely
if they are defined in the same way. This notion of equality is easy
to determine without the need of reading a source file or running an
action. We implement quick tests by keeping a Merkle-tree hash of
all expression values.

#### Extensional equality ("Defining the same object")

For built artifacts, we use extensional equality, i.e., we consider
two files equal if they are bit-by-bit identical.
Implementation-wise, we compare an appropriate cryptographic hash.
Before running an action, we build its inputs. In particular (as
inputs are considered extensionally) an action might cause a cache
hit with an intensionally different one.

#### Observable equality ("The defined objects behave in the same way")

Finally, there is the notion of observable equality, i.e., the
property that two binaries behave the same way in all situations.
As this notion is undecidable, it is never used directly by any
build tool. However, it is often the motivation for a build in the
first place: we want a binary that behaves in a particular way.

### Relation between these notions

The notions of equality were introduced in order from most fine-grained
to most coarse. Targets defined at the same place are obviously defined
in the same way. Intensionally equal artifacts create equal action
graphs; here we can confidently say "equal" and not only isomorphic:
due to our preliminary clean up, even the node names are equal. Making
sure that equal actions produce bit-by-bit equal outputs is the realm of
[reproducible builds](https://reproducible-builds.org/). The tool can
support this by appropriate sandboxing, etc., but the rules still have to
define actions that don't pick up non-input information like the
current time, user id, readdir order, etc. Files that are bit-by-bit
identical will behave in the same way.

### Example

Consider the following target file.

```jsonc
{ "foo":
  { "type": "generic"
  , "outs": ["out.txt"]
  , "cmds": ["echo Hello World > out.txt"]
  }
, "bar":
  { "type": "generic"
  , "outs": ["out.txt"]
  , "cmds": ["echo Hello World > out.txt"]
  }
, "baz":
  { "type": "generic"
  , "outs": ["out.txt"]
  , "cmds": ["echo -n Hello > out.txt && echo ' World' >> out.txt"]
  }
, "foo upper":
  { "type": "generic"
  , "deps": ["foo"]
  , "outs": ["upper.txt"]
  , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
  }
, "bar upper":
  { "type": "generic"
  , "deps": ["bar"]
  , "outs": ["upper.txt"]
  , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
  }
, "baz upper":
  { "type": "generic"
  , "deps": ["baz"]
  , "outs": ["upper.txt"]
  , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
  }
, "ALL":
  { "type": "install"
  , "files":
    {"foo.txt": "foo upper", "bar.txt": "bar upper", "baz.txt": "baz upper"}
  }
}
```

Assume we build the target `"ALL"`. Then we will analyse 7 targets: all
the locationally different ones (`"foo"`, `"bar"`, `"baz"`,
`"foo upper"`, `"bar upper"`, `"baz upper"`, and `"ALL"` itself). For
the targets `"foo"` and `"bar"`, we immediately see that the definition
is equal; their intensional equality also renders `"foo upper"` and
`"bar upper"` intensionally equal. Our action graph will contain 4
actions: one with origins `["foo", "bar"]`, one with origins `["baz"]`,
one with origins `["foo upper", "bar upper"]`, and one with origins
`["baz upper"]`. The `"install"` target will, of course, not create any
actions. Building sequentially (`-J 1`), we will get one cache hit. Even
though the artifacts of `"foo"` and `"bar"` on the one hand and of
`"baz"` on the other are defined differently, they are extensionally
equal; both define a file with contents `"Hello World\n"`.

Anonymous targets
-----------------

Besides named targets we also have additional targets (and hence also
configured targets) that are not associated with a location they are
defined at. Due to the absence of a definition location, their notion of
equality will take care of the necessary deduplication (implicitly, by
the way our dependency exploration works). We will call them "anonymous
targets", even though, technically, they're not fully anonymous, as the
rules that are part of their structure are given by name, i.e., by the
location of the rule definition.

### Value type: target graph node

In order to allow targets to adequately describe a dependency structure,
we have a value type in our expression language, that of a (target)
graph node. As with all value types, equality is intensional, i.e.,
nodes defined in the same way are equal even if defined at different
places. This can be achieved by our usual approach for expressions of
having cached Merkle-tree hashes and comparing them when an equality
test is required. This efficient test for equality also allows using
graph nodes as part of a map key, e.g., for our asynchronous map
consumers.

As a graph node can only be defined with all data given, the defined
dependency structure is cycle-free by construction. However, the tree
unfolding will usually be exponentially larger. For internal handling,
this is not a problem: our shared-pointer implementation can efficiently
represent a directed acyclic graph, and since we cache hashes in
expressions, we can compute the overall hash without folding the
structure to a tree. When presenting nodes to the user, we only show the
map of identifier to definition, to avoid that exponential unfolding.

We have two kinds of nodes.

#### Value nodes

These represent a target that, in any configuration, returns a fixed
value. Source files would typically be represented this way. The
constructor function `"VALUE_NODE"` takes a single argument `"$1"`
that has to be a result value.

#### Abstract nodes

These represent internal nodes in the DAG. Their constructor
`"ABSTRACT_NODE"` takes the following arguments (all evaluated).

 - `"node_type"`. An arbitrary string, not interpreted in any way,
   to indicate the role that the node has in the dependency
   structure. When we create an anonymous target from a node, this
   will serve as the key into the rule mapping to be applied.
 - `"string_fields"`. This has to be a map of strings.
 - `"target_fields"`. These have to be a map of lists of graph
   nodes.

Moreover, we require that the keys for maps provided as
`"string_fields"` and `"target_fields"` be disjoint.
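As an illustration, a rule expression might construct such nodes roughly
as follows. This is only a sketch: the node type `"proto_library"`, the
field names, and the variables `"name"` and `"dep results"` are made up
for this example, and the variable `"dep results"` is assumed to hold a
list of result values.

```jsonc
// Sketch: one abstract node whose "deps" target field wraps each
// result value from the hypothetical variable "dep results" into a
// value node.
{ "type": "ABSTRACT_NODE"
, "node_type": "proto_library"
, "string_fields":
  {"type": "singleton_map", "key": "name", "value": {"type": "var", "name": "name"}}
, "target_fields":
  { "type": "singleton_map"
  , "key": "deps"
  , "value":
    { "type": "foreach"
    , "var": "x"
    , "range": {"type": "var", "name": "dep results"}
    , "body": {"type": "VALUE_NODE", "$1": {"type": "var", "name": "x"}}
    }
  }
}
```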
### Graph nodes in `export` targets

Graph nodes are completely free of names and hence are eligible for
exporting. As with other values, in the cache the intensional definition
of artifacts implicit in them will be replaced by the corresponding,
extensionally equal, known value.

However, some care has to be taken in the serialisation that is part of
the caching, as we do not want to unfold the DAG to a tree. Therefore,
we take as JSON serialisation a simple dict with `"type"` set to
`"NODE"`, and `"value"` set to the Merkle-tree hash. That serialisation
respects intensional equality. To allow deserialisation, we add an
additional map to the serialisation from node hash to its definition.
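Schematically, a node reference inside such a serialisation looks as
follows; the hash shown here is a placeholder, not a real Merkle-tree
hash.

```jsonc
// Reference to a node by its Merkle-tree hash (placeholder value); the
// definition itself is found in the accompanying map from node hash to
// node definition.
{"type": "NODE", "value": "2c26b46b68ffc68ff99b453c1d30413413422d70"}
```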
### Depending on anonymous targets

#### Parts of an anonymous target

An anonymous target is given by a pair of a node and a map from the
node-type strings of abstract nodes to rule names. So, in the
implementation, these are just two expression pointers (with their
defined notion of equality, i.e., equality of the respective
Merkle-tree hashes). Such a pair of pointers also forms an
additional variant of a name value, referring to such an anonymous
target.

It should be noted that such an anonymous target contains all the
information needed to evaluate it in the same way as a regular
(named) target defined by a user-defined rule. It is an analysis
error to analyse an anonymous target for which the rules map has no
entry for the string given as `"node_type"` of the corresponding
node.

#### Anonymous targets as additional dependencies

We keep the property that a user can only request named targets. So
anonymous targets have to be requested by other targets. We also
keep the property that other targets are only requested at certain
fixed steps in the evaluation of a target. To still achieve a
meaningful use of anonymous targets, our rule language handles
anonymous targets in the following way.

##### Rules parameter `"anonymous"`

In the rule definition a parameter `"anonymous"` (with the empty map
as default) is allowed. It is used to define an additional
dependency on anonymous targets. The value has to be a map whose
keys are the additional, implicitly defined field names. It is hence
a requirement that the set of keys be disjoint from all other
field names (the values of `"config_fields"`, `"string_fields"`,
and `"target_fields"`, as well as the keys of the `"implicit"`
parameter). Another consequence is that the `"config_transitions"`
map may now also have meaningful entries for the keys of the
`"anonymous"` map. Each value in the map has to be itself a map,
with entries `"target"`, `"provider"`, and `"rule_map"`.

For `"target"`, a single string has to be specified, and the
value has to be a member of the `"target_fields"` list. For
`"provider"`, a single string has to be specified as well. The idea
is that the nodes are collected from that provider of the
targets in the specified target field. For `"rule_map"` a map
has to be specified from strings to rule names; the latter are
evaluated in the context of the rule definition.

###### Example

For generating language bindings for protocol buffers, a
rule might look as follows.

``` jsonc
{ "cc_proto_bindings":
  { "target_fields": ["proto_deps"]
  , "anonymous":
    { "protos":
      { "target": "proto_deps"
      , "provider": "proto"
      , "rule_map": {"proto_library": "cc_proto_library"}
      }
    }
  , "expression": {...}
  }
}
```

##### Evaluation mechanism

The evaluation of a target defined by a user-defined rule is
handled as follows. After the target fields are evaluated as
usual, an additional step is carried out.

For each anonymous-target field, i.e., for each key in the
`"anonymous"` map, a list of anonymous targets is generated from
the corresponding value: take all targets from the specified
`"target"` field in all their specified configuration
transitions (they have already been evaluated) and take the
values provided for the specified `"provider"` key (using the
empty list as default). That value has to be a list of nodes.
All the node lists obtained that way are concatenated. The
configuration transition for the respective field name is
evaluated. Those targets are then evaluated for all the
transitioned configurations requested.

In the final evaluation of the defining expression, the
anonymous-target fields are available in the same way as any
other target field. Also, they contribute to the effective
configuration in the same way as regular target fields.
diff --git a/doc/concepts/anonymous-targets.org b/doc/concepts/anonymous-targets.org
deleted file mode 100644
index 98d194c7..00000000
--- a/doc/concepts/anonymous-targets.org
+++ /dev/null
diff --git a/doc/concepts/built-in-rules.md b/doc/concepts/built-in-rules.md
new file mode 100644
index 00000000..3672df36
--- /dev/null
+++ b/doc/concepts/built-in-rules.md
@@ -0,0 +1,172 @@

Built-in rules
==============

Targets are defined in `TARGETS` files. Each target file is a single
`JSON` object. If the target name is contained as a key in that object,
the corresponding value defines the target; otherwise it is implicitly
considered a source file. The target definition itself is a `JSON`
object as well. The mandatory key `"type"` specifies the rule defining
the target; the meaning of the remaining keys depends on the rule
defining the target.

There are a couple of rules built in, all named by a single string. The
user can define additional rules (and, in fact, we expect the majority
of targets to be defined by user-defined rules); referring to them in a
qualified way (with module) will always refer to those, even if new
built-in rules are added later (as built-in rules will always be named
by a single string only).

The following rules are built in. Built-in rules can have a special
syntax.

`"export"`
----------

The `"export"` rule evaluates a given target in a specified
configuration. More precisely, the field `"target"` has to name a single
target (not a list of targets), the field `"flexible_config"` a list of
strings, treated as variable names, and the field `"fixed_config"` has
to be a map that is taken unevaluated. It is a requirement that the
domain of the `"fixed_config"` and the `"flexible_config"` be disjoint.
The optional fields `"doc"` and `"config_doc"` can be used to describe
the target and the `"flexible_config"`, respectively.

To evaluate an `"export"` target, first the configuration is restricted
to the `"flexible_config"` and then the union with the `"fixed_config"`
is built. The target specified in `"target"` is then evaluated. It is a
requirement that this target be untainted. The result is the result of
this evaluation; artifacts, runfiles, and provides map are forwarded
unchanged.

The main point of the `"export"` rule is that the relevant part of the
configuration can be determined without having to analyze the target
itself. This makes such rules eligible for target-level caching
(provided the content of the repository as well as all reachable ones
can be determined cheaply). This eligibility is also the reason why it
is good practice to only depend on `"export"` targets of other
repositories.
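As an illustration, an `"export"` target might be sketched as follows;
the exported target `"libfoo"`, the variable names, and the exact format
of the `"doc"` field are assumptions made for this example.

```jsonc
{ "libfoo (exported)":
  { "type": "export"
  , "target": "libfoo"                  // hypothetical target to export
  , "flexible_config": ["CXX", "ARCH"]  // variables callers may still set
  , "fixed_config": {"DEBUG": false}    // pinned part of the configuration
  , "doc": ["The public library of this repository."]
  }
}
```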
`"install"`
-----------

The `"install"` rule allows staging artifacts (and runfiles) of other
targets in a different way. More precisely, a new stage (i.e., a map of
artifacts with keys treated as file names) is constructed in the
following way.

The runfiles from all targets in the `"deps"` field are taken; the
`"deps"` field is an evaluated field and has to evaluate to a list of
targets. It is an error if those runfiles conflict.

The `"files"` argument is a special form. It has to be a map, and the
keys are taken as paths. The values are evaluated and have to evaluate
to a single target. That target has to have a single artifact, or no
artifacts and a single runfile. In this way, `"files"` defines a stage;
this stage overlays the runfiles of the `"deps"` and conflicts are
ignored.

Finally, the `"dirs"` argument has to evaluate to a list of pairs (i.e.,
lists of length two) with the first component a target name and the
second component a string, taken as a directory name. For each entry,
both runfiles and artifacts of the specified target are staged to the
specified directory. It is an error if a conflict with the stage
constructed so far occurs.

Both runfiles and artifacts of the `"install"` target are the stage
just described. An `"install"` target always has an empty provides map.
Any provided information of the dependencies is discarded.
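For example, an `"install"` target collecting a binary and its
documentation could be sketched as follows; the targets `"tool"`,
`"docs"`, and `"runtime-data"` are hypothetical.

```jsonc
{ "installed":
  { "type": "install"
  , "files": {"bin/tool": "tool"}    // stage the artifact of "tool" as bin/tool
  , "dirs": [["docs", "share/doc"]]  // stage target "docs" under share/doc
  , "deps": ["runtime-data"]         // runfiles of this target form the base stage
  }
}
```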
`"generic"`
-----------

The `"generic"` rule allows defining artifacts as the output of an
action. This is mainly useful for ad-hoc constructions; for anything
occurring more often, a proper user-defined rule is usually the better
choice.

The `"deps"` argument is evaluated and has to evaluate to a list of
target names. The runfiles and artifacts of these targets form the
inputs of the action. Conflicts are not an error and are resolved by
giving precedence to the artifacts over the runfiles; conflicts within
artifacts or runfiles are resolved in a latest-wins fashion, using the
order of the targets in the evaluated `"deps"` argument.

The fields `"cmds"`, `"out_dirs"`, `"outs"`, and `"env"` are evaluated
fields, where `"cmds"`, `"out_dirs"`, and `"outs"` have to evaluate to a
list of strings, and `"env"` has to evaluate to a map of strings. During
their evaluation, the functions `"out_dirs"`, `"outs"`, and `"runfiles"`
can be used to access the logical paths of the directories, artifacts,
and runfiles, respectively, of a target specified in `"deps"`. Here,
`"env"` specifies the environment in which the action is carried out.
`"out_dirs"` and `"outs"` define the output directories and files,
respectively, that the action has to produce. Since some artifacts are
to be produced, at least one of `"out_dirs"` or `"outs"` must be a
non-empty list of strings. It is an error if one or more paths are
present in both `"out_dirs"` and `"outs"`. Finally, the strings in
`"cmds"` are extended by a newline character and joined; the command of
the action is the interpretation of this string by `sh`.

The artifacts of this target are the outputs (as declared by
`"out_dirs"` and `"outs"`) of this action. Runfiles and provides map are
empty.
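A small sketch demonstrating `"env"`, `"outs"`, and `"out_dirs"`
together; the commands and paths are made up for this example.

```jsonc
{ "generated":
  { "type": "generic"
  , "outs": ["version.h"]   // single output file the action must produce
  , "out_dirs": ["gen"]     // output directory, populated by the action
  , "env": {"LANG": "C"}    // environment for the sh invocation
  , "cmds":
    [ "echo '#define VERSION \"1.0\"' > version.h"
    , "mkdir -p gen && echo data > gen/data.txt"
    ]
  }
}
```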
`"file_gen"`
------------

The `"file_gen"` rule allows specifying a file with a given content. To
be able to accurately report about file names of artifacts or runfiles
of other targets, those targets can be specified in the field `"deps"`,
which has to evaluate to a list of targets. The names of the artifacts
and runfiles of a target specified in `"deps"` can be accessed through
the functions `"outs"` and `"runfiles"`, respectively, during the
evaluation of the arguments `"name"` and `"data"`, which have to
evaluate to a single string.

Artifacts and runfiles of a `"file_gen"` target are a singleton map with
key the result of evaluating `"name"` and value a (non-executable) file
with content the result of evaluating `"data"`. The provides map is
empty.

`"tree"`
--------

The `"tree"` rule allows specifying a tree out of the artifact stages of
given targets. More precisely, the field `"deps"` has to evaluate
to a list of targets. For each target, runfiles and artifacts are
overlaid in an artifacts-win fashion, and the union of the resulting
stages is taken; it is an error if conflicts arise in this way. The
resulting stage is transformed into a tree. Both artifacts and runfiles
of the `"tree"` target are a singleton map with key the result of
evaluating `"name"` (which has to evaluate to a single string) and value
that tree.

`"configure"`
-------------

The `"configure"` rule allows configuring a target with a given
configuration. The field `"target"` is evaluated and the result of the
evaluation must name a single target (not a list). The `"config"` field
is evaluated and must result in a map, which is used as the
configuration for the given target.

This rule uses the given configuration to overlay the current
environment for evaluating the given target, and thereby performs a
configuration transition. It forwards all results
(artifacts/runfiles/provides map) of the configured target to the upper
context. The result of a target that uses this rule is the result of the
target given in the `"target"` field (the configured target).

As a full configuration transition is performed, the same care has to be
taken when using this rule as when writing a configuration transition in
a rule. Typically, this rule is used only at a top-level target of a
project and configures only variables internal to the project. In any
case, when using non-internal targets as dependencies (i.e., targets
that a caller of the `"configure"` target potentially might use as
well), care should be taken that those are only used in the initial
configuration. Such preservation of the configuration is necessary to
avoid conflicts if the targets depended upon are visible in the
`"configure"` target itself, e.g., as a link dependency (which almost
always happens when depending on a library). Even if a non-internal
target depended upon is not visible in the `"configure"` target itself,
requesting it in a modified configuration causes additional overhead by
increasing the target graph and potentially the action graph.
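A sketch of a top-level `"configure"` target, pinning a hypothetical
project-internal variable `BUILD_MODE`; as `"config"` is evaluated, the
quote function `"'"` is one way to obtain a literal map.

```jsonc
{ "release":
  { "type": "configure"
  , "target": "main"  // hypothetical top-level target of the project
  , "config": {"type": "'", "$1": {"BUILD_MODE": "release"}}
  }
}
```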
diff --git a/doc/concepts/built-in-rules.org b/doc/concepts/built-in-rules.org
deleted file mode 100644
index 9463b10c..00000000
--- a/doc/concepts/built-in-rules.org
+++ /dev/null
diff --git a/doc/concepts/cache-pragma.md b/doc/concepts/cache-pragma.md
new file mode 100644
index 00000000..858f2b4f
--- /dev/null
+++ b/doc/concepts/cache-pragma.md
@@ -0,0 +1,134 @@

Action caching pragma
=====================

Introduction: exit code, build failures, and caching
----------------------------------------------------

The exit code of a process is used to signal success or failure of that
process. By convention, 0 indicates success and any other value
indicates some form of failure.

Our tool expects all build actions to follow this convention. A non-zero
exit code of a regular build action has two consequences.

 - As the action failed, the whole build is aborted and considered
   failed.
 - As such a failed action can never be part of a successful build, it
   is (effectively) not cached.

This non-caching is achieved by re-requesting an action without cache
look-up if a failed action is reported from cache.

In particular, for building, we have the property that everything that
does not lead to aborting the build can (and will) be cached. This
property is justified, as we expect build actions to behave in a
functional way.
Test and run actions
--------------------

Tests have a lot of similarity to regular build actions: a process is
run with given inputs, and the results are processed further (e.g., to
create reports on test suites). However, they break the above-described
connection between caching and continuation of the build: we expect that
some tests might be flaky (even though they shouldn't be, of course)
and hence we only want to cache successful tests. Nevertheless, we do
want to continue testing after the first test failure.

Another breakage of the functionality assumption of actions are "run"
actions, i.e., local actions that are executed either because of their
side effect on the host system, or because of their non-deterministic
results (e.g., monitoring some resource). Those actions should never be
cached, but if they fail, the build should be aborted.

Tainting
--------

Targets that, directly or indirectly, depend on non-functional actions
are not regular targets. They are test targets, run targets, benchmark
results, etc.; in any case, they are tainted in some way. When adding
high-level caching of targets, we will only support caching for
untainted targets.

To make everybody aware of their special nature, they are clearly marked
as such: tainted targets not generated by a tainted rule (e.g., a test
rule) have to explicitly state their taintedness in their attributes.
This declaration also gives a natural way to mark targets that are
technically pure, but still should be used only in tests, e.g., a mock
version of a larger library.

Besides being for tests only, there might be other reasons why a target
might not be fit for general use, e.g., configuration files with
accounts for developer access, or files under restrictive licences. To
avoid having to extend the framework for each new use case, we allow
arbitrary strings as markers for the kind of taintedness of a target. Of
course, a target can be tainted in more than one way.

More precisely, rules can have `"tainted"` as an additional property.
Moreover, `"tainted"` is another reserved keyword for target arguments
(like `"type"` and `"arguments_config"`). In both cases, the value has
to be a list of strings, and the empty list is assumed if not
specified.

A rule is tainted with the set of strings in its `"tainted"` property. A
target is tainted with the union of the set of strings of its
`"tainted"` argument and the set of strings its generating rule is
tainted with.

Every target has to be tainted with (at least) the union of what its
dependencies are tainted with.

For tainted targets, the `analyse`, `build`, and `install` commands
report the set of strings the target is tainted with.
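For example, a mock library that is technically pure but intended for
tests only could declare its taintedness as sketched below; the rule
`"library"` and the field names are hypothetical.

```jsonc
{ "libfoo-mock":
  { "type": "library"    // hypothetical user-defined rule
  , "tainted": ["test"]  // usable only by targets tainted with "test"
  , "name": ["foo"]
  , "srcs": ["mock_foo.cpp"]
  }
}
```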
### `"may_fail"` and `"no_cache"` properties of `"ACTION"`

The `"ACTION"` function in the defining expression of a rule has two
additional parameters (besides inputs, etc.), `"may_fail"` and
`"no_cache"`. Those are not evaluated and have to be lists of strings
(with the empty list assumed if the respective parameter is not
present). Only strings the defining rule is tainted with may occur in
those lists. If the list is not empty, the corresponding may-fail or
no-cache bit of the action is set.

For actions with the may-fail bit set, the optional parameter
`"fail_message"` with default value `"action failed"` is evaluated. That
message will be reported if the action returns a non-zero exit value.

Actions with the no-cache bit set are never cached. If an action with
the may-fail bit set exits with a non-zero exit value, the build is
continued if the action nevertheless managed to produce all expected
outputs. We continue to ignore actions with non-zero exit status from
cache.
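Inside the defining expression of a rule tainted with `"test"`, an
action invocation might then be sketched as follows; the input variable,
output names, and command are made up for this example.

```jsonc
// Sketch of an "ACTION" in a test rule tainted with "test"; the test
// may fail without aborting the build and is never cached.
{ "type": "ACTION"
, "inputs": {"type": "var", "name": "test input"}
, "outs": ["stdout", "result"]
, "cmd": ["sh", "./test.sh"]
, "may_fail": ["test"]
, "no_cache": ["test"]
, "fail_message": "test suite failed"
}
```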
- -To make everybody aware of their special nature, they are clearly -marked as such: tainted targets not generated by a tainted rule (e.g., -a test rule) have to explicitly state their taintedness in their -attributes. This declaration also gives a natural way to mark targets -that are technically pure, but still should be used only in test, -e.g., a mock version of a larger library. - -Besides being for tests only, there might be other reasons why a -target might not be fit for general use, e.g., configuration files -with accounts for developer access, or files under restrictive -licences. To avoid having to extend the framework for each new -use case, we allow arbitrary strings as markers for the kind of -taintedness of a target. Of course, a target can be tainted in more -than one way. - -More precisely, rules can have ~"tainted"~ as an additional -property. Moreover ~"tainted"~ is another reserved keyword for -target arguments (like ~"type"~ and ~"arguments_config"~). In both -cases, the value has to be a list of strings, and the empty list -is assumed, if not specified. - -A rule is tainted with the set of strings in its ~"tainted"~ -property. A target is tainted with the union of the set of strings -of its ~"tainted"~ argument and the set of strings its generating -rule is tainted with. - -Every target has to be tainted with (at least) the union of what -its dependencies are tainted with. - -For tainted targets, the ~analyse~, ~build~, and ~install~ commands -report the set of strings the target is tainted with. - -*** ~"may_fail"~ and ~"no_cache"~ properties of ~"ACTION"~ - -The ~"ACTION"~ function in the defining expression of a rule -have two additional (besides inputs, etc) parameters ~"may_fail"~ -and ~"no_cache"~. Those are not evaluated and have to be lists -of strings (with empty assumed if the respective parameter is not -present). Only strings the defining rule is tainted with may occur -in that list. If the list is not empty, the corresponding may-fail -or no-cache bit of the action is set. - -For actions with the ~"may_fail"~ bit set, the optional parameter -~"fail_message"~ with default value ~"action failed"~ is evaluated. -That message will be reported if the action returns a non-zero -exit value. - -Actions with the no-cache bit set are never cached. If an action -with the may-fail bit set exits with non-zero exit value, the build -is continued if the action nevertheless managed to produce all -expected outputs. We continue to ignore actions with non-zero exit -status from cache. - -*** Marking of failed artifacts - -To simplify finding failures in accumulated reports, our tool -keeps track of artifacts generated by failed actions. More -precisely, artifacts are considered failed if one of the following -conditions applies. -- Artifacts generated by failed actions are failed. -- Tree artifacts containing a failed artifact are failed. -- Artifacts generated by an action taking a failed artifact as - input are failed. -The identifiers used for built artifacts (including trees) remain -unchanged; in particular, they will only describe the contents and -not if they were obtained in a failed way. - -When reporting artifacts, e.g., in the log file, an additional marker -is added to indicate that the artifact is a failed one. After every -~build~ or ~install~ command, if the requested artifacts contain -failed one, a different exit code is returned. 
-
-*** The ~install-cas~ subcommand
-
-A typical workflow for testing is to first run the full test suite
-and then only look at the failed tests in more details. As we don't
-take failed actions from cache, installing the output can't be
-done by rerunning the same target as ~install~ instead of ~build~.
-Instead, the output has to be taken from CAS using the identifier
-shown in the build log. To simplify this workflow, there is the
-~install-cas~ subcommand that installs a CAS entry, identified by
-the identifier as shown in the log to a given location or (if no
-location is specified) to ~stdout~.
diff --git a/doc/concepts/configuration.md b/doc/concepts/configuration.md
new file mode 100644
index 00000000..743ed41e
--- /dev/null
+++ b/doc/concepts/configuration.md
@@ -0,0 +1,115 @@
+Configuration
+=============
+
+Targets describe abstract concepts like "library". Depending on
+requirements, a library might manifest itself in different ways. For
+example,
+
+ - it can be built for various target architectures,
+ - it can have the requirement to produce position-independent code,
+ - it can be a special build for debugging, profiling, etc.
+
+So, a target (like a library described by header files, source files,
+dependencies, etc) has some additional input. As those inputs are
+typically of a global nature (e.g., a profiling build usually wants all
+involved libraries to be built for profiling), this additional input,
+called "configuration", follows the same approach as the `UNIX`
+environment: it is a global collection of key-value pairs and every
+target picks what it needs.
+
+Top-level configuration
+-----------------------
+
+The configuration is a `JSON` object. The configuration for the target
+requested can be specified on the command line using the `-c` option;
+its argument is a file name and that file is supposed to contain the
+`JSON` object.
+
+Propagation
+-----------
+
+Rules and target definitions have to declare which parts of the
+configuration they want to have access to. The (essentially) full
+configuration, however, is passed on to the dependencies; in this way, a
+target not using a part of the configuration can still depend on it, if
+one of its dependencies does.
+
+### Rules configuration and configuration transitions
+
+As part of the definition of a rule, it specifies a set `"config_vars"`
+of variables. During the evaluation of the rule, the configuration
+restricted to those variables (variables unset in the original
+configuration are set to `null`) is used as environment.
+
+Additionally, the rule can request that certain targets be evaluated in
+a modified configuration by specifying `"config_transitions"`
+accordingly. Typically, this is done when a tool is required during the
+build; then this tool has to be built for the architecture on which the
+build is carried out and not the target architecture. Those tools often
+are `"implicit"` dependencies, i.e., dependencies that every target
+defined by that rule has, without the need to specify it in the target
+definition.
+
+### Target configuration
+
+Additionally (and independently of the configuration-dependency of the
+rule), the target definition itself can depend on the configuration.
+This can happen if a debug version of a library has additional
+dependencies (e.g., for structured debug logs).
+
+If such a configuration-dependency is needed, the reserved keyword
+`"arguments_config"` is used to specify a set of variables (if unset,
+the empty set is assumed; this should be the usual case). The
+environment in which all arguments of the target definition are
+evaluated is the configuration restricted to those variables (again,
+with values unset in the original configuration set to `null`).
+
+For example, a library where the debug version has an additional
+dependency could look as follows.
+
+``` jsonc
+{ "libfoo":
+  { "type": ["@", "rules", "CC", "library"]
+  , "arguments_config": ["DEBUG"]
+  , "name": ["foo"]
+  , "hdrs": ["foo.hpp"]
+  , "srcs": ["foo.cpp"]
+  , "local defines":
+    { "type": "if"
+    , "cond": {"type": "var", "name": "DEBUG"}
+    , "then": ["DEBUG"]
+    }
+  , "deps":
+    { "type": "++"
+    , "$1":
+      [ ["libbar", "libbaz"]
+      , { "type": "if"
+        , "cond": {"type": "var", "name": "DEBUG"}
+        , "then": ["libdebuglog"]
+        }
+      ]
+    }
+  }
+}
+```
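+
+A matching top-level configuration selecting the debug variant could be
+given in a file (the name is arbitrary, here `debug.json`) passed via
+the `-c` option; this is a minimal sketch, and any keys not picked by a
+target are simply ignored by it.
+
+``` jsonc
+{"DEBUG": true}
+```
+
+In a configuration where `DEBUG` is unset (and hence read as `null`) or
+otherwise logically false, `libfoo` has no local define and no
+dependency on `libdebuglog`.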
+
+Effective configuration
+-----------------------
+
+A target is influenced by the configuration through
+
+ - the configuration dependency of the target definition, as specified in
+   `"arguments_config"`,
+ - the configuration dependency of the underlying rule, as specified in
+   the rule's `"config_vars"` field, and
+ - the configuration dependency of target dependencies, not taking into
+   account values explicitly set by a configuration transition.
+
+Restricting the configuration to this collection of variables yields the
+effective configuration for that target-configuration pair. The
+`--dump-targets` option of the `analyse` subcommand allows inspecting
+the effective configurations of all involved targets. Due to
+configuration transitions, a target can be analyzed in more than one
+configuration, e.g., if a library is used both for a tool needed during
+the build and for the final binary cross-compiled for a different
+target architecture.
diff --git a/doc/concepts/configuration.org b/doc/concepts/configuration.org
deleted file mode 100644
index 4217d22d..00000000
--- a/doc/concepts/configuration.org
+++ /dev/null
@@ -1,107 +0,0 @@
-* Configuration
-
-Targets describe abstract concepts like "library". Depending on
-requirements, a library might manifest itself in different ways.
-For example,
-- it can be built for various target architectures,
-- it can have the requirement to produce position-independent code,
-- it can be a special build for debugging, profiling, etc.
-
-So, a target (like a library described by header files, source files,
-dependencies, etc) has some additional input. As those inputs are
-typically of a global nature (e.g., a profiling build usually wants
-all involved libraries to be built for profiling), this additional
-input, called "configuration" follows the same approach as the
-~UNIX~ environment: it is a global collection of key-value pairs
-and every target picks, what it needs.
-
-** Top-level configuration
-
-The configuration is a ~JSON~ object. The configuration for the
-target requested can be specified on the command line using the
-~-c~ option; its argument is a file name and that file is supposed
-to contain the ~JSON~ object.
-
-** Propagation
-
-Rules and target definitions have to declare which parts of the
-configuration they want to have access to. The (essentially) full
-configuration, however, is passed on to the dependencies; in this way,
-a target not using a part of the configuration can still depend on
-it, if one of its dependencies does.
-
-*** Rules configuration and configuration transitions
-
-As part of the definition of a rule, it specifies a set ~"config_vars"~
-of variables.
During the evaluation of the rule, the configuration -restricted to those variables (variables unset in the original -configuration are set to ~null~) is used as environment. - -Additionally, the rule can request that certain targets be evaluated -in a modified configuration by specifying ~"config_transitions"~ -accordingly. Typically, this is done when a tool is required during -the build; then this tool has to be built for the architecture on -which the build is carried out and not the target architecture. Those -tools often are ~"implicit"~ dependencies, i.e., dependencies that -every target defined by that rule has, without the need to specify -it in the target definition. - -*** Target configuration - -Additionally (and independently of the configuration-dependency -of the rule), the target definition itself can depend on the -configuration. This can happen, if a debug version of a library -has additional dependencies (e.g., for structured debug logs). - -If such a configuration-dependency is needed, the reserved key -word ~"arguments_config"~ is used to specify a set of variables (if -unset, the empty set is assumed; this should be the usual case). -The environment in which all arguments of the target definition are -evaluated is the configuration restricted to those variables (again, -with values unset in the original configuration set to ~null~). - -For example, a library where the debug version has an additional -dependency could look as follows. -#+BEGIN_SRC -{ "libfoo": - { "type": ["@", "rules", "CC", "library"] - , "arguments_config": ["DEBUG"] - , "name": ["foo"] - , "hdrs": ["foo.hpp"] - , "srcs": ["foo.cpp"] - , "local defines": - { "type": "if" - , "cond": {"type": "var", "name": "DEBUG"} - , "then": ["DEBUG"] - } - , "deps": - { "type": "++" - , "$1": - [ ["libbar", "libbaz"] - , { "type": "if" - , "cond": {"type": "var", "name": "DEBUG"} - , "then": ["libdebuglog"] - } - ] - } - } -} -#+END_SRC - -** Effective configuration - -A target is influenced by the configuration through -- the configuration dependency of target definition, as specified - in ~"arguments_config"~, -- the configuration dependency of the underlying rule, as specified - in the rule's ~"config_vars"~ field, and -- the configuration dependency of target dependencies, not taking - into account values explicitly set by a configuration transition. -Restricting the configuration to this collection of variables yields -the effective configuration for that target-configuration pair. -The ~--dump-targets~ option of the ~analyse~ subcommand allows to -inspect the effective configurations of all involved targets. Due to -configuration transitions, a target can be analyzed in more than one -configuration, e.g., if a library is used both, for a tool needed -during the build, as well as for the final binary cross-compiled -for a different target architecture. diff --git a/doc/concepts/doc-strings.md b/doc/concepts/doc-strings.md new file mode 100644 index 00000000..a1a156ac --- /dev/null +++ b/doc/concepts/doc-strings.md @@ -0,0 +1,152 @@ +Documentation of build rules, expressions, etc +============================================== + +Build rules can obtain a non-trivial complexity. This is especially true +if several rules have to exist for slightly different use cases, or if +the rule supports many different fields. Therefore, documentation of the +rules (and also expressions for the benefit of rule authors) is +desirable. 
+
+Experience shows that documentation that is not versioned together with
+the code it refers to quickly gets out of date, or lost. Therefore, we
+add documentation directly into the respective definitions.
+
+Multi-line strings in JSON
+--------------------------
+
+In JSON, the newline character is encoded specially and not taken
+literally; also, there is no implicit joining of string literals. So,
+in order to also have documentation readable in the JSON representation
+itself, instead of single strings, we take arrays of strings, with the
+understanding that they describe the strings obtained by joining the
+entries with newline characters.
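+
+For example, a `doc` entry of the following form (the content is
+invented for illustration) denotes the three-line string
+`"A C library\n\nIt provides foo."`, with an empty entry encoding the
+blank line.
+
+``` jsonc
+{"doc": ["A C library", "", "It provides foo."]}
+```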
+
+Documentation is optional
+-------------------------
+
+While documentation is highly recommended, it still remains optional.
+Therefore, when in the following we state that a key is for a list or a
+map, it is always implied that it may be absent; in this case, the empty
+array or the empty map is taken as default, respectively.
+
+Rules
+-----
+
+Each rule is described as a JSON object with a fixed set of keys. So
+having fixed keys for documentation does not cause conflicts. More
+precisely, the keys `doc`, `field doc`, `config_doc`, `artifacts_doc`,
+`runfiles_doc`, and `provides_doc` are reserved for documentation. Here,
+`doc` has to be a list of strings describing the rule in general.
+`field doc` has to be a map from (some of) the field names to an array
+of strings, containing additional information on that particular field.
+`config_doc` has to be a map from (some of) the config variables to an
+array of strings describing the respective variable. `artifacts_doc` is
+an array of strings describing the artifacts produced by the rule.
+`runfiles_doc` is an array of strings describing the runfiles produced
+by this rule. Finally, `provides_doc` is a map describing (some of) the
+providers of that rule; as opposed to fields or config variables, there
+is no authoritative list of providers given elsewhere in the rule, so it
+is up to the rule author to give accurate documentation on the
+provided data.
+
+### Example
+
+``` jsonc
+{ "library":
+  { "doc":
+    [ "A C library"
+    , ""
+    , "Define a library that can be used to be statically linked to a"
+    , "binary. To do so, the target can simply be specified in the deps"
+    , "field of a binary; it can also be a dependency of another library"
+    , "and the information is then propagated to the corresponding binary."
+    ]
+  , "string_fields": ["name"]
+  , "target_fields": ["srcs", "hdrs", "private-hdrs", "deps"]
+  , "field_doc":
+    { "name":
+      ["The base name of the library (i.e., the name without the leading lib)."]
+    , "srcs": ["The source files (i.e., *.c files) of the library."]
+    , "hdrs":
+      [ "The public header files of this library. Targets depending on"
+      , "this library will have access to those header files"
+      ]
+    , "private-hdrs":
+      [ "Additional internal header files that are used when compiling"
+      , "the source files. Targets depending on this library have no access"
+      , "to those header files."
+      ]
+    , "deps":
+      [ "Any other libraries that this library uses. The dependency is"
+      , "also propagated (via the link-deps provider) to any consumers of"
+      , "this target. So only direct dependencies should be declared."
+      ]
+    }
+  , "config_vars": ["CC"]
+  , "config_doc":
+    { "CC":
+      [ "single string, defaulting to \"cc\", specifying the compiler"
+      , "to be used. The compiler is also used to launch the preprocessor."
+      ]
+    }
+  , "artifacts_doc":
+    ["The actual library (libname.a) staged in the specified directory"]
+  , "runfiles_doc": ["The public headers of this library"]
+  , "provides_doc":
+    { "compile-deps":
+      [ "Map of artifacts specifying any additional files that, besides the runfiles,"
+      , "have to be present in compile actions of targets depending on this library"
+      ]
+    , "link-deps":
+      [ "Map of artifacts specifying any additional files that, besides the artifacts,"
+      , "have to be present in link actions of targets depending on this library"
+      ]
+    , "link-args":
+      [ "List of strings that have to be added to the command line for linking actions"
+      , "in targets depending on this library"
+      ]
+    }
+  , "expression": { ... }
+  }
+}
+```
+
+Expressions
+-----------
+
+Expressions are also described by a JSON object with a fixed set of
+keys. Here we use the keys `doc` and `vars_doc` for documentation, where
+`doc` is an array of strings describing the expression as a whole and
+`vars_doc` is a map from (some of) the `vars` to an array of strings
+describing this variable.
+
+Export targets
+--------------
+
+As export targets play the role of interfaces between repositories, it
+is important that they be documented as well. Again, export targets are
+described as a JSON object with a fixed set of keys and we use the keys
+`doc` and `config_doc` for documentation. Here `doc` is an array of
+strings describing the target in general and `config_doc` is a map
+from (some of) the variables of the `flexible_config` to an array of
+strings describing this parameter.
+
+Presentation of the documentation
+---------------------------------
+
+As all documentation consists of plain values (that need not be
+evaluated) in JSON objects, it is easy to write tools rendering
+documentation pages for rules, etc, and we expect those tools to be
+written independently. Nevertheless, for the benefit of developers using
+rules from git-tree roots that might not be checked out, there is a
+subcommand `describe` which takes a target specification like the
+`analyse` command, looks up the corresponding rule and describes it
+fully, i.e., prints in human-readable form
+
+ - the documentation for the rule
+ - all the fields available for that rule together with
+   - their type (`string_field`, `target_field`, etc), and
+   - their documentation,
+ - all the configuration variables of the rule with their documentation
+   (if given), and
+ - the documented providers.
diff --git a/doc/concepts/doc-strings.org b/doc/concepts/doc-strings.org
deleted file mode 100644
index d9a94dc5..00000000
--- a/doc/concepts/doc-strings.org
+++ /dev/null
@@ -1,145 +0,0 @@
-* Documentation of build rules, expressions, etc
-
-Build rules can obtain a non-trivial complexity. This is especially
-true if several rules have to exist for slightly different use
-cases, or if the rule supports many different fields. Therefore,
-documentation of the rules (and also expressions for the benefit
-of rule authors) is desirable.
-
-Experience shows that documentation that is not versioned together with
-the code it refers to quickly gets out of date, or lost. Therefore,
-we add documentation directly into the respective definitions.
-
-** Multi-line strings in JSON
-
-In JSON, the newline character is encoded specially and not taken
-literally; also, there is not implicit joining of string literals.
-So, in order to also have documentation readable in the JSON -representation itself, instead of single strings, we take arrays -of strings, with the understanding that they describe the strings -obtained by joining the entries with newline characters. - -** Documentation is optional - -While documentation is highly recommended, it still remains optional. -Therefore, when in the following we state that a key is for a list -or a map, it is always implied that it may be absent; in this case, -the empty array or the empty map is taken as default, respectively. - -** Rules - -Each rule is described as a JSON object with a fixed set of keys. -So having fixed keys for documentation does not cause conflicts. -More precisely, the keys ~doc~, ~field doc~, ~config_doc~, -~artifacts_doc~, ~runfiles_doc~, and ~provides_doc~ -are reserved for documentation. Here, ~doc~ has to be a list of -strings describing the rule in general. ~field doc~ has to be a map -from (some of) the field names to an array of strings, containing -additional information on that particular field. ~config_doc~ has -to be a map from (some of) the config variables to an array of -strings describing the respective variable. ~artifacts_doc~ is -an array of strings describing the artifacts produced by the rule. -~runfiles_doc~ is an array of strings describing the runfiles produced -by this rule. Finally, ~provides_doc~ is a map describing (some -of) the providers by that rule; as opposed to fields or config -variables there is no authoritative list of providers given elsewhere -in the rule, so it is up to the rule author to give an accurate -documentation on the provided data. - -*** Example - -#+BEGIN_SRC -{ "library": - { "doc": - [ "A C library" - , "" - , "Define a library that can be used to be statically linked to a" - , "binary. To do so, the target can simply be specified in the deps" - , "field of a binary; it can also be a dependency of another library" - , "and the information is then propagated to the corresponding binary." - ] - , "string_fields": ["name"] - , "target_fields": ["srcs", "hdrs", "private-hdrs", "deps"] - , "field_doc": - { "name": - ["The base name of the library (i.e., the name without the leading lib)."] - , "srcs": ["The source files (i.e., *.c files) of the library."] - , "hdrs": - [ "The public header files of this library. Targets depending on" - , "this library will have access to those header files" - ] - , "private-hdrs": - [ "Additional internal header files that are used when compiling" - , "the source files. Targets depending on this library have no access" - , "to those header files." - ] - , "deps": - [ "Any other libraries that this library uses. The dependency is" - , "also propagated (via the link-deps provider) to any consumers of" - , "this target. So only direct dependencies should be declared." - ] - } - , "config_vars": ["CC"] - , "config_doc": - { "CC": - [ "single string. defaulting to \"cc\", specifying the compiler" - , "to be used. The compiler is also used to launch the preprocessor." 
- ] - } - , "artifacts_doc": - ["The actual library (libname.a) staged in the specified directory"] - , "runfiles_doc": ["The public headers of this library"] - , "provides_doc": - { "compile-deps": - [ "Map of artifacts specifying any additional files that, besides the runfiles," - , "have to be present in compile actions of targets depending on this library" - ] - , "link-deps": - [ "Map of artifacts specifying any additional files that, besides the artifacts," - , "have to be present in a link actions of targets depending on this library" - ] - , "link-args": - [ "List of strings that have to be added to the command line for linking actions" - , "in targets depending on this library" - ] - } - , "expression": { ... } - } -} -#+END_SRC - -** Expressions - -Expressions are also described by a JSON object with a fixed set of -keys. Here we use the keys ~doc~ and ~vars_doc~ for documentation, -where ~doc~ is an array of strings describing the expression as a -whole and ~vars_doc~ is a map from (some of) the ~vars~ to an array -of strings describing this variable. - -** Export targets - -As export targets play the role of interfaces between repositories, -it is important that they be documented as well. Again, export targets -are described as a JSON object with fixed set of keys amd we use -the keys ~doc~ and ~config_doc~ for documentation. Here ~doc~ is an -array of strings describing the targeted in general and ~config_doc~ -is a map from (some of) the variables of the ~flexible_config~ to -an array of strings describing this parameter. - -** Presentation of the documentation - -As all documentation are just values (that need not be evaluated) -in JSON objects, it is easy to write tool rendering documentation -pages for rules, etc, and we expect those tools to be written -independently. Nevertheless, for the benefit of developers using -rules from a git-tree roots that might not be checked out, there is -a subcommand ~describe~ which takes a target specification like the -~analyze~ command, looks up the corresponding rule and describes -it fully, i.e., prints in human-readable form -- the documentation for the rule -- all the fields available for that rule together with - - their type (~string_field~, ~target_field~, etc), and - - their documentation, -- all the configuration variables of the rule with their - documentation (if given), and -- the documented providers. diff --git a/doc/concepts/expressions.md b/doc/concepts/expressions.md new file mode 100644 index 00000000..9e8a8f36 --- /dev/null +++ b/doc/concepts/expressions.md @@ -0,0 +1,368 @@ +Expression language +=================== + +At various places, in particular in order to define a rule, we need a +restricted form of functional computation. This is achieved by our +expression language. + +Syntax +------ + +All expressions are given by JSON values. One can think of expressions +as abstract syntax trees serialized to JSON; nevertheless, the precise +semantics is given by the evaluation mechanism described later. + +Semantic Values +--------------- + +Expressions evaluate to semantic values. Semantic values are JSON values +extended by additional atomic values for build-internal values like +artifacts, names, etc. + +### Truth + +Every value can be treated as a boolean condition. We follow a +convention similar to `LISP` considering everything true that is not +empty. More precisely, the values + + - `null`, + - `false`, + - `0`, + - `""`, + - the empty map, and + - the empty list + +are considered logically false. 
All other values are logically true.
+
+Evaluation
+----------
+
+The evaluation follows a strict, functional, call-by-value evaluation
+mechanism; the precise evaluation is as follows.
+
+ - Atomic values (`null`, booleans, strings, numbers) evaluate to
+   themselves.
+ - For lists, each entry is evaluated in the order they occur in the
+   list; the result of the evaluation is the list of the results.
+ - For JSON objects (which can be understood as maps, or dicts), the key
+   `"type"` has to be present and has to be a literal string. That
+   string determines the syntactical construct (sloppily also referred
+   to as "function") the object represents, and the remaining
+   evaluation depends on the syntactical construct. The syntactical
+   construct has to be either one of the built-in ones or a special
+   function available in the given context (e.g., `"ACTION"` within the
+   expression defining a rule).
+
+All evaluation happens in an "environment", which is a map from strings
+to semantic values.
+
+### Built-in syntactical constructs
+
+#### Special forms
+
+##### Variables: `"var"`
+
+There has to be a key `"name"`; it (i.e., the expression in the
+object at that key) has to be a literal string, taken as the
+variable name. If the variable name is in the domain of the
+environment and the value of the environment at the variable
+name is non-`null`, then the result of the evaluation is the
+value of the variable in the environment.
+
+Otherwise, the key `"default"` is taken (if present, otherwise
+the value `null` is taken as default for `"default"`) and
+evaluated. The value obtained this way is the result of the
+evaluation.
+
+##### Sequential binding: `"let*"`
+
+The key `"bindings"` (default `[]`) has to be (syntactically) a
+list of pairs (i.e., lists of length two) with the first
+component a literal string.
+
+For each pair in `"bindings"` the second component is evaluated,
+in the order the pairs occur. After each evaluation, a new
+environment is taken for the subsequent evaluations; the new
+environment is like the old one but amended at the position
+given by the first component of the pair to now map to the value
+just obtained.
+
+Finally, the `"body"` is evaluated in the final environment
+(after evaluating all binding entries) and the result of
+evaluating the `"body"` is the value for the whole `"let*"`
+expression.
+
+##### Environment Map: `"env"`
+
+Creates a map from selected environment variables.
+
+The key `"vars"` (default `[]`) has to be a list of literal
+strings referring to the variable names that should be included
+in the produced map. This field is not evaluated. This
+expression is only for convenience and does not give new
+expressive power. It is equivalent to, but a lot shorter than, a
+combination of multiple `singleton_map` expressions via
+`map_union`.
+
+##### Conditionals
+
+###### Binary conditional: `"if"`
+
+First the key `"cond"` is evaluated. If it evaluates to a
+value that is logically true, then the key `"then"` is
+evaluated and its value is the result of the evaluation.
+Otherwise, the key `"else"` (if present, otherwise `[]` is
+taken as default) is evaluated and the obtained value is the
+result of the evaluation.
+
+###### Sequential conditional: `"cond"`
+
+The key `"cond"` has to be a list of pairs. In the order of
+the list, the first components of the pairs are evaluated,
+until one evaluates to a value that is logically true. For
+that pair, the second component is evaluated and the result
+of this evaluation is the result of the `"cond"` expression.
+
+If all first components evaluate to a value that is
+logically false, the result of the expression is the result
+of evaluating the key `"default"` (defaulting to `[]`).
+
+###### String case distinction: `"case"`
+
+If the key `"case"` is present, it has to be a map (an
+"object", in JSON's terminology). In this case, the key
+`"expr"` is evaluated; it has to evaluate to a string. If
+the value is a key in the `"case"` map, the expression at
+this key is evaluated and the result of that evaluation is
+the value for the `"case"` expression.
+
+Otherwise (i.e., if `"case"` is absent or `"expr"` evaluates
+to a string that is not a key in `"case"`), the key
+`"default"` (with default `[]`) is evaluated and this gives
+the result of the `"case"` expression.
+
+###### Sequential case distinction on arbitrary values: `"case*"`
+
+If the key `"case"` is present, it has to be a list of
+pairs. In this case, the key `"expr"` is evaluated. It is an
+error if that evaluates to a name-containing value. The
+result of that evaluation is sequentially compared to the
+evaluation of the first components of the `"case"` list
+until an equal value is found. In this case, the evaluation
+of the second component of the pair is the value of the
+`"case*"` expression.
+
+If the `"case"` key is absent, or no equality is found, the
+result of the `"case*"` expression is the result of
+evaluating the `"default"` key (with default `[]`).
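+
+As an illustration, here is a small, self-contained sketch (the
+variable name `MODE` and the flag values are invented for this example)
+combining `"var"`, `"let*"`, and `"case"`: it evaluates to `["-O2"]` in
+any configuration where `MODE` is unset or set to `"release"`, and to
+`["-O0", "-g"]` where `MODE` is `"debug"`.
+
+``` jsonc
+{ "type": "let*"
+, "bindings": [["mode", {"type": "var", "name": "MODE", "default": "release"}]]
+, "body":
+  { "type": "case"
+  , "expr": {"type": "var", "name": "mode"}
+  , "case": {"release": ["-O2"], "debug": ["-O0", "-g"]}
+  , "default": []
+  }
+}
+```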
+
+##### Conjunction and disjunction: `"and"` and `"or"`
+
+For conjunction, if the key `"$1"` (with default `[]`) is
+syntactically a list, its entries are sequentially evaluated
+until a logically false value is found; in that case, the result
+is `false`, otherwise `true`. If the key `"$1"` has a different
+shape, it is evaluated and has to evaluate to a list. The result
+is the conjunction of the logical values of the entries. In
+particular, `{"type": "and"}` evaluates to `true`.
+
+For disjunction, the evaluation mechanism is the same, but the
+truth values and connective are taken dually. So, `"and"` and
+`"or"` are logical conjunction and disjunction, respectively,
+using short-cut evaluation if syntactically admissible (i.e., if
+the argument is syntactically a list).
+
+##### Mapping
+
+###### Mapping over lists: `"foreach"`
+
+First the key `"range"` is evaluated and has to evaluate to
+a list. For each entry of this list, the expression `"body"`
+is evaluated in an environment that is obtained from the
+original one by setting the value for the variable specified
+at the key `"var"` (which has to be a literal string,
+default `"_"`) to that value. The result is the list of
+those evaluation results.
+
+###### Mapping over maps: `"foreach_map"`
+
+Here, `"range"` has to evaluate to a map. For each entry (in
+lexicographic order by keys, according to native byte
+order), the expression `"body"` is evaluated in an
+environment obtained from the original one by setting the
+variables specified at `"var_key"` and `"var_val"` (literal
+strings, default values `"_"` and `"$_"`, respectively). The
+result of the evaluation is the list of those values.
+
+##### Folding: `"foldl"`
+
+The key `"range"` is evaluated and has to evaluate to a list.
+Starting from the result of evaluating `"start"` (default `[]`) +a new value is obtained for each entry of the range list by +evaluating `"body"` in an environment obtained from the original +by binding the variable specified by `"var"` (literal string, +default `"_"`) to the list entry and the variable specified by +`"accum_var"` (literal string, default value `"$1"`) to the old +value. The result is the last value obtained. + +#### Regular functions + +First `"$1"` is evaluated; for binary functions `"$2"` is evaluated +next. For functions that accept keyword arguments, those are +evaluated as well. Finally the function is applied to this (or +those) argument(s) to obtain the final result. + +##### Unary functions + + - `"nub_right"` The argument has to be a list. It is an error + if that list contains (directly or indirectly) a name. The + result is the input list, except that for all duplicate + values, all but the rightmost occurrence is removed. + + - `"basename"` The argument has to be a string. This string is + interpreted as a path, and the file name thereof is + returned. + + - `"keys"` The argument has to be a map. The result is the + list of keys of this map, in lexicographical order + (according to native byte order). + + - `"values"` The argument has to be a map. The result are the + values of that map, ordered by the corresponding keys + (lexicographically according to native byte order). + + - `"range"` The argument is interpreted as a non-negative + integer as follows. Non-negative numbers are rounded to the + nearest integer; strings have to be the decimal + representation of an integer; everything else is considered + zero. The result is a list of the given length, consisting + of the decimal representations of the first non-negative + integers. For example, `{"type": "range", + "$1": "3"}` evaluates to `["0", "1", "2"]`. + + - `"enumerate"` The argument has to be a list. The result is a + map containing one entry for each element of the list. The + key is the decimal representation of the position in the + list (starting from `0`), padded with leading zeros to + length at least 10. The value is the element. The padding is + chosen in such a way that iterating over the resulting map + (which happens in lexicographic order of the keys) has the + same iteration order as the list for all lists indexable by + 32-bit integers. + + - `"++"` The argument has to be a list of lists. The result is + the concatenation of those lists. + + - `"map_union"` The argument has to be a list of maps. The + result is a map containing as keys the union of the keys of + the maps in that list. For each key, the value is the value + of that key in the last map in the list that contains that + key. + + - `"join_cmd"` The argument has to be a list of strings. A + single string is returned that quotes the original vector in + a way understandable by a POSIX shell. As the command for an + action is directly given by an argument vector, `"join_cmd"` + is typically only used for generated scripts. + + - `"json_encode"` The result is a single string that is the + canonical JSON encoding of the argument (with minimal white + space); all atomic values that are not part of JSON (i.e., + the added atomic values to represent build-internal values) + are serialized as `null`. + +##### Unary functions with keyword arguments + + - `"change_ending"` The argument has to be a string, + interpreted as path. The ending is replaced by the value of + the keyword argument `"ending"` (a string, default `""`). 
+   For example,
+   `{"type": "change_ending", "$1": "foo/bar.c", "ending": ".o"}`
+   evaluates to `"foo/bar.o"`.
+
+ - `"join"` The argument has to be a list of strings. The
+   return value is the concatenation of those strings,
+   separated by the specified `"separator"` (a string,
+   default `""`).
+
+ - `"escape_chars"` Prefix, in the argument, every character
+   occurring in `"chars"` (a string, default `""`) by
+   `"escape_prefix"` (a string, default `"\\"`).
+
+ - `"to_subdir"` The argument has to be a map (not necessarily
+   of artifacts). The keys as well as the `"subdir"` (string,
+   default `"."`) argument are interpreted as paths and keys
+   are replaced by the path concatenation of those two paths.
+   If the optional argument `"flat"` (default `false`)
+   evaluates to a true value, the keys are instead replaced by
+   the path concatenation of the `"subdir"` argument and the
+   base name of the old key. It is an error if conflicts occur
+   in this way; in case of such a user error, the argument
+   `"msg"` is also evaluated and the result of that evaluation
+   reported in the error message. Note that conflicts can also
+   occur in non-flat staging if two keys are different as
+   strings, but name the same path (like `"foo.txt"` and
+   `"./foo.txt"`), and are assigned different values. It also
+   is an error if the values for keys in conflicting positions
+   are name-containing.
+
+##### Binary functions
+
+ - `"=="` The result is `true` if the arguments are equal,
+   `false` otherwise. It is an error if one of the arguments
+   is a name-containing value.
+
+ - `"concat_target_name"` This function is only present to
+   simplify transitions from some other build systems and
+   normally not used outside code generated by transition
+   tools. The second argument has to be a string or a list of
+   strings (in the latter case, it is treated as strings by
+   concatenating the entries). If the first argument is a
+   string, the result is the concatenation of those two
+   strings. If the first argument is a list of strings, the
+   result is that list with the second argument concatenated to
+   the last entry of that list (if any).
+
+##### Other functions
+
+ - `"empty_map"` This function takes no arguments and always
+   returns an empty map.
+
+ - `"singleton_map"` This function takes two keyword arguments,
+   `"key"` and `"value"`, and returns a map with one entry,
+   mapping the given key to the given value.
+
+ - `"lookup"` This function takes two keyword arguments,
+   `"key"` and `"map"`. The `"key"` argument has to evaluate to
+   a string and the `"map"` argument has to evaluate to a map.
+   If that map contains the given key and the corresponding
+   value is non-`null`, the value is returned. Otherwise the
+   `"default"` argument (with default `null`) is evaluated and
+   returned.
+
+#### Constructs related to reporting of user errors
+
+Normally, if an error occurs during the evaluation, the error is
+reported together with a stack trace. This, however, might not be
+the most informative way to present a problem to the user,
+especially if the underlying problem is a proper user error, e.g.,
+in rule usage (leaving out mandatory arguments, violating semantic
+prerequisites, etc). To allow proper error reporting, the following
+functions are available. All of them have an optional argument
+`"msg"` that is evaluated (only) in case of error and the result of
+that evaluation included in the error message presented to the user.
+
+ - `"fail"` Evaluation of this function unconditionally fails.
+
+ - `"context"` This function is only there to provide additional
+   information in case of error. Otherwise it is the identity
+   function (a unary function, i.e., the result of the evaluation
+   is the result of evaluating the argument `"$1"`).
+
+ - `"assert_non_empty"` Evaluate the argument (given by the
+   parameter `"$1"`). If it evaluates to a non-empty string, map,
+   or list, return the result of the evaluation. Otherwise fail.
+
+ - `"disjoint_map_union"` Like `"map_union"` but it is an error if
+   two (or more) maps contain the same key, but map it to different
+   values. It is also an error if the argument is a name-containing
+   value.
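+
+Putting some of these functions together, the following sketch (the
+file names are invented) maps a list of source names to a map from
+object name to source name, failing with a readable message if two
+sources would produce the same object file.
+
+``` jsonc
+{ "type": "disjoint_map_union"
+, "msg": "sources would overwrite each other's object files"
+, "$1":
+  { "type": "foreach"
+  , "var": "src"
+  , "range": ["foo.c", "sub/bar.c"]
+  // one singleton map per source, keyed by the object-file name
+  , "body":
+    { "type": "singleton_map"
+    , "key":
+      {"type": "change_ending", "$1": {"type": "var", "name": "src"}, "ending": ".o"}
+    , "value": {"type": "var", "name": "src"}
+    }
+  }
+}
+```
+
+This evaluates to `{"foo.o": "foo.c", "sub/bar.o": "sub/bar.c"}`.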
diff --git a/doc/concepts/expressions.org b/doc/concepts/expressions.org
deleted file mode 100644
index ac66e878..00000000
--- a/doc/concepts/expressions.org
+++ /dev/null
@@ -1,344 +0,0 @@
-* Expression language
-
-At various places, in particular in order to define a rule, we need
-a restricted form of functional computation. This is achieved by
-our expression language.
-
-** Syntax
-
-All expressions are given by JSON values. One can think of expressions
-as abstract syntax trees serialized to JSON; nevertheless, the precise
-semantics is given by the evaluation mechanism described later.
-
-** Semantic Values
-
-Expressions evaluate to semantic values. Semantic values are JSON
-values extended by additional atomic values for build-internal
-values like artifacts, names, etc.
-
-*** Truth
-
-Every value can be treated as a boolean condition. We follow a
-convention similar to ~LISP~ considering everything true that is
-not empty. More precisely, the values
-- ~null~,
-- ~false~,
-- ~0~,
-- ~""~,
-- the empty map, and
-- the empty list
-are considered logically false. All other values are logically true.
-
-** Evaluation
-
-The evaluation follows a strict, functional, call-by-value evaluation
-mechanism; the precise evaluation is as follows.
-
-- Atomic values (~null~, booleans, strings, numbers) evaluate to
-  themselves.
-- For lists, each entry is evaluated in the order they occur in the
-  list; the result of the evaluation is the list of the results.
-- For JSON objects (wich can be understood as maps, or dicts), the
-  key ~"type"~ has to be present and has to be a literal string.
-  That string determines the syntactical construct (sloppily also
-  referred to as "function") the object represents, and the remaining
-  evaluation depends on the syntactical construct. The syntactical
-  construct has to be either one of the built-in ones or a special
-  function available in the given context (e.g., ~"ACTION"~ within
-  the expression defining a rule).
-
-All evaluation happens in an "environment" which is a map from
-strings to semantic values.
-
-*** Built-in syntactical constructs
-
-**** Special forms
-
-***** Variables: ~"var"~
-
-There has to be a key ~"name"~ that (i.e., the expression in the
-object at that key) has to be a literal string, taken as variable
-name. If the variable name is in the domain of the environment and
-the value of the environment at the variable name is non-~null~,
-then the result of the evaluation is the value of the variable in
-the environment.
-
-Otherwise, the key ~"default"~ is taken (if present, otherwise the
-value ~null~ is taken as default for ~"default"~) and evaluated.
-The value obtained this way is the result of the evaluation.
- -***** Sequential binding: ~"let*"~ - -The key ~"bindings"~ (default ~[]~) has to be (syntactically) a -list of pairs (i.e., lists of length two) with the first component -a literal string. - -For each pair in ~"bindings"~ the second component is evaluated, in -the order the pairs occur. After each evaluation, a new environment -is taken for the subsequent evaluations; the new environment is -like the old one but amended at the position given by the first -component of the pair to now map to the value just obtained. - -Finally, the ~"body"~ is evaluated in the final environment (after -evaluating all binding entries) and the result of evaluating the -~"body"~ is the value for the whole ~"let*"~ expression. - -***** Environment Map: ~"env"~ - -Creates a map from selected environment variables. - -The key ~"vars"~ (default ~[]~) has to be a list of literal strings referring to -the variable names that should be included in the produced map. This field is -not evaluated. This expression is only for convenience and does not give new -expression power. It is equivalent but lot shorter to multiple ~singleton_map~ -expressions combined with ~map_union~. - -***** Conditionals - -****** Binary conditional: ~"if"~ - -First the key ~"cond"~ is evaluated. If it evaluates to a value that -is logically true, then the key ~"then"~ is evaluated and its value -is the result of the evaluation. Otherwise, the key ~"else"~ (if -present, otherwise ~[]~ is taken as default) is evaluated and the -obtained value is the result of the evaluation. - -****** Sequential conditional: ~"cond"~ - -The key ~"cond"~ has to be a list of pairs. In the order of the -list, the first components of the pairs are evaluated, until one -evaluates to a value that is logically true. For that pair, the -second component is evaluated and the result of this evaluation is -the result of the ~"cond"~ expression. - -If all first components evaluate to a value that is logically false, -the result of the expression is the result of evaluating the key -~"default"~ (defaulting to ~[]~). - -****** String case distinction: ~"case"~ - -If the key ~"case"~ is present, it has to be a map (an "object", in -JSON's terminology). In this case, the key ~"expr"~ is evaluated; it -has to evaluate to a string. If the value is a key in the ~"case"~ -map, the expression at this key is evaluated and the result of that -evaluation is the value for the ~"case"~ expression. - -Otherwise (i.e., if ~"case"~ is absent or ~"expr"~ evaluates to a -string that is not a key in ~"case"~), the key ~"default"~ (with -default ~[]~) is evaluated and this gives the result of the ~"case"~ -expression. - -****** Sequential case distinction on arbitrary values: ~"case*"~ - -If the key ~"case"~ is present, it has to be a list of pairs. In this -case, the key ~"expr"~ is evaluated. It is an error if that evaluates -to a name-containing value. The result of that evaluation -is sequentially compared to the evaluation of the first components -of the ~"case"~ list until an equal value is found. In this case, -the evaluation of the second component of the pair is the value of -the ~"case*"~ expression. - -If the ~"case"~ key is absent, or no equality is found, the result of -the ~"case*"~ expression is the result of evaluating the ~"default"~ -key (with default ~[]~). 
- -***** Conjunction and disjunction: ~"and"~ and ~"or"~ - -For conjunction, if the key ~"$1"~ (with default ~[]~) is syntactically -a list, its entries are sequentially evaluated until a logically -false value is found; in that case, the result is ~false~, otherwise -true. If the key ~"$1"~ has a different shape, it is evaluated and -has to evaluate to a list. The result is the conjunction of the -logical values of the entries. In particular, ~{"type": "and"}~ -evaluates to ~true~. - -For disjunction, the evaluation mechanism is the same, but the truth -values and connective are taken dually. So, ~"and"~ and ~"or"~ are -logical conjunction and disjunction, respectively, using short-cut -evaluation if syntactically admissible (i.e., if the argument is -syntactically a list). - -***** Mapping - -****** Mapping over lists: ~"foreach"~ - -First the key ~"range"~ is evaluated and has to evaluate to a list. -For each entry of this list, the expression ~"body"~ is evaluated -in an environment that is obtained from the original one by setting -the value for the variable specified at the key ~"var"~ (which has -to be a literal string, default ~"_"~) to that value. The result -is the list of those evaluation results. - -****** Mapping over maps: ~"foreach_map"~ - -Here, ~"range"~ has to evaluate to a map. For each entry (in -lexicographic order (according to native byte order) by keys), the -expression ~"body"~ is evaluated in an environment obtained from -the original one by setting the variables specified at ~"var_key"~ -and ~"var_val"~ (literal strings, default values ~"_"~ and -~"$_"~, respectively). The result of the evaluation is the list of -those values. - -***** Folding: ~"foldl"~ - -The key ~"range"~ is evaluated and has to evaluate to a list. -Starting from the result of evaluating ~"start"~ (default ~[]~) a -new value is obtained for each entry of the range list by evaluating -~"body"~ in an environment obtained from the original by binding -the variable specified by ~"var"~ (literal string, default ~"_"~) to -the list entry and the variable specified by ~"accum_var"~ (literal -string, default value ~"$1"~) to the old value. The result is the -last value obtained. - -**** Regular functions - -First ~"$1"~ is evaluated; for binary functions ~"$2"~ is evaluated -next. For functions that accept keyword arguments, those are -evaluated as well. Finally the function is applied to this (or -those) argument(s) to obtain the final result. - -***** Unary functions - -- ~"nub_right"~ The argument has to be a list. It is an error if that list - contains (directly or indirectly) a name. The result is the - input list, except that for all duplicate values, all but the - rightmost occurrence is removed. - -- ~"basename"~ The argument has to be a string. This string is - interpreted as a path, and the file name thereof is returned. - -- ~"keys"~ The argument has to be a map. The result is the list of - keys of this map, in lexicographical order (according to native - byte order). - -- ~"values"~ The argument has to be a map. The result are the values - of that map, ordered by the corresponding keys (lexicographically - according to native byte order). - -- ~"range"~ The argument is interpreted as a non-negative integer as - follows. Non-negative numbers are rounded to the nearest integer; - strings have to be the decimal representation of an integer; - everything else is considered zero. 
The result is a list of the - given length, consisting of the decimal representations of the - first non-negative integers. For example, ~{"type": "range", - "$1": "3"}~ evaluates to ~["0", "1", "2"]~. - -- ~"enumerate"~ The argument has to be a list. The result is a map - containing one entry for each element of the list. The key is - the decimal representation of the position in the list (starting - from ~0~), padded with leading zeros to length at least 10. The - value is the element. The padding is chosen in such a way that - iterating over the resulting map (which happens in lexicographic - order of the keys) has the same iteration order as the list for - all lists indexable by 32-bit integers. - -- ~"++"~ The argument has to be a list of lists. The result is the - concatenation of those lists. - -- ~"map_union"~ The argument has to be a list of maps. The result - is a map containing as keys the union of the keys of the maps in - that list. For each key, the value is the value of that key in - the last map in the list that contains that key. - -- ~"join_cmd"~ The argument has to be a list of strings. A single - string is returned that quotes the original vector in a way - understandable by a POSIX shell. As the command for an action is - directly given by an argument vector, ~"join_cmd"~ is typically - only used for generated scripts. - -- ~"json_encode"~ The result is a single string that is the canonical - JSON encoding of the argument (with minimal white space); all atomic - values that are not part of JSON (i.e., the added atomic values - to represent build-internal values) are serialized as ~null~. - -***** Unary functions with keyword arguments - -- ~"change_ending"~ The argument has to be a string, interpreted as - path. The ending is replaced by the value of the keyword argument - ~"ending"~ (a string, default ~""~). For example, ~{"type": - "change_ending", "$1": "foo/bar.c", "ending": ".o"}~ evaluates - to ~"foo/bar.o"~. - -- ~"join"~ The argument has to be a list of strings. The return - value is the concatenation of those strings, separated by the - the specified ~"separator"~ (strings, default ~""~). - -- ~"escape_chars"~ Prefix every in the argument every character - occuring in ~"chars"~ (a string, default ~""~) by ~"escape_prefix"~ (a - strings, default ~"\\"~). - -- ~"to_subdir"~ The argument has to be a map (not necessarily of - artifacts). The keys as well as the ~"subdir"~ (string, default - ~"."~) argument are interpreted as paths and keys are replaced - by the path concatenation of those two paths. If the optional - argument ~"flat"~ (default ~false~) evaluates to a true value, - the keys are instead replaced by the path concatenation of the - ~"subdir"~ argument and the base name of the old key. It is an - error if conflicts occur in this way; in case of such a user - error, the argument ~"msg"~ is also evaluated and the result - of that evaluation reported in the error message. Note that - conflicts can also occur in non-flat staging if two keys are - different as strings, but name the same path (like ~"foo.txt"~ - and ~"./foo.txt"~), and are assigned different values. - It also is an error if the values for keys in conflicting positions - are name-containing. - -***** Binary functions - -- ~"=="~ The result is ~true~ is the arguments are equal, ~false~ - otherwise. It is an error if one of the arguments are name-containing - values. 
- -- ~"concat_target_name"~ This function is only present to simplify - transitions from some other build systems and normally not used - outside code generated by transition tools. The second argument - has to be a string or a list of strings (in the latter case, - it is treated as strings by concatenating the entries). If the - first argument is a string, the result is the concatenation of - those two strings. If the first argument is a list of strings, - the result is that list with the second argument concatenated to - the last entry of that list (if any). - -***** Other functions - -- ~"empty_map"~ This function takes no arguments and always returns - an empty map. - -- ~"singleton_map"~ This function takes two keyword arguments, - ~"key"~ and ~"value"~ and returns a map with one entry, mapping - the given key to the given value. - -- ~"lookup"~ This function takes two keyword arguments, ~"key"~ - and ~"map"~. The ~"key"~ argument has to evaluate to a string - and the ~"map"~ argument has to evaluate to a map. If that map - contains the given key and the corresponding value is non-~null~, - the value is returned. Otherwise the ~"default"~ argument (with - default ~null~) is evaluated and returned. - -**** Constructs related to reporting of user errors - -Normally, if an error occurs during the evaluation the error is -reported together with a stack trace. This, however, might not -be the most informative way to present a problem to the user, -especially if the underlying problem is a proper user error, e.g., -in rule usage (leaving out mandatory arguments, violating semantical -prerequisites, etc). To allow proper error reporting, the following -functions are available. All of them have an optional argument -~"msg"~ that is evaluated (only) in case of error and the result of -that evaluation included in the error message presented to the user. - -- ~"fail"~ Evaluation of this function unconditionally fails. - -- ~"context"~ This function is only there to provide additional - information in case of error. Otherwise it is the identify - function (a unary function, i.e., the result of the evaluation - is the result of evaluating the argument ~"$1"~). - -- ~"assert_non_empty"~ Evaluate the argument (given by the parameter - ~"$1"~). If it evaluates to a non-empty string, map, or list, - return the result of the evaluation. Otherwise fail. - -- ~"disjoint_map_union"~ Like ~"map_union"~ but it is an error, - if two (or more) maps contain the same key, but map it to - different values. It is also an error if the argument is a - name-containing value. diff --git a/doc/concepts/garbage.md b/doc/concepts/garbage.md new file mode 100644 index 00000000..69594b1c --- /dev/null +++ b/doc/concepts/garbage.md @@ -0,0 +1,86 @@ +Garbage Collection +================== + +For every build, for all non-failed actions an entry is created in the +action cache and the corresponding artifacts are stored in the CAS. So, +over time, a lot of files accumulate in the local build root. Hence we +have a way to reclaim disk space while keeping the benefits of having a +cache. This operation is referred to as garbage collection and usually +uses the heuristics to keeping what is most recently used. Our approach +follows this paradigm as well. + +Invariants assumed by our build system +-------------------------------------- + +Our tool assumes several invariants on the local build root, that we +need to maintain during garbage collection. Those are the following. 
diff --git a/doc/concepts/garbage.md b/doc/concepts/garbage.md
new file mode 100644
index 00000000..69594b1c
--- /dev/null
+++ b/doc/concepts/garbage.md
@@ -0,0 +1,86 @@

Garbage Collection
==================

For every build, for all non-failed actions an entry is created in the
action cache and the corresponding artifacts are stored in the CAS. So,
over time, a lot of files accumulate in the local build root. Hence we
have a way to reclaim disk space while keeping the benefits of having a
cache. This operation is referred to as garbage collection; it usually
uses the heuristic of keeping what was most recently used. Our approach
follows this paradigm as well.

Invariants assumed by our build system
--------------------------------------

Our tool assumes several invariants on the local build root that we
need to maintain during garbage collection. Those are the following.

 - If an artifact is referenced in any cache entry (action cache,
   target-level cache), then the corresponding artifact is in CAS.
 - If a tree is in CAS, then so are its immediate parts (and hence also
   all transitive parts).

Generations of cache and CAS
----------------------------

In order to allow garbage collection while keeping the desired
invariants, we keep several (currently two) generations of cache and
CAS. Each generation in itself has to fulfill the invariants. The
effective cache or CAS is the union of the caches or CASes of all
generations, respectively. Obviously, the effective cache and CAS then
fulfill the invariants as well.

The actual `gc` command rotates the generations: the oldest generation
is removed and the remaining generations are moved one number up (i.e.,
currently the young generation will simply become the old generation),
implicitly creating a new, empty, youngest generation. As an empty
generation fulfills the required invariants, this operation preserves
the requirement that each generation individually fulfills the
invariants.

All additions are made to the youngest generation; in order to keep the
invariants, relevant entries only present in an older generation are
also added to the youngest generation first. Moreover, whenever an
entry is referenced in any way (cache hit, request for an entry to be
in CAS) and is only present in an older generation, it is also added to
the youngest generation, again adding referenced parts first. As a
consequence, the youngest generation contains everything directly or
indirectly referenced since the last garbage collection; in particular,
everything referenced since the last garbage collection will remain in
the effective cache or CAS upon the next garbage collection.

These generations are stored as separate directories inside the local
build root. As the local build root is, starting from an empty
directory, entirely managed by `just` and compatible tools, generations
are on the same file system. Therefore, the adding of old entries to
the youngest generation can be implemented efficiently by using hard
links.

The moving up of generations can happen atomically by renaming the
respective directory. Also, the oldest generation can be removed
logically by renaming a directory to a name that is not searched for
when looking for existing generations. The actual recursive removal
from the file system can then happen in a separate step without any
requirements on order.

Parallel operations in the presence of garbage collection
----------------------------------------------------------

The addition to cache and CAS can continue to happen in parallel; that
certain values are taken from an older generation instead of being
freshly computed does not make a difference for the youngest generation
(which is the only generation modified). But build processes assume
they don't violate the invariants if they first add files to CAS and
only later a tree or cache entry referencing them. This, however, only
holds true if no generation rotation happens in between. To avoid those
kinds of races, we make processes coordinate over a single lock for
each build root.

 - Any build process keeps a shared lock for the entirety of the build.
 - The garbage collection process takes an exclusive lock for the
   period it does the directory renames.

We consider it acceptable that, in theory, local build processes could
starve local garbage collection.
Moreover, it should be noted that the actual removal of
no-longer-needed files from the file system happens without any lock
being held. Hence the disturbance of builds caused by garbage
collection is small.

diff --git a/doc/concepts/garbage.org b/doc/concepts/garbage.org
deleted file mode 100644
index 26f6cc51..00000000
--- a/doc/concepts/garbage.org
+++ /dev/null
diff --git a/doc/concepts/multi-repo.md b/doc/concepts/multi-repo.md
new file mode 100644
index 00000000..c465360e
--- /dev/null
+++ b/doc/concepts/multi-repo.md
@@ -0,0 +1,170 @@

Multi-repository build
======================

Repository configuration
------------------------

### Open repository names

A repository can have external dependencies. This is realized by having
unbound ("open") repository names being used as references. The actual
definition of those external repositories is not part of the
repository; we think of them as inputs, i.e., we think of this
repository as a function of the referenced external targets.

### Binding in a separate repository configuration

The actual binding of the free repository names is specified in a
separate repository-configuration file, which is specified on the
command line (via the `-C` option); this command-line argument is
optional and the default is that the repository worked on has no
external dependencies. Typically (but not necessarily), this
repository-configuration file is located outside the referenced
repositories and versioned separately or generated from such a file via
`bin/just-mr.py`. It serves as meta-data for a group of repositories
belonging together.

This file contains one JSON object. For the key `"repositories"` the
value is an object; its keys are the global names of the specified
repositories. For each repository, there is an object describing it.
The key `"workspace_root"` describes where to find the repository and
should be present for all (direct or indirect) external dependencies of
the repository worked upon. Additional roots and file names (for
targets, rules, and expressions) can be specified. For keys not given,
the same rules for default values apply as for the corresponding
command-line arguments. Additionally, for each repository, the key
`"bindings"` specifies the map of the open repository names to the
global names that provide these dependencies. Repositories may depend
on each other (or even themselves), but the resulting global target
graph has to be cycle free.
Whenever a location has to be specified, the value has to be a list,
with the first entry specifying the naming scheme; the semantics of the
remaining entries depends on the scheme (see "Root naming scheme"
below).

Additionally, the key `"main"` (with default `""`) specifies the main
repository. The target to be built (as specified on the command line)
is taken from this repository. Also, the command-line arguments `-w`,
`--target_root`, etc., apply to this repository. If no option `-w` is
given and `"workspace_root"` is not specified in the
repository-configuration file either, the root is determined from the
working directory as usual.

The value of `"main"` can be overwritten on the command line (with the
`--main` option). In this way, a consistent configuration of
interdependent repositories can be versioned and referred to regardless
of the repository worked on.

#### Root naming scheme

##### `"file"`

The `"file"` scheme tells that the repository (or respective root) can
be found in a directory in the local file system; the only argument is
the absolute path to that directory.

##### `"git tree"`

The `"git tree"` scheme tells that the root is defined to be a tree
given by a git tree identifier. It takes two arguments

 - the tree identifier, as hex-encoded string, and
 - the absolute path to some repository containing that tree.

#### Example

Consider, for example, the following repository-configuration file. In
the following, we assume it is located at `/etc/just/repos.json`.

``` jsonc
{ "main": "env"
, "repositories":
  { "foobar":
    { "workspace_root": ["file", "/opt/foobar/repo"]
    , "rule_root": ["file", "/etc/just/rules"]
    , "bindings": {"base": "barimpl"}
    }
  , "barimpl":
    { "workspace_root": ["file", "/opt/barimpl"]
    , "target_file_name": "TARGETS.bar"
    }
  , "env": {"bindings": {"foo": "foobar", "bar": "barimpl"}}
  }
}
```

It specifies three repositories, with global names `foobar`, `barimpl`,
and `env`. Within `foobar`, the repository name `base` refers to
`barimpl`, the repository that can be found at `/opt/barimpl`.

The repository `env` is the main repository and there is no workspace
root defined for it, so it only provides bindings for the external
repositories `foo` and `bar`, but the actual repository is taken from
the working directory (unless `-w` is specified). In this way, it
provides an environment for developing applications based on `foo` and
`bar`.

For example, the invocation `just build -C /etc/just/repos.json baz`
tells our tool to build the target `baz` from the module the working
directory is located in. `foo` will refer to the repository found at
`/opt/foobar/repo` (using rules from `/etc/just/rules` and taking
`base` to refer to the repository at `/opt/barimpl`) and `bar` will
refer to the repository at `/opt/barimpl`.

Naming of targets
-----------------

### Reference in target files

In addition to the normal target references (a string for a target in
the same module, a module-target pair for a target in the same
repository, `["./", relpath, target]` for relative addressing, and
`["FILE", null, name]` for an explicit file reference in the same
module), references of the form `["@", repo, module, target]` can be
specified, where `repo` is a string referring to an open name. That
open repository name is resolved to the global name by the `"bindings"`
parameter of the repository the target reference is made in. Within the
repository the resolved name refers to, the target `[module, target]`
is taken.
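To make this concrete, here is a hypothetical target file in the
`foobar` repository from the example above (the rule
`["rules", "library"]`, the module `""`, and the target names are made
up for this sketch). The dependency uses the open name `base`, which
the `"bindings"` of `foobar` resolve to the global repository
`barimpl`:

``` jsonc
{ "libfoo":
  { "type": ["rules", "library"]
  , "srcs": [["FILE", null, "foo.c"]]
  , "deps": [["@", "base", "", "libbar"]]
  }
}
```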
### Expression language: names as abstract values

Targets are a global concept, as they distinguish targets from
different repositories. Their names, however, depend on the repository
they occur in (as the local names might differ between repositories).
Moreover, some targets cannot be named in certain repositories, as not
every repository has a local name in every other repository.

To handle this naming problem, we note the following. During the
evaluation of a target, names occur in two places: as the result of
evaluating the parameters (for target fields), and in the evaluation of
the defining expression when requesting properties of a target depended
upon (via `DEP_ARTIFACTS` and related functions). In the latter case,
however, the only legitimate way to obtain a target name is via the
`FIELD` function. To enforce this behavior, and to avoid problems with
serializing target names, our expression language considers target
names as opaque values. More precisely,

 - in a target description, the target fields are evaluated and the
   result of the evaluation is parsed, in the context of the module the
   `TARGETS` file belongs to, as a target name, and
 - during evaluation of the defining expression of the target's rule,
   when accessing `FIELD` the values of target fields will be reported
   as abstract name values, and when querying values of dependencies
   (via `DEP_ARTIFACTS` etc.) the correct abstract target name has to
   be provided.

While the defining expression has access to target names (via target
fields), it is not useful to provide them in the provided data; a
consuming target cannot use names unless it has those targets as
dependencies anyway. Our tool will not enforce this policy; however,
only targets not having names in their provided data are eligible to be
used in `export` rules.

File layout in actions
----------------------

As `just` does full staging for actions, no special considerations are
needed when combining targets of different repositories. Each target
brings its staging of artifacts as usual. In particular, no repository
names (neither local nor global ones) will ever be visible in any
action. So for the consuming target it makes no difference if its
dependency comes from the same or a different repository.

diff --git a/doc/concepts/multi-repo.org b/doc/concepts/multi-repo.org
deleted file mode 100644
index f1ad736f..00000000
--- a/doc/concepts/multi-repo.org
+++ /dev/null
diff --git a/doc/concepts/overview.md b/doc/concepts/overview.md
new file mode 100644
index 00000000..a9bcc847
--- /dev/null
+++ b/doc/concepts/overview.md
@@ -0,0 +1,210 @@

Tool Overview
=============

Structuring
-----------

### Structuring the Build: Targets, Rules, and Actions

The primary units this build system deals with are targets: the user
requests the system to build (or install) a target, targets depend on
other targets, etc. Targets typically reflect the units a software
developer thinks in: libraries, binaries, etc. The definition of a
target only describes the information directly belonging to the target,
e.g., its source, private and public header files, and its direct
dependencies. Any other information needed to build a target (like the
public header files of an indirect dependency) is inferred by the build
tool. In this way, the build description can be kept maintainable.

A built target consists of files logically belonging together (like the
actual library file and its public headers) as well as information on
how to use the target (linking arguments, transitive header files,
etc.). For a consumer of a target, the definition of this collection of
files, as well as the additionally provided information, is what
defines the target as a dependency, irrespective of where the target is
coming from (i.e., targets coinciding here are indistinguishable for
other targets).

Of course, to actually build a single target from its dependencies,
many invocations of the compiler or other tools are necessary
(so-called "actions"); the build tool translates these high-level
descriptions into the individual actions necessary and only re-executes
those where inputs have changed.

This translation of high-level concepts into individual actions is not
hard coded into the tool. It is provided by the user as "rules" and
forms additional input to the build. To avoid duplicate work, rules are
typically maintained centrally for a project or an organization.

### Structuring the Code: Modules and Repositories

The code base is usually split into many directories, each containing
source files belonging together. To allow the definition of targets
where their code is, the targets are structured in a similar way. For
each directory, there can be a targets file. Directories for which such
a targets file exists are called "modules". Each file belongs to the
module that is closest when searching upwards in the directory tree.
The targets file of a module defines the targets formed from the source
files belonging to this module.

Larger projects are often split into "repositories". For this build
tool, a repository is a logical unit. Often those coincide with the
repositories in the sense of version control. This, however, does not
have to be the case. Also, from one directory in the file system, many
repositories can be formed that might differ in the rules used, targets
defined, or binding of their dependencies.

Staging
-------

A peculiarity of this build system is the complete separation between
physical and logical paths. Targets have their own view of the world,
i.e., they can place their artifacts at any logical path they like, and
this is how they look to other targets. It is up to the consuming
targets what they do with the artifacts of the targets they depend on;
in particular, they are not obliged to leave them at the logical
location their dependency put them.
When such a collection of artifacts at logical locations (often
referred to as the "stage") is realized on the file system (when
installing a target, or as inputs to actions), the paths are
interpreted as paths relative to the respective root (installation or
action directory).

This separation is what allows the flexible combination of targets from
various sources, without leaking repository names and without files
being arranged differently depending on whether a target is in the
"main" repository.

Repository data
---------------

A repository uses a (logical) directory for several purposes: to obtain
source files, to read definitions of targets, to read rules, and to
read expressions that can be used by rules. While all those directories
can be (and often are) the same, this does not have to be the case. For
each of those purposes, a different logical directory (also called
"root") can be used. In this way, one can, e.g., add target definitions
to a source tree originally written for a different build tool without
modifying the original source tree.

Those roots are usually defined in a repository configuration. For the
"main" repository, i.e., the repository from which the target to be
built is requested, the roots can also be overwritten at the command
line. Roots can be defined as paths in the file system, but also as
`git` tree identifiers (together with the location of some repository
containing that tree). The latter definition is preferable for rules
and dependencies, as it allows high-level caching of targets. It also
motivates the need for adding target definitions without changing the
root itself.

The same flexibility as for the roots is also present for the names of
the files defining targets, rules, and expressions. While the default
names `TARGETS`, `RULES`, and `EXPRESSIONS` are often used, other file
names can be specified for those as well, either in the repository
configuration or (for the main repository) on the command line.

The final piece of data needed to describe a repository is the binding
of the open repository names that are used to refer to other
repositories. More details can be found in the documentation on
multi-repository builds.

Targets
-------

### Target naming

In description files, targets, rules, and expressions are referred to
by name. As the context always fixes whether a name for a target, rule,
or expression is expected, they use the same naming scheme; an example
follows the lists below.

 - A single string refers to the target with this name in the same
   module.
 - A pair `[module, name]` refers to the target `name` in the module
   `module` of the same repository. There are no module names with a
   distinguished meaning. The naming scheme is unambiguous, as all
   other names given by lists have length at least 3.
 - A list `["./", relative-module-path, name]` refers to a target with
   the given name in the module that has the specified path relative to
   the current module (in the current repository).
 - A list `["@", repository, module, name]` refers to the target with
   the specified name in the specified module of the specified
   repository.

Additionally, there are special targets that can also be referred to in
target files.

 - An explicit reference of a source-file target in the same module,
   specified as `["FILE", null, name]`. The explicit `null` at the
   second position (where normally the module would be) is necessary to
   ensure the name has length more than 2, to distinguish it from a
   reference to the module `"FILE"`.
 - A reference to a collection, given by a shell pattern, of explicit
   source files in the top-level directory of the same module,
   specified as `["GLOB", null, pattern]`. The explicit `null` at the
   second position is required for the same reason as in the explicit
   file reference.
 - A reference to a tree target in the same module, specified as
   `["TREE", null, name]`. The explicit `null` at the second position
   is required for the same reason as in the explicit file reference.
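As an illustration of these forms, the following hypothetical target
file fragment (all rule, target, module, and file names are made up)
uses each kind of reference as a dependency:

``` jsonc
{ "app":
  { "type": ["rules", "binary"]
  , "srcs": [["FILE", null, "main.c"], ["GLOB", null, "*.h"]]
  , "data": [["TREE", null, "assets"]]
  , "deps":
    [ "utils"
    , ["network", "client"]
    , ["./", "sub/module", "helpers"]
    , ["@", "openname", "", "libfoo"]
    ]
  }
}
```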
### Data of an analyzed target

Analyzing a target results in 3 pieces of data.

 - The "artifacts" are a staged collection of artifacts. Typically,
   these are what is normally considered the main reason to build a
   target, e.g., the actual library file in case of a library.

 - The "runfiles" are another staged collection of artifacts.
   Typically, these are files that directly belong to the target and
   are somehow needed to use the target. For example, in case of a
   library that would be the public header files of the library itself.

 - A "provides" map with additional information the target wants to
   provide to its consumers. The data contained in that map can also
   contain additional artifacts. Typically, this is the remaining
   information needed to use the target in a build.

   In case of a library, that typically would include any other
   libraries this library transitively depends upon (a stage), the
   correct linking order (a list of strings), and the public headers of
   the transitive dependencies (another stage).

A target is completely determined by these 3 pieces of data. A consumer
of the target will have no other information available. Hence it is
crucial that everything (apart from artifacts and runfiles) needed to
build against that target is contained in the provides map.

When the installation of a target is requested on the command line,
artifacts and runfiles are installed; in case of staging conflicts,
artifacts take precedence.

### Source targets

#### Files

If a target is not found in the targets file, it is implicitly treated
as a source file. Both explicit and implicit source files look the
same. The artifacts stage has a single entry: the path is the relative
path of the file to the module root and the value is the file artifact
located at the specified location. The runfiles are the same as the
artifacts, and the provides map is empty.

#### Collection of files given by a shell pattern

A collection of files given by a shell pattern has, both as artifacts
and runfiles, the (necessarily disjoint) union of the artifact maps of
the (zero or more) source targets that match the pattern. Only *files*
in the *top-level* directory of the given module are considered for
matches. The provides map is empty.

#### Trees

A tree describes a directory. Internally, however, it is a single
opaque artifact. Consuming targets cannot look into the internal
structure of that tree. Only when realized in the file system (when
installation is requested or as part of the input to an action) does
the directory structure become visible again.

An explicit tree target is similar to an explicit file target, except
that at the specified location there has to be a directory rather than
a file, and the tree artifact corresponding to that directory is taken
instead of a file artifact.
diff --git a/doc/concepts/overview.org b/doc/concepts/overview.org
deleted file mode 100644
index 5dc7ad20..00000000
--- a/doc/concepts/overview.org
+++ /dev/null
diff --git a/doc/concepts/rules.md b/doc/concepts/rules.md
new file mode 100644
index 00000000..2ab4c334
--- /dev/null
+++ b/doc/concepts/rules.md
@@ -0,0 +1,567 @@

User-defined Rules
==================

Targets are defined in terms of high-level concepts like "libraries",
"binaries", etc. In order to translate these high-level definitions
into actionable tasks, the user defines rules, explaining at a single
point how all targets of a given type are built.

Rules files
-----------

Rules are defined in rules files (by default named `RULES`). Those
contain a JSON object mapping rule names to their rule definitions. For
rules, the same naming scheme as for targets applies. However, built-in
rules (always named by a single string) take precedence in naming; to
explicitly refer to a rule defined in the current module, the module
has to be specified, possibly by a relative path, e.g.,
`["./", ".", "install"]`.

Basic components of a rule
--------------------------

A rule is defined through a JSON object with various keys. The only
mandatory key is `"expression"`, containing the defining expression of
the rule.

### `"config_fields"`, `"string_fields"` and `"target_fields"`

These keys specify the fields that a target defined by that rule can
have. In particular, they have to be disjoint lists of strings.

For `"config_fields"` and `"string_fields"`, the respective fields have
to evaluate to a list of strings, whereas `"target_fields"` have to
evaluate to a list of target references. Those references are evaluated
immediately, and in the name context of the target they occur in.

The difference between `"config_fields"` and `"string_fields"` is that
`"config_fields"` are evaluated before the target fields and hence can
be used by the rule to specify config transitions for the target
fields. `"string_fields"`, on the other hand, are evaluated *after* the
target fields; hence the rule cannot use them to specify a
configuration transition. However, the target definition in those
fields may use the `"outs"` and `"runfiles"` functions to have access
to the names of the artifacts or runfiles of a target specified in one
of the target fields.

### `"implicit"`

This key specifies a map of implicit dependencies. The keys of the map
are additional target fields, the values are the fixed lists of targets
for those fields. If a short-form name of a target is used (e.g., only
a string instead of a module-target pair), it is interpreted relative
to the repository and module the rule is defined in, not the one the
rule is used in. Other than this, those fields are evaluated the same
way as target fields settable on invocation of the rule.

### `"config_vars"`

This is a list of strings specifying which parts of the configuration
the rule uses. The defining expression of the rule is evaluated in an
environment that is the configuration restricted to those variables; if
one of those variables is not specified in the configuration, the value
in the restriction is `null`.

### `"config_transitions"`

This key specifies a map of (some of) the target fields (whether
declared as `"target_fields"` or as `"implicit"`) to a configuration
expression. Here, a configuration expression is any expression in our
language. It has access to the `"config_vars"` and the
`"config_fields"` and has to evaluate to a list of maps. Each map
specifies a transition of the current configuration by amending it, on
the domain of that map, to the given values.
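As a sketch of how these components fit together, a hypothetical rule
head might look as follows; the rule name, the field names, and the
implicit target are made up, and the defining expression is the trivial
one (the `RESULT` function is described below):

``` jsonc
{ "generate":
  { "config_fields": ["mode"]
  , "string_fields": ["name"]
  , "target_fields": ["srcs"]
  , "implicit": {"generator": [["./", "tools", "generator"]]}
  , "config_vars": ["DEBUG"]
  , "config_transitions":
    {"srcs": [{"type": "singleton_map", "key": "DEBUG", "value": true}]}
  , "expression": {"type": "RESULT"}
  }
}
```

Here the configuration expression for `"srcs"` evaluates to the
single-element list `[{"DEBUG": true}]`, i.e., all targets in `"srcs"`
are analyzed in the current configuration amended by setting `DEBUG` to
`true`.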
+ +### `"imports"` + +This specifies a map of expressions that can later be used by +`CALL_EXPRESSION`. In this way, duplication of (rule) code can be +avoided. For each key, we have to have a name of an expression; +expressions are named following the same naming scheme as targets and +rules. The names are resolved in the context of the rule. Expressions +themselves are defined in expression files, the default name being +`EXPRESSIONS`. + +Each expression is a JSON object. The only mandatory key is +`"expression"` which has to be an expression in our language. It +optionally can have a key `"vars"` where the value has to be a list of +strings (and the default is the empty list). Additionally, it can have +another optional key `"imports"` following the same scheme as the +`"imports"` key of a rule; in the `"imports"` key of an expression, +names are resolved in the context of that expression. It is a +requirement that the `"imports"` graph be cycle free. + +### `"expression"` + +This specifies the defining expression of the rule. The value has to be +an expression of our expression language (basically, an abstract syntax +tree serialized as JSON). It has access to the following extra functions +and, when evaluated, has to return a result value. + +#### `FIELD` + +The field function takes one argument, `name` which has to evaluate +to the name of a field. For string fields, the given list of strings +is returned; for target fields, the list of abstract names for the +given target is returned. These abstract names are opaque within the +rule language (but meaningful when reported in error messages) and +should only be used to be passed on to other functions that expect +names as inputs. + +#### `DEP_ARTIFACTS` and `DEP_RUNFILES` + +These functions give access to the artifacts, or runfiles, +respectively, of one of the targets depended upon. It takes two +(evaluated) arguments, the mandatory `"dep"` and the optional +`"transition"`. + +The argument `"dep"` has to evaluate to an abstract name (as can be +obtained from the `FIELD` function) of some target specified in one +of the target fields. The `"transition"` argument has to evaluate to +a configuration transition (i.e., a map) and the empty transition is +taken as default. It is an error to request a target-transition pair +for a target that was not requested in the given transition through +one of the target fields. + +#### `DEP_PROVIDES` + +This function gives access to a particular entry of the provides map +of one of the targets depended upon. The arguments `"dep"` and +`"transition"` are as for `DEP_ARTIFACTS`; additionally, there is +the mandatory argument `"provider"` which has to evaluate to a +string. The function returns the value of the provides map of the +target at the given provider. If the key is not in the provides map +(or the value at that key is `null`), the optional argument +`"default"` is evaluated and returned. The default for `"default"` +is the empty list. + +#### `BLOB` + +The `BLOB` function takes a single (evaluated) argument `data` which +is optional and defaults to the empty string. This argument has to +evaluate to a string. The function returns an artifact that is a +non-executable file with the given string as content. + +#### `TREE` + +The `TREE` function takes a single (evaluated) argument `$1` which +has to be a map of artifacts. The result is a single tree artifact +formed from the input map. It is an error if the map cannot be +transformed into a tree (e.g., due to staging conflicts). 
#### `ACTION`

Actions are a way to define new artifacts from (zero or more) already
defined artifacts by running a command, typically a compiler, linker,
archiver, etc. The action function takes the following arguments.

 - `"inputs"` A map of artifacts. These artifacts are present when the
   command is executed; the keys of the map are the relative paths from
   the working directory of the command. The command must not make any
   assumption about the location of the working directory in the file
   system (and instead should refer to files by path relative to the
   working directory). Moreover, the command must not modify the input
   files in any way. (In-place operations can be simulated by staging,
   as is shown in the example later in this document.)

   It is an additional requirement that no conflicts occur when
   interpreting the keys as paths. For example, `"foo.txt"` and
   `"./foo.txt"` are different as strings and hence legitimately can be
   assigned different values in a map. When interpreted as a path,
   however, they name the same path; so, if the `"inputs"` map contains
   both those keys, the corresponding values have to be equal.

 - `"cmd"` The command to execute, given as an `argv` vector, i.e., a
   non-empty list of strings. The 0'th element of that list will also
   be the program to be executed.

 - `"env"` The environment in which the command should be executed,
   given as a map of strings to strings.

 - `"outs"` and `"out_dirs"` Two lists of strings naming the files and
   directories, respectively, the command is expected to create. It is
   an error if the command fails to create the promised output files.
   These two lists have to be disjoint, but an entry of `"outs"` may
   well name a location inside one of the `"out_dirs"`.

This function returns a map with keys the strings mentioned in `"outs"`
and `"out_dirs"`. As values, this map has the artifacts defined to be
the ones created by running the given command (in the given environment
with the given inputs).
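As a hedged illustration (the file names and the command are made up,
and `file` is assumed to be bound, e.g. by a `let*`, to some artifact),
an action deriving one output file from one input file might look like
this:

``` jsonc
{ "type": "ACTION"
, "inputs":
  { "type": "singleton_map"
  , "key": "in.txt"
  , "value": {"type": "var", "name": "file"}
  }
, "cmd": ["/bin/sh", "-c", "tr a-z A-Z < in.txt > out.txt"]
, "outs": ["out.txt"]
}
```

This expression evaluates to a map with the single key `out.txt`; its
value is the artifact created by the command.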
In +particular, even an expression with no variables (that, hence, is +always evaluated in the empty environment) can carry out non-trivial +computations and be non-constant. The special functions `BLOB`, +`ACTION`, and `RESULT` are also available. If inside the evaluation +of an expression the function `CALL_EXPRESSION` is used, the name +argument refers to the `"imports"` map of that expression. So the +call graph is deliberately recursion free. + +Evaluation of a target +---------------------- + +A target defined by a user-defined rule is evaluated in the following +way. + + - First, the config fields are evaluated. + + - Then, the target-fields are evaluated. This happens for each field + as follows. + + - The configuration transition for this field is evaluated and the + transitioned configurations determined. + - The argument expression for this field is evaluated. The result + is interpreted as a list of target names. Each of those targets + is analyzed in all the specified configurations. + + - The string fields are evaluated. If the expression for a string + field queries a target (via `outs` or `runfiles`), the value for + that target is returned in the first configuration. The rational + here is that such generator expressions are intended to refer to the + corresponding target in its "main" configuration; they are hardly + used anyway for fields branching their targets over many + configurations. + + - The effective configuration for the target is determined. The target + effectively has used of the configuration the variables used by the + `arguments_config` in the rule invocation, the `config_vars` the + rule specified, and the parts of the configuration used by a target + dependent upon. For a target dependent upon, all parts it used of + its configuration are relevant expect for those fixed by the + configuration transition. + + - The rule expression is evaluated and the result of that evaluation + is the result of the rule. + +Example of developing a rule +---------------------------- + +Let's consider step by step an example of writing a rule. Say we want +to write a rule that programmatically patches some files. + +### Framework: The minimal rule + +Every rule has to have a defining expression evaluating to a `RESULT`. +So the minimally correct rule is the `"null"` rule in the following +example rule file. + + { "null": {"expression": {"type": "RESULT"}}} + +This rule accepts no parameters, and has the empty map as artifacts, +runfiles, and provided data. So it is not very useful. + +### String inputs + +Let's allow the target definition to have some fields. The most simple +fields are `string_fields`; they are given by a list of strings. In the +defining expression we can access them directly via the `FIELD` +function. Strings can be used when defining maps, but we can also create +artifacts from them, using the `BLOB` function. To create a map, we can +use the `singleton_map` function. We define values step by step, using +the `let*` construct. 
+

``` jsonc
{ "script only":
  { "string_fields": ["script"]
  , "expression":
    { "type": "let*"
    , "bindings":
      [ [ "script content"
        , { "type": "join"
          , "separator": "\n"
          , "$1":
            { "type": "++"
            , "$1":
              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
            }
          }
        ]
      , [ "script"
        , { "type": "singleton_map"
          , "key": "script.ed"
          , "value":
            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
          }
        ]
      ]
    , "body":
      {"type": "RESULT", "artifacts": {"type": "var", "name": "script"}}
    }
  }
}
```

### Target inputs and derived artifacts

Now it is time to add the input files. Source files are targets like any
other target (and happen to contain precisely one artifact). So we add a
target field `"srcs"` for the files to be patched. Here we have to keep
in mind that, on the one hand, target fields accept a list of targets
and, on the other hand, the artifacts of a target are a whole map. We
chose to patch all the artifacts of all given `"srcs"` targets. We can
iterate over lists with `foreach` and over maps with `foreach_map`.

Next, we have to keep in mind that targets may place their artifacts at
arbitrary logical locations. For us that means that we first have to
decide at which logical locations we want to place the output
artifacts. As one thinks of patching as an in-place operation, we chose
to logically place the outputs where the inputs have been. Of course, we
do not modify the input files in any way; after all, we have to define a
mathematical function computing the output artifacts, not a collection
of side effects. With that choice of logical artifact placement, we have
to decide what to do if two (or more) input targets place their
artifacts at logically the same location. We could simply take a
"latest wins" semantics (keep in mind that target fields give a list
of targets, not a set) as provided by the `map_union` function. We chose
to consider it a user error if targets with conflicting artifacts are
specified. This is provided by `disjoint_map_union`, which also allows
specifying an error message to be shown to the user. Here, a conflict
means that the values for the same map position are defined differently.

The actual patching is done by an `ACTION`. We have the script already;
to make things easy, we stage the input to a fixed place and also expect
a fixed output location. Then the actual command is a simple shell
script. The only thing we have to keep in mind is that we want useful
output precisely if the action fails. Also note that, while we define
our actions sequentially, they will be executed in parallel, as none of
them depends on the output of another one of them.
+

``` jsonc
{ "ed patch":
  { "string_fields": ["script"]
  , "target_fields": ["srcs"]
  , "expression":
    { "type": "let*"
    , "bindings":
      [ [ "script content"
        , { "type": "join"
          , "separator": "\n"
          , "$1":
            { "type": "++"
            , "$1":
              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
            }
          }
        ]
      , [ "script"
        , { "type": "singleton_map"
          , "key": "script.ed"
          , "value":
            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
          }
        ]
      , [ "patched files per target"
        , { "type": "foreach"
          , "var": "src"
          , "range": {"type": "FIELD", "name": "srcs"}
          , "body":
            { "type": "foreach_map"
            , "var_key": "file_name"
            , "var_val": "file"
            , "range":
              {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "src"}}
            , "body":
              { "type": "let*"
              , "bindings":
                [ [ "action output"
                  , { "type": "ACTION"
                    , "inputs":
                      { "type": "map_union"
                      , "$1":
                        [ {"type": "var", "name": "script"}
                        , { "type": "singleton_map"
                          , "key": "in"
                          , "value": {"type": "var", "name": "file"}
                          }
                        ]
                      }
                    , "cmd":
                      [ "/bin/sh"
                      , "-c"
                      , "cp in out && chmod 644 out && /bin/ed out < script.ed > log 2>&1 || (cat log && exit 1)"
                      ]
                    , "outs": ["out"]
                    }
                  ]
                ]
              , "body":
                { "type": "singleton_map"
                , "key": {"type": "var", "name": "file_name"}
                , "value":
                  { "type": "lookup"
                  , "map": {"type": "var", "name": "action output"}
                  , "key": "out"
                  }
                }
              }
            }
          }
        ]
      , [ "artifacts"
        , { "type": "disjoint_map_union"
          , "msg": "srcs artifacts must not overlap"
          , "$1":
            { "type": "++"
            , "$1": {"type": "var", "name": "patched files per target"}
            }
          }
        ]
      ]
    , "body":
      {"type": "RESULT", "artifacts": {"type": "var", "name": "artifacts"}}
    }
  }
}
```

A typical invocation of that rule would be a target file like the
following.

``` jsonc
{ "input.txt":
  { "type": "ed patch"
  , "script": ["%g/world/s//user/g", "%g/World/s//USER/g"]
  , "srcs": [["FILE", null, "input.txt"]]
  }
}
```

As the input file has the same name as a target (in the same module), we
use the explicit file reference in the specification of the sources.

### Implicit dependencies and config transitions

Say, instead of patching a file, we want to generate source files from
some high-level description using our actively developed code generator.
Then we have to take some additional considerations into account.

 - First of all, every target defined by this rule depends not only on
   the targets the user specifies; our code generator is an implicit
   dependency as well. And as it is under active development, we
   certainly do not want it to be taken from the ambient build
   environment (as we did in the previous example with `ed` which,
   however, is a pretty stable tool). So we use an `implicit` target
   for this.
 - Next, we notice that our code generator is used during the build. In
   particular, we want that tool (written in some compiled language) to
   be built for the platform we run our actions on, not the target
   platform we build our final binaries for. Therefore, we have to use
   a configuration transition.
 - As our defining expression also needs the configuration transition
   to access the artifacts of that implicit target, we had better define
   it as a reusable expression. Other rules in our rule collection might
   also have the same task; so `["transitions", "for host"]` might be a
   good place to define it. In fact, it can look like the expression
   with that name in our own code base.
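As a hedged sketch (the actual expression in the `just` code base may
differ in detail), such a reusable transition expression could simply
set `ARCH` to the host architecture, yielding the map that amends the
configuration for the implicit target.

``` jsonc
{ "vars": ["HOST_ARCH"]
, "expression":
  { "type": "singleton_map"
  , "key": "ARCH"
  , "value": {"type": "var", "name": "HOST_ARCH"}
  }
}
```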
+

So, the overall organization of our rule might be as follows.

``` jsonc
{ "generated code":
  { "target_fields": ["srcs"]
  , "implicit": {"generator": [["generators", "foogen"]]}
  , "config_vars": ["HOST_ARCH"]
  , "imports": {"for host": ["transitions", "for host"]}
  , "config_transitions":
    {"generator": [{"type": "CALL_EXPRESSION", "name": "for host"}]}
  , "expression": ...
  }
}
```

### Providing information to consuming targets

In the simple case of patching, the resulting file is indeed the only
information the consumer of that target needs; in fact, the main point
was that the resulting target could be a drop-in replacement of a source
file. A typical rule, however, defines something like a library, and a
library is much more than just the actual library file and the public
headers: a library may depend on other libraries; therefore, in order to
use it, we need

 - to have the header files of dependencies available that might be
   included by the public header files of that library,
 - to have the libraries transitively depended upon available during
   linking, and
 - to know the order in which to link the dependencies (as they might
   have dependencies among each other).

In order to keep a maintainable build description, all this should be
taken care of by simply depending on that library. We do
*not* want the consumer of a target to have to be aware of
such transitive dependencies (e.g., when constructing the link command
line), as used to be the case in early build tools like `make`.

It is a deliberate design choice that a target is given only by the
result of its analysis, regardless of where it is coming from.
Therefore, all this information needs to be part of the result of a
target. Such information is precisely what the mentioned
`"provides"` map is for. As a map, it can contain an arbitrary amount of
information, and the interface function `"DEP_PROVIDES"` is designed in
such a way that adding more providers does not affect targets not aware
of them (there is no function asking for all providers of a target). The
keys and their meaning have to be agreed upon by a target and its
consumers. As the latter, however, typically are targets of the same
family (authored by the same group), this usually is not a problem.

A typical example of computing a provided value is the `"link-args"` in
the rules used by `just` itself. They are defined by the following
expression.

``` jsonc
{ "type": "nub_right"
, "$1":
  { "type": "++"
  , "$1":
    [ {"type": "keys", "$1": {"type": "var", "name": "lib"}}
    , {"type": "CALL_EXPRESSION", "name": "link-args-deps"}
    , {"type": "var", "name": "link external", "default": []}
    ]
  }
}
```

This expression

 - collects the respective provider of its dependencies,
 - adds itself in front, and
 - deduplicates the resulting list, keeping only the right-most
   occurrence of each entry.

In this way, the invariant is kept that the `"link-args"` form a
topological ordering of the dependencies (in the sense that each entry
is mentioned before its dependencies).
diff --git a/doc/concepts/rules.org b/doc/concepts/rules.org
deleted file mode 100644
index d4c61b5e..00000000
--- a/doc/concepts/rules.org
+++ /dev/null
@@ -1,551 +0,0 @@
-* User-defined Rules
-
-Targets are defined in terms of high-level concepts like "libraries",
-"binaries", etc.
In order to translate these high-level definitions -into actionable tasks, the user defines rules, explaining at a -single point how all targets of a given type are built. - -** Rules files - -Rules are defined in rules files (by default named ~RULES~). Those -contain a JSON object mapping rule names to their rule definition. -For rules, the same naming scheme as for targets applies. However, -built-in rules (always named by a single string) take precedence -in naming; to explicitly refer to a rule defined in the current -module, the module has to be specified, possibly by a relative -path, e.g., ~["./", ".", "install"]~. - -** Basic components of a rule - -A rule is defined through a JSON object with various keys. The only -mandatory key is ~"expression"~ containing the defining expression -of the rule. - -*** ~"config_fields"~, ~"string_fields"~ and ~"target_fields"~ - -These keys specify the fields that a target defined by that rule can -have. In particular, those have to be disjoint lists of strings. - -For ~"config_fields"~ and ~"string_fields"~ the respective field -has to evaluate to a list of strings, whereas ~"target_fields"~ -have to evaluate to a list of target references. Those references -are evaluated immediately, and in the name context of the target -they occur in. - -The difference between ~"config_fields"~ and ~"string_fields"~ is -that ~"config_fields"~ are evaluated before the target fields and -hence can be used by the rule to specify config transitions for the -target fields. ~"string_fields"~ on the other hand are evaluated -_after_ the target fields; hence the rule cannot use them to -specify a configuration transition, however the target definition -in those fields may use the ~"outs"~ and ~"runfiles"~ functions to -have access to the names of the artifacts or runfiles of a target -specified in one of the target fields. - -*** ~"implicit"~ - -This key specifies a map of implicit dependencies. The keys of the -map are additional target fields, the values are the fixed list -of targets for those fields. If a short-form name of a target is -used (e.g., only a string instead of a module-target pair), it is -interpreted relative to the repository and module the rule is defined -in, not the one the rule is used in. Other than this, those fields -are evaluated the same way as target fields settable on invocation -of the rule. - -*** ~"config_vars"~ - -This is a list of strings specifying which parts of the configuration -the rule uses. The defining expression of the rule is evaluated in an -environment that is the configuration restricted to those variables; -if one of those variables is not specified in the configuration -the value in the restriction is ~null~. - -*** ~"config_transitions"~ - -This key specifies a map of (some of) the target fields (whether -declared as ~"target_fields"~ or as ~"implicit"~) to a configuration -expression. Here, a configuration expression is any expression -in our language. It has access to the ~"config_vars"~ and the -~"config_fields"~ and has to evaluate to a list of maps. Each map -specifies a transition to the current configuration by amending -it on the domain of that map to the given value. - -*** ~"imports"~ - -This specifies a map of expressions that can later be used by -~CALL_EXPRESSION~. In this way, duplication of (rule) code can be -avoided. For each key, we have to have a name of an expression; -expressions are named following the same naming scheme as targets -and rules. The names are resolved in the context of the rule. 
-Expressions themselves are defined in expression files, the default -name being ~EXPRESSIONS~. - -Each expression is a JSON object. The only mandatory key is -~"expression"~ which has to be an expression in our language. It -optionally can have a key ~"vars"~ where the value has to be a list -of strings (and the default is the empty list). Additionally, it -can have another optional key ~"imports"~ following the same scheme -as the ~"imports"~ key of a rule; in the ~"imports"~ key of an -expression, names are resolved in the context of that expression. -It is a requirement that the ~"imports"~ graph be cycle free. - -*** ~"expression"~ - -This specifies the defining expression of the rule. The value has to -be an expression of our expression language (basically, an abstract -syntax tree serialized as JSON). It has access to the following -extra functions and, when evaluated, has to return a result value. - -**** ~FIELD~ - -The field function takes one argument, ~name~ which has to evaluate -to the name of a field. For string fields, the given list of strings -is returned; for target fields, the list of abstract names for the -given target is returned. These abstract names are opaque within -the rule language (but meaningful when reported in error messages) -and should only be used to be passed on to other functions that -expect names as inputs. - -**** ~DEP_ARTIFACTS~ and ~DEP_RUNFILES~ - -These functions give access to the artifacts, or runfiles, respectively, -of one of the targets depended upon. It takes two (evaluated) -arguments, the mandatory ~"dep"~ and the optional ~"transition"~. - -The argument ~"dep"~ has to evaluate to an abstract name (as can be -obtained from the ~FIELD~ function) of some target specified in one -of the target fields. The ~"transition"~ argument has to evaluate -to a configuration transition (i.e., a map) and the empty transition -is taken as default. It is an error to request a target-transition -pair for a target that was not requested in the given transition -through one of the target fields. - -**** ~DEP_PROVIDES~ - -This function gives access to a particular entry of the provides -map of one of the targets depended upon. The arguments ~"dep"~ -and ~"transition"~ are as for ~DEP_ARTIFACTS~; additionally, there -is the mandatory argument ~"provider"~ which has to evaluate to a -string. The function returns the value of the provides map of the -target at the given provider. If the key is not in the provides -map (or the value at that key is ~null~), the optional argument -~"default"~ is evaluated and returned. The default for ~"default"~ -is the empty list. - -**** ~BLOB~ - -The ~BLOB~ function takes a single (evaluated) argument ~data~ -which is optional and defaults to the empty string. This argument -has to evaluate to a string. The function returns an artifact that -is a non-executable file with the given string as content. - -**** ~TREE~ - -The ~TREE~ function takes a single (evaluated) argument ~$1~ which -has to be a map of artifacts. The result is a single tree artifact -formed from the input map. It is an error if the map cannot be -transformed into a tree (e.g., due to staging conflicts). - -**** ~ACTION~ - -Actions are a way to define new artifacts from (zero or more) already -defined artifacts by running a command, typically a compiler, linker, -archiver, etc. The action function takes the following arguments. -- ~"inputs"~ A map of artifacts. 
These artifacts are present when - the command is executed; the keys of the map are the relative path - from the working directory of the command. The command must not - make any assumption about the location of the working directory - in the file system (and instead should refer to files by path - relative to the working directory). Moreover, the command must - not modify the input files in any way. (In-place operations can - be simulated by staging, as is shown in the example later in - this document.) - - It is an additional requirement that no conflicts occur when - interpreting the keys as paths. For example, ~"foo.txt"~ and - ~"./foo.txt"~ are different as strings and hence legitimately - can be assigned different values in a map. When interpreted as - a path, however, they name the same path; so, if the ~"inputs"~ - map contains both those keys, the corresponding values have - to be equal. -- ~"cmd"~ The command to execute, given as ~argv~ vector, i.e., - a non-empty list of strings. The 0'th element of that list will - also be the program to be executed. -- ~"env"~ The environment in which the command should be executed, - given as a map of strings to strings. -- ~"outs"~ and ~"out_dirs"~ Two list of strings naming the files - and directories, respectively, the command is expected to create. - It is an error if the command fails to create the promised output - files. These two lists have to be disjoint, but an entry of - ~"outs"~ may well name a location inside one of the ~"out_dirs"~. - -This function returns a map with keys the strings mentioned in -~"outs"~ and ~"out_dirs"~. As values this map has artifacts defined -to be the ones created by running the given command (in the given -environment with the given inputs). - -**** ~RESULT~ - -The ~RESULT~ function is the only way to obtain a result value. -It takes three (evaluated) arguments, ~"artifacts"~, ~"runfiles"~, and -~"provides"~, all of which are optional and default to the empty map. -It defines the result of a target that has the given artifacts, -runfiles, and provided data, respectively. In particular, ~"artifacts"~ -and ~"runfiles"~ have to be maps to artifacts, and ~"provides"~ has -to be a map. Moreover, they keys in ~"runfiles"~ and ~"artifacts"~ -are treated as paths; it is an error if this interpretation yields -to conflicts. The keys in the artifacts or runfile maps as seen by -other targets are the normalized paths of the keys given. - - -Result values themselves are opaque in our expression language -and cannot be deconstructed in any way. Their only purpose is to -be the result of the evaluation of the defining expression of a target. - -**** ~CALL_EXPRESSION~ - -This function takes one mandatory argument ~"name"~ which is -unevaluated; it has to a be a string literal. The expression imported -by that name through the imports field is evaluated in the current -environment restricted to the variables of that expression. The result -of that evaluation is the result of the ~CALL_EXPRESSION~ statement. - -During the evaluation of an expression, rule fields can still be -accessed through the functions ~FIELD~, ~DEP_ARTIFACTS~, etc. In -particular, even an expression with no variables (that, hence, is -always evaluated in the empty environment) can carry out non-trivial -computations and be non-constant. The special functions ~BLOB~, -~ACTION~, and ~RESULT~ are also available. 
If inside the evaluation -of an expression the function ~CALL_EXPRESSION~ is used, the name -argument refers to the ~"imports"~ map of that expression. So the -call graph is deliberately recursion free. - -** Evaluation of a target - -A target defined by a user-defined rule is evaluated in the -following way. - -- First, the config fields are evaluated. - -- Then, the target-fields are evaluated. This happens for each - field as follows. - - The configuration transition for this field is evaluated and - the transitioned configurations determined. - - The argument expression for this field is evaluated. The result - is interpreted as a list of target names. Each of those targets - is analyzed in all the specified configurations. - -- The string fields are evaluated. If the expression for a string - field queries a target (via ~outs~ or ~runfiles~), the value for - that target is returned in the first configuration. The rational - here is that such generator expressions are intended to refer to - the corresponding target in its "main" configuration; they are - hardly used anyway for fields branching their targets over many - configurations. - -- The effective configuration for the target is determined. The target - effectively has used of the configuration the variables used by - the ~arguments_config~ in the rule invocation, the ~config_vars~ - the rule specified, and the parts of the configuration used by - a target dependent upon. For a target dependent upon, all parts - it used of its configuration are relevant expect for those fixed - by the configuration transition. - -- The rule expression is evaluated and the result of that evaluation - is the result of the rule. - -** Example of developing a rule - -Let's consider step by step an example of writing a rule. Say we want -to write a rule that programmatically patches some files. - -*** Framework: The minimal rule - -Every rule has to have a defining expression evaluating -to a ~RESULT~. So the minimally correct rule is the ~"null"~ -rule in the following example rule file. - -#+BEGIN_SRC -{ "null": {"expression": {"type": "RESULT"}}} -#+END_SRC - -This rule accepts no parameters, and has the empty map as artifacts, -runfiles, and provided data. So it is not very useful. - -*** String inputs - -Let's allow the target definition to have some fields. The most -simple fields are ~string_fields~; they are given by a list of -strings. In the defining expression we can access them directly via -the ~FIELD~ function. Strings can be used when defining maps, but -we can also create artifacts from them, using the ~BLOB~ function. -To create a map, we can use the ~singleton_map~ function. We define -values step by step, using the ~let*~ construct. - -#+BEGIN_SRC -{ "script only": - { "string_fields": ["script"] - , "expression": - { "type": "let*" - , "bindings": - [ [ "script content" - , { "type": "join" - , "separator": "\n" - , "$1": - { "type": "++" - , "$1": - [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]] - } - } - ] - , [ "script" - , { "type": "singleton_map" - , "key": "script.ed" - , "value": - {"type": "BLOB", "data": {"type": "var", "name": "script content"}} - } - ] - ] - , "body": - {"type": "RESULT", "artifacts": {"type": "var", "name": "script"}} - } - } -} -#+END_SRC - -*** Target inputs and derived artifacts - -Now it is time to add the input files. Source files are targets like -any other target (and happen to contain precisely one artifact). So -we add a target field ~"srcs"~ for the file to be patched. 
Here we -have to keep in mind that, on the one hand, target fields accept a -list of targets and, on the other hand, the artifacts of a target -are a whole map. We chose to patch all the artifacts of all given -~"srcs"~ targets. We can iterate over lists with ~foreach~ and maps -with ~foreach_map~. - -Next, we have to keep in mind that targets may place their artifacts -at arbitrary logical locations. For us that means that first -we have to make a decision at which logical locations we want -to place the output artifacts. As one thinks of patching as an -in-place operation, we chose to logically place the outputs where -the inputs have been. Of course, we do not modify the input files -in any way; after all, we have to define a mathematical function -computing the output artifacts, not a collection of side effects. -With that choice of logical artifact placement, we have to decide -what to do if two (or more) input targets place their artifacts at -logically the same location. We could simply take a "latest wins" -semantics (keep in mind that target fields give a list of targets, -not a set) as provided by the ~map_union~ function. We chose to -consider it a user error if targets with conflicting artifacts are -specified. This is provided by the ~disjoint_map_union~ that also -allows to specify an error message to be provided the user. Here, -conflict means that values for the same map position are defined -in a different way. - -The actual patching is done by an ~ACTION~. We have the script -already; to make things easy, we stage the input to a fixed place -and also expect a fixed output location. Then the actual command -is a simple shell script. The only thing we have to keep in mind -is that we want useful output precisely if the action fails. Also -note that, while we define our actions sequentially, they will -be executed in parallel, as none of them depends on the output of -another one of them. 
- -#+BEGIN_SRC -{ "ed patch": - { "string_fields": ["script"] - , "target_fields": ["srcs"] - , "expression": - { "type": "let*" - , "bindings": - [ [ "script content" - , { "type": "join" - , "separator": "\n" - , "$1": - { "type": "++" - , "$1": - [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]] - } - } - ] - , [ "script" - , { "type": "singleton_map" - , "key": "script.ed" - , "value": - {"type": "BLOB", "data": {"type": "var", "name": "script content"}} - } - ] - , [ "patched files per target" - , { "type": "foreach" - , "var": "src" - , "range": {"type": "FIELD", "name": "srcs"} - , "body": - { "type": "foreach_map" - , "var_key": "file_name" - , "var_val": "file" - , "range": - {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "src"}} - , "body": - { "type": "let*" - , "bindings": - [ [ "action output" - , { "type": "ACTION" - , "inputs": - { "type": "map_union" - , "$1": - [ {"type": "var", "name": "script"} - , { "type": "singleton_map" - , "key": "in" - , "value": {"type": "var", "name": "file"} - } - ] - } - , "cmd": - [ "/bin/sh" - , "-c" - , "cp in out && chmod 644 out && /bin/ed out < script.ed > log 2>&1 || (cat log && exit 1)" - ] - , "outs": ["out"] - } - ] - ] - , "body": - { "type": "singleton_map" - , "key": {"type": "var", "name": "file_name"} - , "value": - { "type": "lookup" - , "map": {"type": "var", "name": "action output"} - , "key": "out" - } - } - } - } - } - ] - , [ "artifacts" - , { "type": "disjoint_map_union" - , "msg": "srcs artifacts must not overlap" - , "$1": - { "type": "++" - , "$1": {"type": "var", "name": "patched files per target"} - } - } - ] - ] - , "body": - {"type": "RESULT", "artifacts": {"type": "var", "name": "artifacts"}} - } - } -} -#+END_SRC - -A typical invocation of that rule would be a target file like the following. -#+BEGIN_SRC -{ "input.txt": - { "type": "ed patch" - , "script": ["%g/world/s//user/g", "%g/World/s//USER/g"] - , "srcs": [["FILE", null, "input.txt"]] - } -} -#+END_SRC -As the input file has the same name as a target (in the same module), -we use the explicit file reference in the specification of the sources. - -*** Implicit dependencies and config transitions - -Say, instead of patching a file, we want to generate source files -from some high-level description using our actively developed code -generator. Then we have to do some additional considerations. -- First of all, every target defined by this rule not only depends - on the targets the user specifies. Additionally, our code - generator is also an implicit dependency. And as it is under - active development, we certainly do not want it to be taken from - the ambient build environment (as we did in the previous example - with ~ed~ which, however, is a pretty stable tool). So we use an - ~implicit~ target for this. -- Next, we notice that our code generator is used during the - build. In particular, we want that tool (written in some compiled - language) to be built for the platform we run our actions on, not - the target platform we build our final binaries for. Therefore, - we have to use a configuration transition. -- As our defining expression also needs the configuration transition - to access the artifacts of that implicit target, we better define - it as a reusable expression. Other rules in our rule collection - might also have the same task; so ~["transitions", "for host"]~ - might be a good place to define it. In fact, it can look like - the expression with that name in our own code base. 
- -So, the overall organization of our rule might be as follows. - -#+BEGIN_SRC -{ "generated code": - { "target_fields": ["srcs"] - , "implicit": {"generator": [["generators", "foogen"]]} - , "config_vars": ["HOST_ARCH"] - , "imports": {"for host": ["transitions", "for host"]} - , "config_transitions": - {"generator": [{"type": "CALL_EXPRESSION", "name": "for host"}]} - , "expression": ... - } -} -#+END_SRC - -*** Providing information to consuming targets - -In the simple case of patching, the resulting file is indeed the -only information the consumer of that target needs; in fact, the main -point was that the resulting target could be a drop-in replacement -of a source file. A typical rule, however, defines something like -a library and a library is much more, than just the actual library -file and the public headers: a library may depend on other libraries; -therefore, in order to use it, we need -- to have the header files of dependencies available that might be - included by the public header files of that library, -- to have the libraries transitively depended upon available during - linking, and -- to know the order in which to link the dependencies (as they - might have dependencies among each other). -In order to keep a maintainable build description, all this should -be taken care of by simply depending on that library. We do _not_ -want the consumer of a target having to be aware of such transitive -dependencies (e.g., when constructing the link command line), as -it used to be the case in early build tools like ~make~. - -It is a deliberate design choice that a target is given only by -the result of its analysis, regardless of where it is coming from. -Therefore, all this information needs to be part of the result of -a target. Such kind of information is precisely, what the mentioned -~"provides"~ map is for. As a map, it can contain an arbitrary -amount of information and the interface function ~"DEP_PROVIDES"~ -is in such a way that adding more providers does not affect targets -not aware of them (there is no function asking for all providers -of a target). The keys and their meaning have to be agreed upon -by a target and its consumers. As the latter, however, typically -are a target of the same family (authored by the same group), this -usually is not a problem. - -A typical example of computing a provided value is the ~"link-args"~ -in the rules used by ~just~ itself. They are defined by the following -expression. -#+BEGIN_SRC -{ "type": "nub_right" -, "$1": - { "type": "++" - , "$1": - [ {"type": "keys", "$1": {"type": "var", "name": "lib"}} - , {"type": "CALL_EXPRESSION", "name": "link-args-deps"} - , {"type": "var", "name": "link external", "default": []} - ] - } -} -#+END_SRC -This expression -- collects the respective provider of its dependencies, -- adds itself in front, and -- deduplicates the resulting list, keeping only the right-most - occurrence of each entry. -In this way, the invariant is kept, that the ~"link-args"~ from a -topological ordering of the dependencies (in the order that a each -entry is mentioned before its dependencies). diff --git a/doc/concepts/target-cache.md b/doc/concepts/target-cache.md new file mode 100644 index 00000000..0db627e1 --- /dev/null +++ b/doc/concepts/target-cache.md @@ -0,0 +1,231 @@ +Target-level caching +==================== + +`git` trees as content-fixed roots +---------------------------------- + +### The `"git tree"` root scheme + +The multi-repository configuration supports a scheme `"git tree"`. 
This
scheme is given by two parameters,

 - the id of the tree (as a string with the hex encoding), and
 - an arbitrary `git` repository containing the specified tree object,
   as well as all needed tree and blob objects reachable from that
   tree.

For example, a root could be specified as follows.

``` jsonc
["git tree", "6a1820e78f61aee6b8f3677f150f4559b6ba77a4", "/usr/local/src/justbuild.git"]
```

It should be noted that the `git` tree identifier alone already
specifies the content of the full tree. However, `just` needs access to
some repository containing the tree in order to know what the tree looks
like.

Nevertheless, it is an important observation that the tree identifier
alone already specifies the content of the whole (logical) directory.
The equality of two such directories can be established by comparing the
two identifiers *without* the need to read any file from
disk. Those "fixed-content" descriptions, i.e., descriptions of a
repository root that already fully determine the content, are the key to
caching whole targets.

### `KNOWN` artifacts

The in-memory representation of known artifacts has an optional
reference to a repository containing that artifact. Artifacts "known"
from local repositories might not be known to the CAS used for the
action execution; this additional reference allows filling such misses
in the CAS.

Content-fixed repositories
--------------------------

### The parts of a content-fixed repository

In order to meaningfully cache a target, we need to be able to
efficiently compute the cache key. We restrict this to the case where we
can compute the information about the repository without file-system
access. This requires that all roots (workspace, target root, etc.) be
content fixed, as well as the bindings of the free repository names (and
hence also all transitively reachable repositories). We call such
repositories "content-fixed" repositories.

### Canonical description of a content-fixed repository

The local data of a repository consists of the following.

 - The roots (for workspace, targets, rules, expressions). As the tree
   identifier already defines the content, we leave out the path to the
   repository containing the tree.
 - The names of the targets, rules, and expression files.
 - The names of the outgoing "bindings".

Additionally, repositories can reach additional repositories via
bindings. Moreover, this repository-level dependency relation is not
necessarily cycle free. In particular, we cannot use the tree unfolding
as canonical representation of that graph up to bisimulation, as we do
with most other data structures. To still get a canonical
representation, we factor out the largest bisimulation, i.e., minimize
the respective automaton (with repositories as states, local data as
locally observable properties, and the binding relation as edges).

Finally, for each repository individually, the reachable repositories
are renamed `"0"`, `"1"`, `"2"`, etc, following a depth-first traversal
starting from the repository in question where outgoing edges are
traversed in lexicographical order. The entry point is hence
recognisable as repository `"0"`.

The repository key is the content identifier of the canonically
formatted serialisation of the JSON encoding of the multi-repository
configuration obtained this way (with repository-free git-root
descriptions). The serialisation itself is stored in CAS.
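As an illustrative sketch (tree identifiers shortened, and the key
names merely following the shape of the `just` repository-configuration
format), the canonical description of a repository with a single
binding to one dependency might look as follows.

``` jsonc
{ "repositories":
  { "0":
    { "workspace_root": ["git tree", "6a1820e7..."]
    , "bindings": {"deps": "1"}
    }
  , "1":
    { "workspace_root": ["git tree", "d866802c..."]
    , "bindings": {}
    }
  }
}
```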
+

These identifications and replacements of global names do not change
the semantics, as our name data types are completely opaque to our
expression language. In the `"json_encode"` expression, they are
serialized as `null`, and a string representation is only generated in
user messages not available to the language itself. Moreover, names
cannot be compared for equality either, so their only observable
properties, i.e., the way `"DEP_ARTIFACTS"`, `"DEP_RUNFILES"`, and
`"DEP_PROVIDES"` react to them, are invariant under repository
bisimulation.

Configuration and the `"export"` rule
-------------------------------------

Targets not only depend on the content of their repository, but also on
their configurations. Normally, the effective part of a configuration is
only determined after analysing the target. However, for caching, we
need to compute the cache key directly. This property is provided by the
built-in `"export"` rule; only `"export"` targets residing in
content-fixed repositories will be cached. This also serves as an
indication of which targets of a repository are intended for consumption
by other repositories.

An `"export"` rule takes precisely the following arguments.

 - `"target"` specifying a single target, the target to be cached. It
   must not be tainted.
 - `"flexible_config"` a list of strings; those specify the variables
   of the configuration that are considered. All other parts of the
   configuration are ignored. So the effective configuration for the
   `"export"` target is the configuration restricted to those variables
   (filled up with `null` if the variable was not present in the
   original configuration).
 - `"fixed_config"` a dict of arbitrary JSON values (taken
   unevaluated) with keys disjoint from the `"flexible_config"`.

An `"export"` target is analyzed as follows. The configuration is
restricted to the variables specified in the `"flexible_config"`; this
will result in the effective configuration for the exported target. It
is a requirement that the effective configuration contain only pure JSON
values. The (necessarily conflict-free) union with the `"fixed_config"`
is computed and the `"target"` is evaluated in this configuration. The
result (artifacts, runfiles, provided information) is the result of that
evaluation. It is a requirement that the provided information contain
only pure JSON values and artifacts (including tree artifacts); in
particular, it may not contain names.

Cache key
---------

We only consider `"export"` targets in content-fixed repositories for
caching. An export target is then fully described by

 - the repository key of the repository the export target resides in,
 - the target name of the export target within that repository,
   described as a module-name pair, and
 - the effective configuration.

More precisely, the canonical description is the JSON object with those
values for the keys `"repo_key"`, `"target_name"`, and
`"effective_config"`, respectively. The cache key is the blob
identifier of the canonical serialisation (including sorted keys, etc.)
of the just described piece of JSON. To allow debugging and cooperation
with other tools, whenever a cache key is computed, it is ensured that
the serialisation ends up in the applicable CAS.

It should be noted that the cache key can be computed
*without* analyzing the target referred to.
This is
possible, as the configuration is pruned a priori, instead of the usual
procedure of analysing first and only afterwards determining the parts
of the configuration that were relevant.

Cached value
------------

The value to be cached is the result of evaluating the target, that is,
its artifacts, runfiles, and provided data. All artifacts inside those
data structures will be described as known artifacts.

As serialisation, we will essentially use our usual JSON encoding; while
this can be used as is for artifacts and runfiles, where we know that
they have to be a map from strings to artifacts, additional information
will be added for the provided data. The provided data can contain
artifacts, but also legitimately pure JSON values that coincide with our
JSON encoding of artifacts; the same holds true for nodes and result
values. Moreover, the tree unfolding implicit in the JSON serialisation
can be exponentially larger than the value.

Therefore, in our serialisation, we add an entry for every subexpression
and separately add a list of which subexpressions are artifacts, nodes,
or results. During deserialisation, we use this subexpression structure
to deserialize every subexpression only once.

Sharding of target cache
------------------------

In our target description, the execution environment is not included.
For local execution, it is implicit anyway. As we also want to cache
high-level targets when using remote execution, we shard the target
cache (e.g., by using appropriate subdirectories) by the blob identifier
of the serialisation of the description of the execution backend. Here,
`null` stands for local execution, and for remote execution we use an
object with keys `"remote_execution_address"` and
`"remote_execution_properties"` filled in the obvious way. As usual, we
add the serialisation to the CAS.

`"export"` targets, strictness and the extensional projection
--------------------------------------------------------------

As opposed to the target that is exported, the corresponding export
target, if part of a content-fixed repository, will be strict: a build
depending on such a target can only succeed if all artifacts in the
result of the target (regardless of whether they are direct artifacts,
runfiles, or part of the provided data) can be built, even if not all
(or even none) are actually used in the build.

Upon cache hit, the artifacts of an export target are the known
artifacts corresponding to the artifacts of the exported target. While
extensionally equal, known artifacts are defined differently, so an
export target and the exported target are intensionally different (and
that difference might only be visible on the second build). As
intensional equality is used when testing for absence of conflicts in
staging, a target and its exported version almost always conflict and
hence should not be used together. One way to achieve this is to always
use the export target for any target that is exported. This fits well
together with the recommendation of only depending on export targets of
other repositories.

If a target forwards artifacts of an exported target (indirect header
files, indirect link dependencies, etc.), and is exported again, no
additional conflicts occur; replacing by the corresponding known
artifact is a projection: the known artifact corresponding to a known
artifact is the artifact itself.
Moreover, by the strictness property
described earlier, if an export target has a cache hit, then so have all
export targets it depends upon. Keep in mind that a repository can only
be content-fixed if all its dependencies are.

For this strictness-based approach to work, it is, however, a
requirement that any artifact that is exported (typically indirectly,
e.g., as part of a common dependency) by several targets is only used
through the same export target. For a well-structured repository, this
should be a natural property anyway.

The forwarding of artifacts is the reason we chose that, in the
non-cached analysis of an export target, the artifacts are passed on as
received and are not wrapped in an "add to cache" action. The latter
choice would violate the projection property we rely upon.
diff --git a/doc/concepts/target-cache.org b/doc/concepts/target-cache.org
deleted file mode 100644
index 591a66af..00000000
--- a/doc/concepts/target-cache.org
+++ /dev/null
@@ -1,219 +0,0 @@
-* Target-level caching
-
-** ~git~ trees as content-fixed roots
-
-*** The ~"git tree"~ root scheme
-
-The multi-repository configuration supports a scheme ~"git tree"~.
-This scheme is given by two parameters,
-- the id of the tree (as a string with the hex encoding), and
-- an arbitrary ~git~ repository containing the specified tree
-  object, as well as all needed tree and blob objects reachable
-  from that tree.
-For example, a root could be specified as follows.
-#+BEGIN_SRC
-["git tree", "6a1820e78f61aee6b8f3677f150f4559b6ba77a4", "/usr/local/src/justbuild.git"]
-#+END_SRC
-
-It should be noted that the ~git~ tree identifier alone already
-specifies the content of the full tree. However, ~just~ needs access
-to some repository containing the tree in order to know what the
-tree looks like.
-
-Nevertheless, it is an important observation that the tree identifier
-alone already specifies the content of the whole (logical) directory.
-The equality of two such directories can be established by comparing
-the two identifiers _without_ the need to read any file from
-disk. Those "fixed-content" descriptions, i.e., descriptions of a
-repository root that already fully determines the content are the
-key to caching whole targets.
-
-*** ~KNOWN~ artifacts
-
-The in-memory representation of known artifacts has an optional
-reference to a repository containing that artifact. Artifacts
-"known" from local repositories might not be known to the CAS used
-for the action execution; this additional reference allows to fill
-such misses in the CAS.
-
-** Content-fixed repositories
-
-*** The parts of a content-fixed repository
-
-In order to meaningfully cache a target, we need to be able to
-efficiently compute the cache key. We restrict this to the case where
-we can compute the information about the repository without file-system
-access. This requires that all roots (workspace, target root, etc)
-be content fixed, as well as the bindings of the free repository
-names (and hence also all transitively reachable repositories).
-The call such repositories "content-fixed" repositories.
-
-*** Canonical description of a content-fixed repository
-
-The local data of a repository consists of the following.
-- The roots (for workspace, targets, rules, expressions). As the
-  tree identifier already defines the content, we leave out the
-  path to the repository containing the tree.
-- The names of the targets, rules, and expression files.
-- The names of the outgoing "bindings".
- -Additionally, repositories can reach additional repositories via -bindings. Moreover, this repository-level dependency relation -is not necessarily cycle free. In particular, we cannot use the -tree unfolding as canonical representation of that graph up to -bisimulation, as we do with most other data structures. To still get -a canonical representation, we factor out the largest bisimulation, -i.e., minimize the respective automaton (with repositories as -states, local data as locally observable properties, and the binding -relation as edges). - -Finally, for each repository individually, the reachable repositories -are renamed ~"0"~, ~"1"~, ~"2"~, etc, following a depth-first -traversal starting from the repository in question where outgoing -edges are traversed in lexicographical order. The entry point is -hence recognisable as repository ~"0"~. - -The repository key content-identifier of the canonically formatted -canonical serialisation of the JSON encoding of the obtain -multi-repository configuration (with repository-free git-root -descriptions). The serialisation itself is stored in CAS. - -These identifications and replacement of global names does not change -the semantics, as our name data types are completely opaque to our -expression language. In the ~"json_encode"~ expression, they're -serialized as ~null~ and string representation is only generated in -user messages not available to the language itself. Moreover, names -cannot be compared for equality either, so their only observable -properties, i.e., the way ~"DEP_ARTIFACTS"~, ~"DEP_RUNFILES~, and -~"DEP_PROVIDES"~ reacts to them are invariant under repository -bisimulation. - -** Configuration and the ~"export"~ rule - -Targets not only depend on the content of their repository, but also -on their configurations. Normally, -the effective part of a configuration is only determined after -analysing the target. However, for caching, we need to compute -the cache key directly. This property is provided by the built-in ~"export"~ rule; only ~"export"~ targets -residing in content-fixed repositories will be cached. This also -serves as indication, which targets of a repository are intended -for consumption by other repositories. - -An ~"export"~ rule takes precisely the following arguments. -- ~"target"~ specifying a single target, the target to be cached. - It must not be tainted. -- ~"flexible_config"~ a list of strings; those specify the variables - of the configuration that are considered. All other parts of - the configuration are ignored. So the effective configuration for - the ~"export"~ target is the configuration restricted to those - variables (filled up with ~null~ if the variable was not present - in the original configuration). -- ~"fixed_config"~ a dict with of arbitrary JSON values (taken - unevaluated) with keys disjoint from the ~"flexible_config"~. - -An ~"export"~ target is analyzed as follows. The configuration is -restricted to the variables specified in the ~"flexible_config"~; -this will result in the effective configuration for the exported -target. It is a requirement that the effective configuration contain -only pure JSON values. The (necessarily conflict-free) union with -the ~"fixed_config"~ is computed and the ~"target"~ is evaluated -in this configuration. The result (artifacts, runfiles, provided -information) is the result of that evaluation. 
It is a requirement -that the provided information does only contain pure JSON values -and artifacts (including tree artifacts); in particular, they may -not contain names. - -** Cache key - -We only consider ~"export"~ targets in content-fixed repositories -for caching. An export target is then fully described by -- the repository key of the repository the export target resides in, -- the target name of the export target within that repository, - described as module-name pair, and -- the effective configuration. -More precisely, the canonical description is the JSON object with -those values for the keys ~"repo_key"~, ~"target_name"~, and ~"effective_config"~, -respectively. The repository key is the blob identifier of the -canonical serialisation (including sorted keys, etc) of the just -described piece of JSON. To allow debugging and cooperation with -other tools, whenever a cache key is computed, it is ensured, -that the serialisation ends up in the applicable CAS. - -It should be noted that the cache key can be computed _without_ -analyzing the target referred to. This is possible, as the -configuration is pruned a priori instead of the usual procedure -to analyse and afterwards determine the parts of the configuration -that were relevant. - -** Cached value - -The value to be cached is the result of evaluating the target, -that is, its artifacts, runfiles, and provided data. All artifacts -inside those data structures will be described as known artifacts. - -As serialisation, we will essentially use our usual JSON encoding; -while this can be used as is for artifacts and runfiles where we -know that they have to be a map from strings to artifacts, additional -information will be added for the provided data. The provided data -can contain artifacts, but also legitimately pure JSON values that -coincide with our JSON encoding of artifacts; the same holds true -for nodes and result values. Moreover, the tree unfolding implicit -in the JSON serialisation can be exponentially larger than the value. - -Therefore, in our serialisation, we add an entry for every subexpression -and separately add a list of which subexpressions are artifacts, -nodes, or results. During deserialisation, we use this subexpression -structure to deserialize every subexpression only one. - -** Sharding of target cache - -In our target description, the execution environment is not included. -For local execution, it is implicit anyway. As we also want to -cache high-level targets when using remote execution, we shard the -target cache (e.g., by using appropriate subdirectories) by the blob -identifier of the serialisation of the description of the execution -backend. Here, ~null~ stands for local execution, and for remote -execution we use an object with keys ~"remote_execution_address"~ -and ~"remote_execution_properties"~ filled in the obvious way. As -usual, we add the serialisation to the CAS. - -** ~"export"~ targets, strictness and the extensional projection - -As opposed to the target that is exported, the corresponding export -target, if part of a content-fixed repository, will be strict: a -build depending on such a target can only succeed if all artifacts -in the result of target (regardless whether direct artifacts, -runfiles, or as part of the provided data) can be built, even if -not all (or even none) are actually used in the build. - -Upon cache hit, the artifacts of an export target are the known -artifacts corresponding to the artifacts of the exported target. 
-While extensionally equal, known artifacts are defined differently,
-so an export target and the exported target are intensionally
-different (and that difference might only be visible on the second
-build). As intensional equality is used when testing for absence
-of conflicts in staging, a target and its exported version almost
-always conflict and hence should not be used together. One way to
-achieve this is to always use the export target for any target that
-is exported. This fits well together with the recommendation of
-only depending on export targets of other repositories.
-
-If a target forwards artifacts of an exported target (indirect header
-files, indirect link dependencies, etc), and is exported again, no
-additional conflicts occur; replacing by the corresponding known
-artifact is a projection: the known artifact corresponding to a
-known artifact is the artifact itself. Moreover, by the strictness
-property described earlier, if an export target has a cache hit,
-then so have all export targets it depends upon. Keep in mind that
-a repository can only be content-fixed if all its dependencies are.
-
-For this strictness-based approach to work, it is, however, a
-requirement that any artifact that is exported (typically indirectly,
-e.g., as part of a common dependency) by several targets is only
-used through the same export target. For a well-structured repository,
-this should not be a natural property anyway.
-
-The forwarding of artifacts are the reason we chose that in the
-non-cached analysis of an export target the artifacts are passed on
-as received and are not wrapped in an "add to cache" action. The
-latter choice would violate that projection property we rely upon.
diff --git a/doc/future-designs/computed-roots.md b/doc/future-designs/computed-roots.md
new file mode 100644
index 00000000..8bbff401
--- /dev/null
+++ b/doc/future-designs/computed-roots.md
@@ -0,0 +1,156 @@
Computed roots
==============

Status quo
----------

As of version `1.0.0`, the `just` build tool requires the repository
configuration, including all roots, to be specified ahead of time. This
has a couple of consequences.

### Flexible source views, thanks to staging

For source files, the flexibility of using them in a layout different
from how they occur in the source tree is gained through staging. If a
different view of sources is needed, instead of a source target, a
defined target can be used that rearranges the sources as desired. In
this way, programmatic transformations of source files can also be
carried out (while the result is still visible at the original
location), as is done, e.g., by the `["patch", "file"]` rule of the
`just` main repository.

### Restricted flexibility in target definitions via globbing

When defining targets, the general principle is that the definition of
the target and action graphs depends only on the description (given by
the target files, the rules and expressions, and the configuration).
There is, however, a single exception to that rule: a target file may
use the `GLOB` built-in construct and in this way depend on the index of
the respective source directory. This allows, e.g., defining a separate
action for every source file and, in this way, getting good
incrementality and parallelism, while still having a concise target
description.

### Modularity in rules through expressions

Rules might share common tasks. For example, for both `C` binaries and
`C` libraries, the source files have to be compiled to object files.
To
+avoid duplication of descriptions, expressions can be called (also from
+expressions themselves).
+
+Use cases that require more flexibility
+---------------------------------------
+
+### Generated target files
+
+Sometimes projects (or parts thereof that can form a separate logical
+repository) have a simple structure. For example, there is a list of
+directories and for each one there is a library, named and staged in a
+systematic way. Repeating all those systematic target files seems like
+unnecessary work. Instead, we could store the list of directories to
+consider and a small script containing the naming/staging/globbing
+logic; this approach would also be more maintainable. A similar approach
+could also be attractive for a directory tree with tests where, on top,
+all the individual tests should be collected into test suites.
+
+### Staging according to embedded information
+
+For importing prebuilt libraries, it is sometimes desirable to stage
+them in a way honoring the embedded `soname`. The current approach is to
+provide that information out of band in the target file, so that it can
+be used during analysis. Still, the information is already present in
+the prebuilt binary, causing unnecessary maintenance overhead; instead,
+the target file could be a function of that library, which can form its
+own content-fixed root (e.g., a `git tree` root), so that the computed
+value is easily cacheable.
+
+### Simplified rule definition and alternative syntax
+
+Rules can share computation through expressions. However, the interface
+deliberately has to be explicit, including the documentation strings
+that are used by `just describe`. While this allows easy and efficient
+implementation of `just describe`, there is some redundancy involved, as
+often fields are only there to be used by a common expression, but these
+have to be documented in a redundant way (causing additional maintenance
+burden).
+
+Moreover, the JSON encoding of abstract syntax trees is an unambiguous
+format that is easy to read and to process automatically, but people
+argue that it is hard to write by hand. However, it is unlikely
+to get agreement on which syntax is best to use. Now, if rule and
+expression files could be generated, this argument would no longer be
+necessary. Moreover, rules are typically versioned and infrequently
+changed, so the step of generating the official syntax from the
+convenient one would typically be in cache.
+
+Proposal: Support computed roots
+--------------------------------
+
+We propose computed roots as a clean principle to add the needed (and a
+lot more) flexibility for the described use cases, while ensuring that
+all computations of roots are properly cacheable at a high level. In
+this way, we do not compromise efficient builds, as the price of the
+additional flexibility, in the typical case, is just a single cache
+lookup. Of course, it is up to the user to ensure that this case really
+is the typical one, in the same way as it is their responsibility to
+describe the targets in a way that yields proper incrementality.
+
+### New root type `"computed"`
+
+The `just` multi-repository configuration will allow a new type of root
+(besides `"file"` and `"git tree"` and variants thereof), called
+`"computed"`. A `"computed"` root is given by
+
+ - the (global) name of a repository,
+ - the name of a target (in `["module", "target"]` format), and
+ - a configuration (as JSON object, taken literally).
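+
+For illustration, such a root might be written in the multi-repository
+configuration roughly as follows. This is only a sketch: the proposal
+does not fix a concrete syntax, and the repository name
+`"target-generators"`, the target `["gen", "targets"]`, and the
+configuration shown are hypothetical.
+
+    { "repositories":
+      { "example":
+        { "workspace_root":
+          [ "computed", "target-generators"
+          , ["gen", "targets"], {"COLLECT_TESTS": true}]
+        }
+      }
+    }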
+
+It is a requirement that the specified target is an `"export"` target
+and that the specified repository is content-fixed; `"computed"` roots
+are considered content-fixed. However, the dependency structure of
+computed roots must be cycle free. In other words, there must exist an
+ordering of computed roots (the implicit topological order, not a
+declared one) such that for each computed root, the referenced
+repository as well as all repositories reachable from that one via the
+`"bindings"` map only contain computed roots earlier in that order.
+
+### Strict evaluation of roots as artifact tree
+
+The building of required computed roots happens in topological order;
+the build of the defining target of a root is, in principle (subject to
+a user-defined restriction of parallelism), started as soon as all roots
+in the repositories reachable via bindings are available. The root is
+then considered the artifact tree of the defining target.
+
+In particular, the evaluation is strict: all roots of reachable
+repositories have to be successfully computed before the evaluation is
+started, even if it later turns out that one of these roots is never
+accessed in the computation of the defining target. The reason for this
+strictness requirement is to ensure that the cache key for target-level
+caching can be computed ahead of time (and we expect the entry to be in
+the target-level cache most of the time anyway).
+
+### Intensional equality of computed roots
+
+During a build, each computed root is evaluated only once, even if
+required in several places. Two computed roots are considered equal if
+they are defined in the same way, i.e., if repository name, target, and
+configuration agree. The repository or layer using the computed root is
+not part of the root definition.
+
+### Computed roots available to the user
+
+As computed roots are defined by export targets, the respective
+artifacts are stored in the local CAS anyway. Additionally, the tree
+that forms the root will be added to CAS as well. Moreover, an option
+will be added to specify a log file that contains, in a machine-readable
+way, all the tree identifiers of all computed roots used in this build,
+together with their definition.
+
+### `just-mr` to support computed roots
+
+To allow simple setup of a `just` configuration using computed roots,
+`just-mr` will allow a repository type `"computed"` with the same
+parameters as a computed root. These repositories can be used as roots,
+like any other `just-mr` repository type. When generating the `just`
+multi-repository configuration, the definition of a `"computed"`
+repository is just forwarded as a computed root.
diff --git a/doc/future-designs/computed-roots.org b/doc/future-designs/computed-roots.org
deleted file mode 100644
index a83eee67..00000000
--- a/doc/future-designs/computed-roots.org
+++ /dev/null
@@ -1,154 +0,0 @@
-* Computed roots
-
-** Status quo
-
-As of version ~1.0.0~, the ~just~ build tool requires a the repository
-configuration, including all roots, to be specified ahead of time.
-This has a couple of consequences.
-
-*** Flexible source views, thanks to staging
-
-For source files, the flexibility of using them in a layout different
-from how they occur in the source tree is gained through staging.
-If a different view of sources is needed, instead of a source
-target, a defined target can be used that rearranges the sources as
-desired.
In this way, also programmatic transformations of source -files can be carried out (while the result is still visible at the -original location), as is done, e.g., by the ~["patch", "file"]~ -rule of the ~just~ main repository. - -*** Restricted flexibility in target-definitions via globbing - -When defining targets, the general principle is that the definition -of target and action graph only depends on the description (given by -the target files, the rules and expressions, and the configuration). -There is, however, a single exception to that rule: a target file -may use the ~GLOB~ built-in construct and in this way depend on -the index of the respective source directory. This allows, e.g., -to define a separate action for every source file and, in this -way, get good incrementality and parallelism, while still having -a concise target description. - -*** Modularity in rules through expressions - -Rules might share common tasks. For example, for both ~C~ binaries -and ~C~ libraries, the source files have to be compiled to object -files. To avoid duplication of descriptions, expressions can be -called (also from expressions themselves). - -** Use cases that require more flexibility - -*** Generated target files - -Sometimes projects (or parts thereof that can form a separate -logical repository) have a simple structure. For example, there is -a list of directories and for each one there is a library, named -and staged in a systematic way. Repeating all those systematic -target files seems unnecessary work. Instead, we could store the -list of directories to consider and a small script containing the -naming/staging/globbing logic; this approach would also be more -maintainable. A similar approach could also be attractive for a -directory tree with tests where, on top, all the individual tests -should be collected to test suites. - -*** Staging according to embedded information - -For importing prebuilt libraries, it is sometimes desirable to -stage them in a way honoring the embedded ~soname~. The current -approach is to provide that information out of band in the target -file, so that it can be used during analysis. Still, the information -is already present in the prebuilt binary, causing unnecessary -maintenance overhead; instead, the target file could be a function -of that library which can form its own content-fixed root (e.g., a -~git tree~ root), so that the computed value is easily cacheable. - -*** Simplified rule definition and alternative syntax - -Rules can share computation through expressions. However, the -interface, deliberately has to be explicit, including the documentation -strings that are used by ~just describe~. While this allows easy -and efficient implementation of ~just describe~, there is some -redundancy involved, as often fields are only there to be used by -a common expression, but this have to be documented in a redundant -way (causing additional maintenance burden). - -Moreover, using JSON encoding of abstract syntax trees is an -unambiguously readable and easy to automatically process format, -but people argue that it is hard to write by hand. However, it is -unlikely to get agreement on which syntax is best to use. Now, if -rule and expression files could be generated, this argument would -not be necessary. Moreover, rules are typically versioned and -infrequently changed, so the step of generating the official syntax -from the convenient one would typically be in cache. 
- -** Proposal: Support computed roots - -We propose computed roots as a clean principle to add the needed (and -a lot more) flexibility for the described use cases, while ensuring -that all computations of roots are properly cacheable at high level. -In this way, we do not compromise efficient builds, as the price of -the additional flexibility, in the typical case, is just a single -cache lookup. Of course, it is up to the user to ensure that this -case really is the typical one, in the same way as it is their -responsibility to describe the targets in a way to have proper -incrementality. - -*** New root type ~"computed"~ - -The ~just~ multi-repository configuration will allow a new type -of root (besides ~"file"~ and ~"git tree"~ and variants thereof), -called ~"computed"~. A ~"computed"~ root is given by -- the (global) name of a repository -- the name of a target (in ~["module", "target"]~ format), and -- a configuration (as JSON object, taken literally). -It is a requirement that the specified target is an ~"export"~ -target and the specified repository content-fixed; ~"computed"~ roots -are considered content-fixed. However, the dependency structure of -computed roots must be cycle free. In other words, there must exist -an ordering of computed roots (the implicit topological order, not -a declared one) such that for each computed root, the referenced -repository as well as all repositories reachable from that one -via the ~"bindings"~ map only contain computed roots earlier in -that order. - -*** Strict evaluation of roots as artifact tree - -The building of required computed roots happens in topological order; -the build of the defining target of a root is, in principle (subject -to a user-defined restriction of parallelism) started as soon as all -roots in the repositories reachable via bindings are available. The -root is then considered the artifact tree of the defining target. - -In particular, the evaluation is strict: all roots of reachable -repositories have to be successfully computed before the evaluation -is started, even if it later turns out that one of these roots is -never accessed in the computation of the defining target. The reason -for this strictness requirement is to ensure that the cache key for -target-level caching can be computed ahead of time (and we expect -the entry to be in target-level cache most of the time anyway). - -*** Intensional equality of computed roots - -During a build, each computed root is evaluated only once, even -if required in several places. Two computed roots are considered -equal, if they are defined in the same way, i.e., repository name, -target, and configuration agree. The repository or layer using the -computed root is not part of the root definition. - -*** Computed roots available to the user - -As computed roots are defined by export targets, the respective -artifacts are stored in the local CAS anyway. Additionally, the -tree that forms the root will be added to CAS as well. Moreover, -an option will be added to specify a log file that contains, in -machine-readable way, all the tree identifiers of all computed -roots used in this build, together with their definition. - -*** ~just-mr~ to support computed roots - -To allow simply setting up a ~just~ configuration using computed -roots, ~just-mr~ will allow a repository type ~"computed"~ with the -same parameters as a computed root. These repositories can be used -as roots, like any other ~just-mr~ repository type. 
When generating
-the ~just~ multi-repository configuration, the definition of a
-~"computed"~ repository is just forwarded as computed root.
diff --git a/doc/future-designs/execution-properties.md b/doc/future-designs/execution-properties.md
new file mode 100644
index 00000000..d6fc53e8
--- /dev/null
+++ b/doc/future-designs/execution-properties.md
@@ -0,0 +1,125 @@
+Action-controlled execution properties
+======================================
+
+Motivation
+----------
+
+### Varying execution platforms
+
+It is a common situation that software is developed for one platform,
+but it is desirable to build on a different one. For example, the other
+platform could be faster (a common theme when developing for embedded
+devices), cheaper, or simply available in larger quantities. The
+standard solution for these kinds of situations is cross compiling: the
+binary is completely built on one platform, while being intended to run
+on a different one. This can be achieved by constructing the compiler
+invocations accordingly and is already built into our rules (at least
+for `C` and `C++`).
+
+The situation changes, however, once testing (especially end-to-end
+testing) comes into play. Here, we actually have to run the built
+binary---and do so on the target architecture. Nevertheless, we still
+want to offload as much as possible of the work to the other platform
+and perform only the actual test execution on the target platform. This
+requires a single build executing actions on two (or more) platforms.
+
+### Varying execution times
+
+#### Calls to foreign build systems
+
+Often, third-party dependencies that natively build with a different
+build system and don't change too often (yet often enough to not
+have them be part of the build image) are simply put in a single
+action, so that they get built only once, and then stay in cache for
+everyone. This is precisely what our `rules-cc` rules like
+`["CC/foreign/make", "library"]` and `["CC/foreign/cmake", "library"]`
+do.
+
+For those compound actions, we of course expect them to run longer
+than normal actions that only consist of a single compiler or linker
+invocation. Giving an absolute amount of time needed for such an
+action is not reasonable, as that very much depends on the
+underlying hardware. However, it is reasonable to give a number of
+"typical" actions this compound action corresponds to.
+
+#### Long-running end-to-end tests
+
+A similar situation, where a significantly longer action is needed in
+a build otherwise consisting of short actions, is that of end-to-end
+tests. Tests using the final binary might have a complex setup,
+potentially involving several instances running to test communication,
+and require a lengthy sequence of interactions to get into the
+situation that is to be tested, or to verify that the service does not
+degrade under high load or extended usage.
+
+Status Quo
+----------
+
+At the moment, an action can specify
+
+ - the actual action, i.e., inputs, outputs, and the command vector,
+ - the environment variables,
+ - a property that the action can fail (e.g., for test actions), and
+ - a property that the action is not to be taken from cache (e.g.,
+   testing for flakiness).
+
+No other properties can be set by the action itself. In particular,
+remote-execution properties and timeout are equal for all actions of a
+build.
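+
+For illustration, a rule might currently define a test action roughly
+as follows. This is a sketch only: the fields `"inputs"`, `"outs"`,
+`"cmd"`, `"env"`, `"may_fail"`, and `"no_cache"` correspond to the
+properties listed above, but the values are simplified and the precise
+types should be taken from the rule-language documentation.
+
+    { "type": "ACTION"
+    , "inputs": {"type": "var", "name": "test inputs"}
+    , "outs": ["test.log"]
+    , "cmd": ["sh", "-c", "./test.sh > test.log"]
+    , "env": {"type": "empty_map"}
+    , "may_fail": ["test"]
+    , "no_cache": ["test"]
+    }
+
+The changes proposed below extend precisely this attribute set, adding
+`"execution properties"` and `"timeout scaling"`.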
+
+Proposed changes
+----------------
+
+### Extension of the `"ACTION"` function
+
+We propose to extend the `"ACTION"` function available in the rule
+definition by the following attributes. All of the new attributes are
+optional, and the default is taken to reflect the status quo. Hence, the
+proposed changes are backwards compatible.
+
+#### `"execution properties"`
+
+This value has to evaluate to a map of strings; if not given, the
+empty map is taken as default. This map is taken as a union with any
+remote-execution properties specified at the invocation of the build
+(if keys are defined both for the entire build and in
+`"execution properties"` of a specific action, the latter takes
+precedence).
+
+Local execution continues to ignore any execution properties specified.
+However, with the auxiliary change to `just` described later, such
+execution properties can also influence a build that is local by
+default.
+
+#### `"timeout scaling"`
+
+If given, the value has to be a number greater than or equal to `1.0`,
+with `1.0` taken as default. The action timeout specified for this
+build (the default value, or whatever is specified on the command
+line) is multiplied by the given factor and taken as the timeout for
+this action. This applies to both local and remote builds.
+
+### `just` to support dispatching based on remote-execution properties
+
+In simple setups, like using `just execute`, the remote execution is not
+capable of dispatching to different workers based on remote-execution
+properties. To nevertheless have the benefits of using different
+execution environments, `just` will allow an optional configuration file
+to be passed on the command line via a new option
+`--endpoint-configuration`. This configuration file will contain a list
+of pairs of remote-execution properties and remote-execution endpoints.
+The first matching entry (i.e., the first entry where the
+remote-execution property map coincides with the given map when
+restricted to its domain) determines the remote-execution endpoint to be
+used; if no entry matches, the default remote-execution endpoint is
+used. In any case, the remote-execution properties are forwarded to the
+chosen remote-execution endpoint without modification.
+
+When connecting to a non-standard remote-execution endpoint, `just` will
+ensure that the applicable CAS of that endpoint will have all the needed
+artifacts for that action. It will also transfer all result artifacts
+back to the CAS of the default remote-execution endpoint.
+
+`just serve` (once implemented) will also support this new option. As
+with the default execution endpoint, there is the understanding that the
+client uses the same configuration as the `just serve` endpoint.
diff --git a/doc/future-designs/execution-properties.org b/doc/future-designs/execution-properties.org
deleted file mode 100644
index 6e9cf9e3..00000000
--- a/doc/future-designs/execution-properties.org
+++ /dev/null
@@ -1,119 +0,0 @@
-* Action-controlled execution properties
-
-** Motivation
-
-*** Varying execution platforms
-
-It is a common situation that software is developed for one platform,
-but it is desirable to build on a different one. For example,
-the other platform could be faster (common theme when developing
-for embedded devices), cheaper, or simply available in larger
-quantities. The standard solution for these kind of situations is
-cross compiling: the binary is completely built on one platform,
-while being intended to run on a different one.
This can be achieved -by constructing the compiler invocations accordingly and is already -built into our rules (at least for ~C~ and ~C++~). - -The situation changes, however, once testing (especially end-to-end -testing) comes into play. Here, we actually have to run the built -binary---and do so on the target architecture. Nevertheless, we -still want to offload as much as possible of the work to the other -platform and perform only the actual test execution on the target -platform. This requires a single build executing actions on two (or -more) platforms. - -*** Varying execution times - -**** Calls to foreign build systems - -Often, third-party dependencies that natively build with a different -build system and don't change to often (yet often enough to not have -them part of the build image) are simply put in a single action, so -that they get built only once, and then stay in cache for everyone. -This is precisely, what our ~rules-cc~ rules like ~["CC/foreign/make", -"library"]~ and ~["CC/foreign/cmake", "library"]~ do. - -For those compound actions, we of course expect them to run longer -than normal actions that only consist of a single compiler or -linker invocation. Giving an absolute amount of time needed for -such an action is not reasonable, as that very much depends on the -underlying hardware. However, it is reasonable to give a number -"typical" actions this compound action corresponds to. - -**** Long-running end-to-end tests - -A similar situation where a significantly longer action is needed in -a build otherwise consisting of short actions are end-to-end tests. -Test using the final binary might have a complex set up, potentially -involving several instances running to test communication, and -require a lengthy sequence of interactions to get into the situation -that is to be tested, or to verify the absence of degrading of the -service under high load or extended usage. - -** Status Quo - -Action can at the moment specify -- the actual action, i.e., inputs, outputs, and the command vector, -- the environment variables, -- a property that the action can fail (e.g., for test actions), and -- a property that the action is not to be taken from cache (e.g., - testing for flakiness). -No other properties can be set by the action itself. In particular, -remote-execution properties and timeout are equal for all actions -of a build. - -** Proposed changes - -*** Extension of the ~"ACTION"~ function - -We propose to extend the ~"ACTION"~ function available in the rule -definition by the following attributes. All of the new attributes -are optional, and the default is taken to reflect the status quo. -Hence, the proposed changes are backwards compatible. - -**** ~"execution properties"~ - -This value has to evaluate to a map of strings; if not given, the -empty map is taken as default. This map is taken as a union with -any remote-execution properties specified at the invocation of -the build (if keys are defined both, for the entire build and in -~"execution properties"~ of a specific action, the latter takes -precedence). - -Local execution continues to any execution properties specified. -However, with the auxiliary change to ~just~ described later, -such execution properties can also influence a build that is local -by default. - -**** ~"timeout scaling"~ - -If given, the value has to be a number greater or equal than ~1.0~, -with ~1.0~ taken as default. 
The action timeout specified for this
-build (the default value, or whatever is specified on the command
-line) is multiplied by the given factor and taken as timeout for
-this action. This applies for both, local and remote builds.
-
-*** ~just~ to support dispatching based on remote-execution properties
-
-In simple setups, like using ~just execute~, the remote execution
-is not capable of dispatching to different workers based on
-remote-execution properties. To nevertheless have the benefits of
-using different execution environments, ~just~ will allow an optional
-configuration file to be passed on the command line via a new option
-~--endpoint-configuration~. This configuration file will contain a
-list of pairs of remote-execution properties and remote-execution
-endpoints. The first matching entry (i.e., the first entry where
-the remote-execution property map coincides with the given map when
-restricted to its domain) determines the remote-execution endpoint to
-be used; if no entry matches, the default remote-execution endpoint
-is used. In any case, the remote-execution properties are forwarded
-to the chosen remote-execution endpoint without modification.
-
-When connecting a non-standard remote-execution endpoint, ~just~ will
-ensure that the applicable CAS of that endpoint will have all the
-needed artifacts for that action. It will also transfer all result
-artifacts back to the CAS of the default remote-execution endpoint.
-
-~just serve~ (once implemented) will also support this new option. As
-with the default execution endpoint, there is the understanding that
-the client uses the same configuration as the ~just serve~ endpoint.
diff --git a/doc/future-designs/service-target-cache.md b/doc/future-designs/service-target-cache.md
new file mode 100644
index 00000000..941115e9
--- /dev/null
+++ b/doc/future-designs/service-target-cache.md
@@ -0,0 +1,236 @@
+Target-level caching as a service
+=================================
+
+Motivation
+----------
+
+Projects can have quite a lot of dependencies that are not part of the
+build environment, but are, instead, built from source, e.g., in order
+to always build against the latest snapshot. The latter is a typical
+workflow in the case of first-party dependencies. In the case of
+`justbuild`, those first-party dependencies form a separate logical
+repository that is typically content fixed (e.g., because that
+dependency is versioned in a `git` repository).
+
+Moreover, code is typically first built (and tested) by the owning
+project before being used as a dependency. Therefore, if remote
+execution is used, for a first-party dependency, we expect all actions
+to be in cache. As dependencies are typically updated less often than
+the code being developed is changed, in most builds, the dependencies
+are in the target-level cache. In other words, in a remote-execution
+setup, the whole code of dependencies is fetched just to walk through
+the action graph a single time to get the necessary cache hits.
+
+Proposal: target-level caching as a service
+-------------------------------------------
+
+To avoid these unnecessary fetches, we add a new subcommand `just
+serve` that starts a service that provides the dependencies. This
+typically happens by looking up a target-level cache entry. If the
+entry, however, is not in cache, this also includes building the
+respective `export` target using an associated remote-execution
+endpoint.
+
+### Scope: eligible `export` targets
+
+In order to typically have requests in cache, `just serve` will refuse
+to handle requests that do not refer to `export` targets in
+content-fixed repositories; recall that for a repository to be content
+fixed, all repositories reachable from it have to be content fixed as
+well.
+
+### Communication through an associated remote-execution service
+
+Each `just serve` endpoint is always associated with a remote-execution
+endpoint. All artifacts exchanged between client and `just serve`
+endpoint are exchanged via the CAS that is part of the associated
+remote-execution endpoint. This remote-execution endpoint is also used
+if `just serve` has to build targets.
+
+The associated remote-execution endpoint may well be the same process
+simultaneously acting as `just execute`. In fact, this is the default if
+no remote-execution endpoint is specified.
+
+### Protocol
+
+Communication is handled via `grpc`, exchanging `proto` buffers
+containing the information described in the rest of this section.
+
+#### Main request and answer format
+
+A request is given by
+
+ - the map of remote-execution properties for the designated
+   remote-execution endpoint; together with the knowledge on the
+   fixed endpoint, the `just serve` instance can compute the
+   target-level cache shard, and
+ - the identifier of the target-level cache key; it is the
+   client's responsibility to ensure that the referred blob (i.e.,
+   the JSON object with appropriate values for the keys
+   `"repo_key"`, `"target_name"`, and `"effective_config"`) as well
+   as the indirectly referred repository description (the JSON
+   object the `"repo_key"` in the cache key refers to) are uploaded
+   to CAS (of the designated remote-execution endpoint) beforehand.
+
+The answer to that request is the identifier of the corresponding
+target-level cache value (in the same format as for local
+target-level caching). The `just serve` instance will ensure that
+the actual value, as well as any directly or indirectly referenced
+artifacts, are available in the respective remote-execution CAS.
+Alternatively, the answer can indicate the kind of error (unknown
+root, not an export target, build failure, etc.).
+
+#### Auxiliary request: tree of a commit
+
+As for `git` repositories, it is common to specify a commit in order
+to fix a dependency (even though the corresponding tree identifier
+would be enough). Moreover, the standard `git` protocol supports
+asking for the commit of a given remote branch, but additional
+overhead is needed in order to get the tree identifier.
+
+Therefore, in order to support clients (or, more precisely,
+`just-mr` instances setting up the repository description) in
+constructing an appropriate request for `just serve` without
+unnecessary overhead, `just serve` will support a second kind of
+request, where the client request consists of a `git` commit
+identifier and the server answers with the tree identifier for that
+commit if it is aware of that commit, or indicates that it is not
+aware of that commit.
+
+#### Auxiliary request: describe
+
+To support `just describe` also in the cases where code is delegated
+to the `just serve` endpoint, the `describe` information of a target
+can be obtained via an additional kind of request; as `just serve`
+only handles `export` targets, this target necessarily has to be an
+export target.
+
+The request is given by the identifier of the target-level cache
+key, again with the promise that the referred blob is available in
+CAS.
The answer is the identifier of a blob containing a JSON object
+with the needed information, i.e., those parts of the target
+description that are used by `just describe`. Alternatively, the
+answer may indicate the kind of error (unknown root, not an export
+target, etc.).
+
+### Sources: local git repositories and remote trees
+
+A `just serve` instance takes roots from various sources:
+
+ - the `git` repository contained in the local build root,
+ - additional `git` repositories, optionally specified in the
+   invocation, and
+ - as a last resort, asking the CAS in the designated remote-execution
+   service for the specified `git` tree.
+
+Allowing a list of repositories to take as sources (rather than a single
+one) increases the effort when having to search for a specified tree (in
+case the requested `export` target is not in cache and an actual
+analysis of the build has to be carried out) or a specific commit (in
+case a client asks for the tree of a given commit). However, it allows
+for the natural workflow of keeping separate upstream repositories in
+separate clones (updated in an appropriate way) without artificially
+putting them in a single repository (as orphan branches).
+
+Supporting building against trees from CAS allows more flexibility in
+defining roots whose definition clients do not have to care about. In
+fact, they can be defined in any way, as long as
+
+ - the client is aware of the git tree identifier of the root, and
+ - some entity ensures the needed trees are known to the CAS.
+
+The auxiliary changes to `just-mr` described later in this document
+provide one possible way to handle archives in this way. Moreover, this
+additional flexibility will be necessary if we ever support computed
+roots, i.e., roots that are the output of a `just` build.
+
+### Absent roots in `just` repository specification
+
+In order for `just` to know for which repositories to delegate the build
+to the designated `just serve` endpoint, the repository configuration
+for `just` can mark roots as absent; this is done by only giving the
+type as `"git tree"` (or the corresponding ignore-special variant
+thereof) and the tree identifier in the root specification, but no
+witnessing repository.
+
+Any repository containing an absent root has to be content fixed, but
+not all roots have to be absent (as `just` can always upload those trees
+to CAS). It is an error if, outside the computations delegated to
+`just serve`, a non-export target is requested from a repository
+containing an absent root. Moreover, whenever there is a dependency on a
+repository containing an absent root, a `just serve` endpoint has to be
+specified in the invocation of `just`.
+
+### Auxiliary changes
+
+#### `just-mr` pragma `"absent"`
+
+For `just-mr` to know how to construct the repository description,
+the description used by `just-mr` is extended. More precisely, a new
+key `"absent"` is allowed in the `"pragma"` dictionary of a
+repository description. If the specified value is true, `just-mr`
+will generate an absent root out of this description, using all
+available means to generate that root without ever having to fetch
+the repository locally. In the typical case of a `git` repository,
+the auxiliary `just serve` function to obtain the tree of a commit
+is used. To allow this communication, `just-mr` also accepts the
+arguments describing a `just serve` endpoint and forwards them as
+early arguments to `just`, in the same way as it does with
+`--local-build-root`.
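+
+As a sketch, a `just-mr` repository description using this pragma might
+look as follows; the URL and commit are, of course, placeholders, and
+only the `"absent"` pragma is the proposed addition:
+
+    { "repository":
+      { "type": "git"
+      , "repository": "https://example.org/upstream/libfoo.git"
+      , "branch": "main"
+      , "commit": "0123456789abcdef0123456789abcdef01234567"
+      , "pragma": {"absent": true}
+      }
+    }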
+
+#### `just-mr` to inquire remote execution before fetching
+
+In line with the idea that fetching sources from upstream should
+happen only once and not once per developer, we add remote execution
+as another way of obtaining files to `just-mr`. More precisely,
+`just-mr` will support the options `just` accepts to connect to the
+remote CAS. When given, those will be forwarded to `just` as early
+arguments (so that later `just`-only ones can override them);
+moreover, when a file needed to set up a (present) root is found
+neither in the local CAS nor in one of the specified distdirs,
+`just-mr` will first ask the remote CAS for the missing file before
+trying to fetch it from the specified URL itself. The rationale for
+this search order is that the designated remote-execution service is
+typically reachable over the network in a more reliable way than
+external resources (while local resources do not require a network at
+all).
+
+#### `just-mr` to support new repository type `git tree`
+
+A new repository type is added to `just-mr`, called `git tree`. Such
+a repository is given by
+
+ - a `git` tree identifier, and
+ - a command that, when executed in an empty directory (anywhere in
+   the file system), will create in that directory a directory
+   structure containing the specified `git` tree (either top-level
+   or in some subdirectory). Moreover, that command must not modify
+   anything outside the directory it is called in; it is an error
+   if the specified tree is not created in this way.
+
+In this way, content-fixed repositories can be generated in a
+generic way, e.g., using other version-control systems or
+specialized artifact-fetching tools.
+
+Additionally, for archive-like repositories in the `just-mr`
+repository specification (currently `archive` and `zip`), a `git`
+tree identifier can be specified. If the tree is known to `just-mr`,
+or the `"pragma"` `"absent"` is given, it will just use that tree.
+Otherwise, it will fetch as usual, but error out if the obtained
+tree is not the promised one after unpacking and taking the
+specified subdirectory. In this way, archives can also be used as
+absent roots.
+
+#### `just-mr fetch` to support storing in remote-execution CAS
+
+The `fetch` subcommand of `just-mr` will get an additional option to
+support backing up the fetched information not to a local directory,
+but instead to the CAS of the specified remote-execution endpoint.
+This includes
+
+ - all archives fetched, but also
+ - all trees computed in setting up the respective repository
+   description, both from `git tree` repositories and from
+   archives.
+
+In this way, `just-mr` can be used to fill the CAS from one central
+point with all the information the clients need to treat all
+content-fixed roots as absent.
diff --git a/doc/future-designs/service-target-cache.org b/doc/future-designs/service-target-cache.org
deleted file mode 100644
index 10138db5..00000000
--- a/doc/future-designs/service-target-cache.org
+++ /dev/null
@@ -1,227 +0,0 @@
-* Target-level caching as a service
-
-** Motivation
-
-Projects can have quite a lot of dependencies that are not part of
-the build environment, but are, instead, built from source, e.g.,
-in order to always build against the latest snapshot. The latter
-is a typical workflow in case of first-party dependencies. In the
-case of ~justbuild~, those first-party dependencies form a separate
-logical repository that is typically content fixed (e.g., because
-that dependency is versioned in a ~git~ repository).
- -Moreover, code is typically first built (and tested) by the owning -project before being used as a dependency. Therefore, if remote -execution is used, for a first-party dependency, we expect all -actions to be in cache. As dependencies are typically updated less -often than the code being developed is changed, in most builds, -the dependencies are in target-level cache. In other words, in a -remote-execution setup, the whole code of dependencies is fetched -just to walk through the action graph a single time to get the -necessary cache hits. - -** Proposal: target-level caching as a service - -To avoid these unnecessary fetches, we add a new subcommand ~just -serve~ that starts a service that provides the dependencies. This -typically happens by looking up a target-level cache entry. If the -entry, however, is not in cache, this also includes building the -respective ~export~ target using an associated remote-execution -end point. - -*** Scope: eligible ~export~ targets - -In order to typically have requests in cache, ~just serve~ will -refuse to handle requests that do not refer to ~export~ targets -in content-fixed repositories; recall that for a repository to be -content fixed, so have to be all repositories reachable from there. - -*** Communication through an associated remote-execution service - -Each ~just serve~ endpoint is always associated with a remote-execution -endpoint. All artifacts exchanged between client and ~just serve~ -endpoint are exchanged via the CAS that is part in the associated -remote-execution endpoint. This remote-execution endpoint is also -used if ~just serve~ has to build targets. - -The associated remote-execution endpoint can well be the same -process simultaneously acting as ~just execute~. In fact, this is -the default if no remote-execution endpoint is specified. - -*** Protocol - -Communication is handled via ~grpc~ exchanging ~proto~ buffers -containing the information described in the rest of this section. - -**** Main request and answer format - -A request is given by -- the map of remote-execution properties for the designated - remote-execution endpoint; together with the knowledge on the fixed - endpoint, the ~just serve~ instance can compute the target-level - cache shard, and -- the identifier of the target-level cache key; it is the client's - responsibility to ensure that the referred blob (i.e., the - JSON object with appropriate values for the keys ~"repo_key"~, - ~"target_name"~, and ~"effective_config"~) as well as the - indirectly referred repository description (the JSON object the - ~"repo_key"~ in the cache key refers to) are uploaded to CAS (of - the designated remote-execution endpoint) beforehand. - -The answer to that request is the identifier of the corresponding -target-level cache value (in the same format as for local target-level -caching). The ~just serve~ instance will ensure that the actual -value, as well as any directly or indirectly referenced artifacts -are available in the respective remote-execution CAS. Alternatively, -the answer can indicate the kind of error (unknown root, not an -export target, build failure, etc). - -**** Auxiliary request: tree of a commit - -As for ~git~ repositories, it is common to specify a commit in order -to fix a dependency (even though the corresponding tree identifier -would be enough). Moreover, the standard ~git~ protocol supports -asking for the commit of a given remote branch, but additional -overhead is needed in order to get the tree identifier. 
- -Therefore, in order to support clients (or, more precisely, ~just-mr~ -instances setting up the repository description) in constructing an -appropriate request for ~just serve~ without unnecessary overhead, -~just serve~ will support a second kind of request, where the -client request consists of a ~git~ commit identifier and the server -answers with the tree identifier for that commit if it is aware of -that commit, or indicates that it is not aware of that commit. - -**** Auxiliary request: describe - -To support ~just describe~ also in the cases where code is -delegated to the ~just serve~ endpoint, an additional request for -the ~describe~ information of a target can be requested; as ~just -serve~ only handles ~export~ targets, this target necessarily has -to be an export target. - -The request is given by the identifier of the target-level cache -key, again with the promise that the referred blob is available -in CAS. The answer is the identifier of a blob containing a JSON -object with the needed information, i.e., those parts of the target -description that are used by ~just describe~. Alternatively, the -answer may indicate the kind of error (unknown root, not an export -target, etc). - -*** Sources: local git repositories and remote trees - -A ~just serve~ instance takes roots from various sources, -- the ~git~ repository contained in the local build root, -- additional ~git~ repositories, optionally specified in the - invocation, and -- as last resort, asking the CAS in the designated remote-execution - service for the specified ~git~ tree. - -Allowing a list of repositories to take as sources (rather than -a single one) increases the effort when having to search for a -specified tree (in case the requested ~export~ target is not in -cache and an actual analysis of the build has to be carried out) -or specific commit (in case a client asks for the tree of a given -commit). However, it allows for the natural workflow of keeping -separate upstream repositories in separate clones (updated in an -appropriate way) without artificially putting them in a single -repository (as orphan branches). - -Supporting building against trees from CAS allows more flexibility -in defining roots that clients do not have to care about. In fact, -they can be defined in any way, as long as -- the client is aware of the git tree identifier of the root, and -- some entity ensures the needed trees are known to the CAS. -The auxiliary changes to ~just-mr~ described later in this document -provide one possible way to handle archives in this way. Moreover, -this additional flexibility will be necessary if we ever support -computed roots, i.e., roots that are the output of a ~just~ build. - -*** Absent roots in ~just~ repository specification - -In order for ~just~ to know for which repositories to delegate -the build to the designated ~just serve~ endpoint, the repository -configuration for ~just~ can mark roots as absent; this is done -by only giving the type as ~"git tree"~ (or the corresponding -ignore-special variant thereof) and the tree identifier in the root -specification, but no witnessing repository. - -Any repository containing an absent root has to be content fixed, -but not all roots have to be absent (as ~just~ can always upload -those trees to CAS). It is an error if, outside the computations -delegated to ~just serve~, a non-export target is requested from a -repository containing an absent root. 
Moreover, whenever there is -a dependency on a repository containing an absent root, a ~just -serve~ endpoint has to be specified in the invocation of ~just~. - -*** Auxiliary changes - -**** ~just-mr~ pragma ~"absent"~ - -For ~just-mr~ to know how to construct the repository description, -the description used by ~just-mr~ is extended. More precisely, a -new key ~"absent"~ is allowed in the ~"pragma"~ dictionary of a -repository description. If the specified value is true, ~just-mr~ -will generate an absent root out of this description, using all -available means to generate that root without ever having to fetch -the repository locally. In the typical case of a ~git~ repository, -the auxiliary ~just serve~ function to obtain the tree of a commit -is used. To allow this communication, ~just-mr~ also accepts the -arguments describing a ~just serve~ endpoint and forwards them -as early arguments to ~just~, in the same way as it does with -~--local-build-root~. - -**** ~just-mr~ to inquire remote execution before fetching - -In line with the idea that fetching sources from upstream should -happen only once and not once per developer, we add remote execution -as another way of obtaining files to ~just-mr~. More precisely, -~just-mr~ will support the options ~just~ accepts to connect to -the remote CAS. When given, those will be forwarded to ~just~ -as early arguments (so that later ~just~-only ones can override -them); moreover, when a file needed to set up a (present) root is -found neither in local CAS nor in one of the specified distdirs, -~just-mr~ will first ask the remote CAS for the missing file before -trying to fetch itself from the specified URL. The rationale for -this search order is that the designated remote-execution service -is typically reachable over the network in a more reliable way than -external resources (while local resources do not require a network -at all). - -**** ~just-mr~ to support new repository type ~git tree~ - -A new repository type is added to ~just-mr~, called ~git tree~. -Such a repository is given by -- a ~git~ tree identifier, and -- a command that, when executed in an empty directory (anywhere - in the file system) will create in that directory a directory - structure containing the specified ~git~ tree (either top-level - or in some subdirectory). Moreover, that command does not modify - anything outside the directory it is called in; it is an error - if the specified tree is not created in this way. -In this way, content-fixed repositories can be generated in a -generic way, e.g., using other version-control systems or specialized -artifact-fetching tools. - -Additionally, for archive-like repositories in the ~just-mr~ -repository specification (currently ~archive~ and ~zip~), a ~git~ -tree identifier can be specified. If the tree is known to ~just-mr~, -or the ~"pragma"~ ~"absent"~ is given, it will just use that tree. -Otherwise, it will fetch as usual, but error out if the obtained -tree is not the promised one after unpacking and taking the specified -subdirectory. In this way, also archives can be used as absent roots. - -**** ~just-mr fetch~ to support storing in remote-execution CAS - -The ~fetch~ subcommand of ~just-mr~ will get an additional option to -support backing up the fetched information not to a local directory, -but instead to the CAS of the specified remote-execution endpoint. 
-This includes
-- all archives fetched, but also
-- all trees computed in setting up the respective repository
-  description, both, from ~git tree~ repositories, as well as
-  from archives.
-
-In this way, ~just-mr~ can be used to fill the CAS from one central
-point with all the information the clients need to treat all
-content-fixed roots as absent.
diff --git a/doc/future-designs/symlinks.md b/doc/future-designs/symlinks.md
new file mode 100644
index 00000000..05215030
--- /dev/null
+++ b/doc/future-designs/symlinks.md
@@ -0,0 +1,113 @@
+Symbolic links
+==============
+
+Background
+----------
+
+Besides files and directories, symbolic links are also an important
+entity in the file system. Also `git` natively supports symbolic links
+as entries in a tree object. Technically, a symbolic link is a string
+that can be read via `readlink(2)`. However, symbolic links can also be
+followed, and functions that access a file, like `open(2)`, do so by
+default. When following a symbolic link, both relative and absolute
+names can be used.
+
+Symbolic links in build systems
+-------------------------------
+
+### Following and reading both happen
+
+Compilers usually follow symlinks for all inputs. Archivers (like
+`tar(1)` and package-building tools) usually read the link in order to
+package the link itself, rather than the file referred to (if any). As a
+generic build system, it is desirable not to have to make assumptions
+about the intention of the program called (and hence the way it deals
+with symlinks). This, however, has the consequence that only symbolic
+links themselves can properly model symbolic links.
+
+### Self-containedness and location-independence of roots
+
+From a build-system perspective, a root should be self-contained; in
+fact, the target-level caching assumes that the git tree identifier
+entirely describes a `git`-tree root. For this to be true, such a root
+has to be both self-contained and independent of its (assumed) location
+in the file system. In particular, we can neither allow absolute
+symbolic links (as they, depending on the assumed location, might point
+out of the root), nor relative symbolic links that go upwards (via a
+`../` reference) too far.
+
+### Symbolic links in actions
+
+As for source roots, we understand action directories as self-contained
+and independent of their location in the file system. Therefore, we
+have to require the same restrictions there as well, i.e., neither
+absolute symbolic links nor relative symbolic links going up too far.
+
+Allowing all relative symbolic links that don't point outside the
+action directory, however, poses an additional layer of complications in
+the definition of actions: a string might be allowed as a symlink in
+some places in the action directory, but not in others; in particular,
+from the information that an artifact is a relative symlink alone, we
+can't tell whether it can be safely placed at a particular location in
+an action or not. The same holds for trees for which we only know that
+they might contain relative symbolic links.
+
+### Presence of symbolic links in system source trees
+
+It can be desirable to use system libraries or tools as dependencies. A
+typical use case, but not the only one, is packaging a tool for a
+distribution. An obvious approach is to declare a system directory as a
+root of a repository (providing the needed target files in a separate
+root).
As it turns out, however, those system directories do contain
+symbolic links, e.g., shared libraries pointing to the specific version
+(like `libfoo.so.3` as a symlink pointing to `libfoo.so.3.1.4`) or
+detours through `/etc/alternatives`.
+
+Implemented stop-gap: "shopping list" for bootstrapping
+-------------------------------------------------------
+
+As a stop-gap measure to support building the tool itself against
+pre-installed dependencies with the respective directories containing
+symbolic links, or tools (like `protoc`) being symbolic links (e.g., to
+the specific version), repositories can specify, in the `"copy"`
+attribute of the `"local_bootstrap"` parameter, a list of files and
+directories to be copied as part of the bootstrapping process to a fresh
+clean directory serving as root; during this copying, symlinks are
+followed.
+
+Proposed treatment of symbolic links
+------------------------------------
+
+### "Ignore-special" roots
+
+To allow working with source trees containing symbolic links, we extend
+the existing roots by "ignore-special" versions thereof. In such a
+root (regardless of whether it is file based or `git`-tree based),
+everything that is not a file or a directory will be treated as absent.
+For any compile-like tasks, the effect of symlinks can be modeled by
+appropriate staging.
+
+As certain entries have to be ignored, source trees can only be obtained
+by traversing the respective tree; in particular, the `TREE` reference
+is no longer constant time on those roots, even if `git`-tree based.
+Nevertheless, for `git`-tree roots, the effective tree is a function of
+the `git`-tree of the root, so `git`-tree-based ignore-special roots are
+content fixed and hence eligible for target-level caching.
+
+### Accepting non-upwards relative symlinks as first-class objects
+
+Finally, a restricted form of symlinks, more precisely relative
+non-upwards symbolic links, will be added as a first-class object. That
+is, a new artifact type (besides blobs and trees) for relative
+non-upwards symbolic links is added. Like any other artifact, they can
+be freely placed into the inputs of an action, as well as into the
+artifacts, runfiles, or provides map of a target. Artifacts of this new
+type can be defined
+
+ - as a source-symlink reference, as well as implicitly as part of a
+   source tree,
+ - as a symlink output of an action, as well as implicitly as part of a
+   tree output of an action, and
+ - explicitly in the rule language from a string through a new
+   `SYMLINK` constructor function.
diff --git a/doc/future-designs/symlinks.org b/doc/future-designs/symlinks.org
deleted file mode 100644
index 47ca5063..00000000
--- a/doc/future-designs/symlinks.org
+++ /dev/null
@@ -1,108 +0,0 @@
-* Symbolic links
-
-** Background
-
-Besides files and directories, symbolic links are also an important
-entity in the file system. Also ~git~ natively supports symbolic
-links as entries in a tree object. Technically, a symbolic link
-is a string that can be read via ~readlink(2)~. However, they can
-also be followed and functions to access a file, like ~open(2)~ do
-so by default. When following a symbolic link, both, relative and
-absolute, names can be used.
-
-** Symbolic links in build systems
-
-*** Follow and reading both happen
-
-Compilers usually follow symlinks for all inputs. Archivers (like
-~tar(1)~ and package-building tools) usually read the link in order
-to package the link itself, rather than the file referred to (if
-any).
As a generic build system, it is desirable to not have to make -assumptions on the intention of the program called (and hence the -way it deals with symlinks). This, however, has the consequence that -only symbolic links themselves can properly model symbolic links. - -*** Self-containedness and location-independence of roots - -From a build-system perspective, a root should be self-contained; in -fact, the target-level caching assumes that the git tree identifier -entirely describes a ~git~-tree root. For this to be true, such a -root has to be both, self contained and independent of its (assumed) -location in the file system. In particular, we can neither allow -absolute symbolic links (as they, depending on the assumed location, -might point out of the root), nor relative symbolic links that go -upwards (via a ~../~ reference) too far. - -*** Symbolic links in actions - -Like for source roots, we understand action directories as self -contained and independent of their location in the file system. -Therefore, we have to require the same restrictions there as well, -i.e., neither absolute symbolic links nor relative symbolic links -going up too far. - -Allowing all relative symbolic links that don't point outside the -action directory, however, poses an additional layer of complications -in the definition of actions: a string might be allowed as symlink -in some places in the action directory, but not in others; in -particular, we can't tell only from the information that an artifact -is a relative symlink whether it can be safely placed at a particular -location in an action or not. Similarly for trees for which we only -know that they might contain relative symbolic links. - -*** Presence of symbolic links in system source trees - -It can be desirable to use system libraries or tools as dependencies. -A typical use case, but not the only one, is packaging a tool for a -distribution. An obvious approach is to declare a system directory -as a root of a repository (providing the needed target files in a -separate root). As it turns out, however, those system directories -do contain symbolic links, e.g., shared libraries pointing to -the specific version (like ~libfoo.so.3~ as a symlink pointing to -~libfoo.so.3.1.4~) or detours through ~/etc/alternatives~. - -** Implemented stop-gap: "shopping list" for bootstrapping - -As a stop-gap measure to support building the tool itself against -pre-installed dependencies with the respective directories containing -symbolic links, or tools (like ~protoc~) being symbolic links (e.g., -to the specific version), repositories can specify, in the ~"copy"~ -attribute of the ~"local_bootstrap"~ parameter, a list of files -and directories to be copied as part of the bootstrapping process -to a fresh clean directory serving as root; during this copying, -symlinks are followed. - -** Proposed treatment of symbolic links - -*** "Ignore-special" roots - -To allow working with source trees containing symbolic links, we -extend the existing roots by "ignore-special" versions thereof. In -such a root (regardless whether file based, or ~git~-tree based), -everything not a file or a directory will be pretended to be absent. -For any compile-like tasks, the effect of symlinks can be modeled -by appropriate staging. - -As certain entries have to be ignored, source trees can only be -obtained by traversing the respective tree; in particular, the -~TREE~ reference is no longer constant time on those roots, even -if ~git~-tree based. 
Nevertheless, for ~git~-tree roots, the
-effective tree is a function of the ~git~-tree of the root, so
-~git~-tree-based ignore-special roots are content fixed and hence
-eligible for target-level caching.
-
-*** Accepting non-upwards relative symlinks as first-class objects
-
-Finally, a restricted form of symlinks, more precisely relative
-non-upwards symbolic links, will be added as first-class object.
-That is, a new artifact type (besides blobs and trees) for relative
-non-upwards symbolic links is added. Like any other artifact they
-can be freely placed into the inputs of an action, as well as in
-artifacts, runfiles, or provides map of a target. Artifacts of this
-new type can be defined as
-- source-symlink reference, as well as implicitly as part of a
-  source tree,
-- as a symlink output of an action, as well as implicitly as part
-  of a tree output of an action, and
-- explicitly in the rule language from a string through a new
-  ~SYMLINK~ constructor function.
diff --git a/doc/specification/remote-protocol.md b/doc/specification/remote-protocol.md
new file mode 100644
index 00000000..1afd7e32
--- /dev/null
+++ b/doc/specification/remote-protocol.md
@@ -0,0 +1,145 @@
+Specification of the just Remote Execution Protocol
+===================================================
+
+Introduction
+------------
+
+just supports remote execution of actions across multiple machines. As
+such, it makes use of a remote execution protocol. The basis of our
+protocol is the open-source gRPC [remote execution
+API](https://github.com/bazelbuild/remote-apis/blob/main/build/bazel/remote/execution/v2/remote_execution.proto).
+We use this protocol in a **compatible** mode, but by default, we use a
+modified version, allowing us to pass git trees and files directly
+without even looking at their content or traversing them. This
+modification makes sense since it is more efficient if sources are
+available in git repositories and much open-source code is hosted in git
+repositories. With this protocol, we take advantage of already hashed
+git content as much as possible by avoiding unnecessary conversion and
+communication overhead.
+
+In the following sections, we explain which modifications we applied to
+the original protocol and which requirements we have on the remote
+execution service for it to work seamlessly with just.
+
+just Protocol Description
+-------------------------
+
+### git Blob and Tree Hashes
+
+In order to be able to work with git hashes, both the client side and
+the server side need to be extended to support the regular git hash
+functions for blobs and trees:
+
+The hash of a blob is computed as
+
+    sha1sum(b"blob <size_of_content>\0<content>")
+
+The hash of a tree is computed as
+
+    sha1sum(b"tree <size_of_entries>\0<entries>")
+
+where `<entries>` is a sequence (without newlines) of `<entry>`, and
+each `<entry>` is
+
+    <mode> <file or dir name>\0<git-hash of the corresponding blob or tree>
+
+`<mode>` is a number defining whether the object is a file (`100644`),
+an executable file (`100755`), a tree (`040000`), or a symbolic link
+(`120000`). More information on how git internally stores its objects
+can be found in the official [git
+documentation](https://git-scm.com/book/en/v2/git-Internals-git-Objects).
+
+Since git hashes blob content differently from trees, this type of
+information has to be transmitted in addition to the content and the
+hash. To this end, just prefixes the git hash values passed over the
+wire with a single-byte marker.
This allows the remote side to
+distinguish a blob from a tree without inspecting the (potentially
+large) content. The markers are
+
+ - `0x62` for a git blob (`0x62` corresponds to the character `b`)
+ - `0x74` for a git tree (`0x74` corresponds to the character `t`)
+
+Since hashes are transmitted as hexadecimal strings, the resulting
+length of such prefixed git hashes is 42 characters. The server side has
+to accept this hash length as a valid hash length to detect our protocol
+and to apply the corresponding git hash function based on the detected
+prefix.
+
+### Blob and Tree Availability
+
+Typically, it makes sense for a client to check the availability of a
+blob or a tree at the remote side before it actually uploads it. Thus,
+the remote side should be able to answer availability requests based on
+our prefixed hash values.
+
+### Blob Upload
+
+A blob is uploaded to the remote side by passing its raw content as well
+as its `Digest` containing the git hash value for a blob prefixed by
+`0x62`. The remote side needs to verify the received content by applying
+the git blob hash function to it, before the blob is stored in the
+content addressable storage (CAS).
+
+If a blob is part of a git repository and already known to the remote
+side, we do not even have to calculate the hash value of a possibly
+large file; instead, we can directly use the hash value calculated by
+git and pass it through.
+
+### Tree Upload
+
+In contrast to regular files, which are uploaded as blobs, the original
+protocol has no notion of directories on the remote side. Thus,
+directories need to be traversed and converted to `Directory` Protobuf
+messages, which are then serialized and uploaded as blobs.
+
+In our modified protocol, we avoid this traversal and conversion
+overhead by directly uploading the git tree objects instead of the
+serialized Protobuf messages if the directory is part of a git
+repository. Consequently, we can also reuse the corresponding git hash
+value for a tree object, which just needs to be prefixed by `0x74` when
+uploaded.
+
+The remote side must accept git tree objects instead of `Directory`
+Protobuf messages at any location where `Directory` messages are
+referred to (e.g., the root directory of an action). The tree content is
+verified using the git hash function for trees. In addition, the remote
+side has to be modified to parse the git tree object format.
+
+Using this git tree representation makes tree handling much more
+efficient, since the effort of traversing and uploading the content of a
+git tree occurs only once; for each subsequent request, we directly pass
+around the git tree id. We require the invariant that if a tree is part
+of any CAS, then all of its content is also available in this CAS. To
+adhere to this invariant, the client side has to prove that the content
+of a tree is available in the CAS before uploading this tree. One way
+to ensure that the tree content is known to the remote side is that it
+is uploaded by the client. The server side has to ensure that this
+invariant holds. In particular, if the remote side implements any sort
+of pruning strategy for the CAS, it has to honor this invariant when an
+element gets pruned.
+
+Another consequence of this efficient tree handling is that it
+noticeably improves **action digest** calculation, since known git trees
+referred to by the root directory do not need to be traversed. This, in
+turn, makes it faster to determine whether an action result is already
+available in the action cache or not.
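+
+To make the hashing scheme described above concrete, the following is a
+minimal illustrative sketch (not part of just's code base) of how a
+prefixed git blob hash could be computed; it assumes OpenSSL's `SHA1()`
+is available for the SHA-1 computation.
+
+``` cpp
+// Illustrative sketch only: compute the prefixed git blob hash
+// described above. Assumes OpenSSL is available (link with -lcrypto).
+#include <openssl/sha.h>
+
+#include <iomanip>
+#include <iostream>
+#include <sstream>
+#include <string>
+
+// git object hash: sha1sum(b"<tag> <size_of_payload>\0<payload>")
+std::string git_object_hash(std::string const& tag,
+                            std::string const& payload) {
+    std::string data = tag + " " + std::to_string(payload.size());
+    data.push_back('\0');
+    data += payload;
+    unsigned char digest[SHA_DIGEST_LENGTH];
+    SHA1(reinterpret_cast<unsigned char const*>(data.data()), data.size(),
+         digest);
+    std::ostringstream hex;
+    for (unsigned char byte : digest) {
+        hex << std::hex << std::setw(2) << std::setfill('0')
+            << static_cast<int>(byte);
+    }
+    return hex.str();
+}
+
+int main() {
+    // The 12-byte blob "Hello World\n" has the git blob hash
+    // 557db03de997c86a4a028e1ebd3a1ceb225be238 (the same hash that
+    // appears for out.txt in the tutorial). On the wire, the digest is
+    // prefixed with the marker byte 0x62 ('b'); hex-encoded, this
+    // yields the 42-character form.
+    std::string hash = git_object_hash("blob", "Hello World\n");
+    std::cout << "plain:    " << hash << "\n";
+    std::cout << "prefixed: " << "62" + hash << "\n";  // 42 characters
+    return 0;
+}
+```
+
+A tree hash would be computed analogously with the tag `tree`, the
+serialized `<entries>` as payload, and the marker byte `0x74`.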
+
+### Tree Download
+
+Once an action is successfully executed, it might have generated output
+files or output directories in its staging area on the remote side. Each
+output file needs to be uploaded to the CAS with the corresponding git
+blob hash. Each output directory needs to be translated to a git tree
+object and uploaded to the CAS with the corresponding git tree hash.
+Only if the content of a tree is available in the CAS is the server side
+allowed to return the tree to the client.
+
+In the case of a generated output directory, the server only returns the
+corresponding git tree id to the client instead of a flat list of all
+recursively generated output directories as part of a `Tree` Protobuf
+message, as is done in the original protocol. The remote side promises
+that each blob and subtree contained in the root tree is available in
+the remote CAS. Such blobs and trees must be accessible, using the
+streaming interface, without specifying the size (since sizes are not
+stored in a git tree). Due to the Protobuf 3 specification, which is
+used in this remote execution API, not specifying the size means the
+default value 0 is used.
diff --git a/doc/specification/remote-protocol.org b/doc/specification/remote-protocol.org
deleted file mode 100644
index dea7177e..00000000
--- a/doc/specification/remote-protocol.org
+++ /dev/null
@@ -1,139 +0,0 @@
-* Specification of the just Remote Execution Protocol
-
-** Introduction
-
-just supports remote execution of actions across multiple machines. As such, it
-makes use of a remote execution protocol. The basis of our protocol is the
-open-source gRPC
-[[https://github.com/bazelbuild/remote-apis/blob/main/build/bazel/remote/execution/v2/remote_execution.proto][remote
-execution API]]. We use this protocol in a *compatible* mode, but by default, we
-use a modified version, allowing us to pass git trees and files directly without
-even looking at their content or traversing them. This modification makes sense
-since it is more efficient if sources are available in git repositories and much
-open-source code is hosted in git repositories. With this protocol, we take
-advantage of already hashed git content as much as possible by avoiding
-unnecessary conversion and communication overhead.
-
-In the following sections, we explain which modifications we applied to the
-original protocol and which requirements we have to the remote execution service
-to seamlessly work with just.
-
-
-** just Protocol Description
-
-*** git Blob and Tree Hashes
-
-In order to be able work with git hashes, both client side as well as server
-side need to be extended to support the regular git hash functions for blobs and
-trees:
-
-The hash of a blob is computed as
-#+BEGIN_SRC
-sha1sum(b"blob <size_of_content>\0<content>")
-#+END_SRC
-The hash of a tree is computed as
-#+BEGIN_SRC
-sha1sum(b"tree <size_of_entries>\0<entries>")
-#+END_SRC
-where ~<entries>~ is a sequence (without newlines) of ~<entry>~, and each
-~<entry>~ is
-#+BEGIN_SRC
-<mode> <file or dir name>\0<git-hash of the corresponding blob or tree>
-#+END_SRC
-~<mode>~ is a number defining if the object is a file (~100644~), an executable
-file (~100755~), a tree (~040000~), or a symbolic link (~120000~). More
-information on how git internally stores its objects can be found in the
-official [[https://git-scm.com/book/en/v2/git-Internals-git-Objects][git
-documentation]].
- -Since git hashes blob content differently from trees, this type of information -has to be transmitted in addition to the content and the hash. To this aim, just -prefixes the git hash values passed over the wire with a single-byte marker. -Thus allowing the remote side to distinguish a blob from a tree without -inspecting the (potentially large) content. The markers are - -- ~0x62~ for a git blob (~0x62~ corresponds to the character ~b~) -- ~0x74~ for a git tree (~0x74~ corresponds to the character ~t~) - -Since hashes are transmitted as hexadecimal string, the resulting length of such -prefixed git hashes is 42 characters. The server side has to accept this hash -length as valid hash length to detect our protocol and to apply the according -git hash functions based on the detected prefix. - - -*** Blob and Tree Availability - -Typically, it makes sense for a client to check the availability of a blob or a -tree at the remote side, before it actually uploads it. Thus, the remote side -should be able to answer availability requests based on our prefixed hash -values. - - -*** Blob Upload - -A blob is uploaded to the remote side by passing its raw content as well as its -~Digest~ containing the git hash value for a blob prefixed by ~0x62~. The remote -side needs to verify the received content by applying the git blob hash function -to it, before the blob is stored in the content addressable storage (CAS). - -If a blob is part of git repository and already known to the remote side, we -even do not have to calculate the hash value from a possible large file, instead -we can directly use the hash value calculated by git and pass it through. - - -*** Tree Upload - -In contrast to regular files, which are uploaded as blobs, the original protocol -has no notion of directories on the remote side. Thus, directories need to be -traversed and converted to ~Directory~ Protobuf messages, which are then -serialized and uploaded as blobs. - -In our modified protocol, we prevent this traversing and conversion overhead by -directly uploading the git tree objects instead of the serialized Protobuf -messages if the directory is part of a git repository. Consequently, we can also -reuse the corresponding git hash value for a tree object, which just needs to be -prefixed by ~74~, when uploaded. - -The remote side must accepts git tree objects instead ~Directory~ Protobuf -messages at any location where ~Directory~ messages are referred (e.g., the root -directory of an action). The tree content is verified using the git hash -function for trees. In addition, it has to be modified to parse the git tree -object format. - -Using this git tree representation makes tree handling much more efficient, -since the effort of traversing and uploading the content of a git tree occurs -only once and for each subsequent request, we directly pass around the git tree -id. We require the invariant that if a tree is part of any CAS then all its -content is also available in this CAS. To adhere to this invariant, the client -side has to prove that the content of a tree is available in the CAS, before -uploading this tree. One way to ensure that the tree content is known to the -remote side is that it is uploaded by the client. The server side has to ensure -this invariant holds. In particular, if the remote side implements any sort of -pruning strategy for the CAS, it has to honor this invariant when an element got -pruned. 
-
-Another consequence of this efficient tree handling is that it improves *action
-digest* calculation noticeably, since known git trees referred by the root
-directory do not need to be traversed. This in turn allows to faster determine
-whether an action result is already available in the action cache or not.
-
-
-*** Tree Download
-
-Once an action is successfully executed, it might have generated output files or
-output directories in its staging area on the remote side. Each output file
-needs to be uploaded to its CAS with the corresponding git blob hash. Each
-output directory needs to be translated to a git tree object and uploaded to the
-CAS with the corresponding git tree hash. Only if the content of a tree is
-available in the CAS, the server side is allowed to return the tree to the
-client.
-
-In case of a generated output directory, the server only returns the
-corresponding git tree id to the client instead of a flat list of all
-recursively generated output directories as part of a ~Tree~ Protobuf message as
-it is done in the original protocol. The remote side promises that each blob and
-subtree contained in the root tree is available in the remote CAS. Such blobs
-and trees must be accessible, using the streaming interface, without specifying
-the size (since sizes are not stored in a git tree). Due to the Protobuf 3
-specification, which is used in this remote execution API, not specifying the
-size means the default value 0 is used.
diff --git a/doc/tutorial/getting-started.md b/doc/tutorial/getting-started.md
new file mode 100644
index 00000000..36a57d26
--- /dev/null
+++ b/doc/tutorial/getting-started.md
@@ -0,0 +1,217 @@
+Getting Started
+===============
+
+In order to use *justbuild*, first make sure that `just`, `just-mr`, and
+`just-import-git` are available in your `PATH`.
+
+Creating a new project
+----------------------
+
+*justbuild* needs to know the root of the project being worked on. By
+default, it searches upwards from the current directory until it finds a
+marker. Currently, we support three different markers: the files `ROOT`
+and `WORKSPACE` or the directory `.git`. Let's create a new project by
+creating one of those markers:
+
+``` sh
+$ touch ROOT
+```
+
+Creating a generic target
+-------------------------
+
+By default, targets are described in `TARGETS` files. These files
+contain a `JSON` object with the target name as key and the target
+description as value. A target description is an object with at least a
+single mandatory field: `"type"`. This field specifies which rule
+(built-in or user-defined) to apply for this target.
+
+A simple target that only executes commands can be created using the
+built-in `"generic"` rule, which requires at least one command and one
+output file or directory. To create such a target, create the file
+`TARGETS` with the following content:
+
+``` {.jsonc srcname="TARGETS"}
+{ "greeter":
+  { "type": "generic"
+  , "cmds": ["echo -n 'Hello ' > out.txt", "cat name.txt >> out.txt"]
+  , "outs": ["out.txt"]
+  , "deps": ["name.txt"]
+  }
+}
+```
+
+In this example, the `"greeter"` target will run two commands to produce
+the output file `out.txt`.
The second command depends on the input file
+`name.txt` that we need to create as well:
+
+``` sh
+$ echo World > name.txt
+```
+
+Building a generic target
+-------------------------
+
+To build a target, we need to run `just` with the subcommand `build`:
+
+``` sh
+$ just build greeter
+INFO: Requested target is [["@","","","greeter"],{}]
+INFO: Analysed target [["@","","","greeter"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 1 actions, 0 trees, 0 blobs
+INFO: Building [["@","","","greeter"],{}].
+INFO: Processed 1 actions, 0 cache hits.
+INFO: Artifacts built, logical paths are:
+  out.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f]
+$
+```
+
+The subcommand `build` just builds the artifact but does not stage it to
+any user-defined location on the file system. Instead, it reports a
+description of the artifact consisting of the `git` blob identifier,
+size, and type (in this case `f` for a non-executable file). To also
+stage the produced artifact to the working directory, use the `install`
+subcommand and specify the output directory:
+
+``` sh
+$ just install greeter -o .
+INFO: Requested target is [["@","","","greeter"],{}]
+INFO: Analysed target [["@","","","greeter"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 1 actions, 0 trees, 0 blobs
+INFO: Building [["@","","","greeter"],{}].
+INFO: Processed 1 actions, 1 cache hits.
+INFO: Artifacts can be found in:
+  /tmp/tutorial/out.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f]
+$ cat out.txt
+Hello World
+$
+```
+
+Note that the `install` subcommand initiates the build a second time,
+without executing any actions, as all actions are served from cache. The
+produced artifact is identical, which is indicated by the same
+hash/size/type.
+
+If one is only interested in a single final artifact, one can also
+request via the `-P` option that this artifact be written to standard
+output after the build. As all messages are reported to standard error,
+this can be used both for interactively reading a text file and for
+piping the artifact to another program.
+
+``` sh
+$ just build greeter -Pout.txt
+INFO: Requested target is [["@","","","greeter"],{}]
+INFO: Analysed target [["@","","","greeter"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 1 actions, 0 trees, 0 blobs
+INFO: Building [["@","","","greeter"],{}].
+INFO: Processed 1 actions, 1 cache hits.
+INFO: Artifacts built, logical paths are:
+  out.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f]
+Hello World
+$
+```
+
+Alternatively, we could also directly request the artifact `out.txt`
+from *justbuild*'s CAS (content-addressable storage) and print it on
+the command line via:
+
+``` sh
+$ just install-cas [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f]
+Hello World
+$
+```
+
+The canonical way of requesting an object from the CAS is, as just
+shown, to specify the full triple of hash, size, and type, separated by
+colons and enclosed in square brackets. To simplify usage, the brackets
+can be omitted and the size and type fields have the default values `0`
+and `f`, respectively. While the default value for the size is wrong for
+all but one string, the hash still determines the content of the file
+and hence the local CAS is still able to retrieve the file. So the
+typical invocation would simply specify the hash.
+
+``` sh
+$ just install-cas 557db03de997c86a4a028e1ebd3a1ceb225be238
+Hello World
+$
+```
+
+Targets versus Files: The Stage
+-------------------------------
+
+When invoking the `build` command, we had to specify the target
+`greeter`, not the output file `out.txt`. While other build systems
+allow requests specifying an output file, for *justbuild* this would
+conflict with a fundamental design principle, staging: each target has
+its own logical output space, the "stage", where it can put its
+artifacts. We can, without any problem, add a second target also
+generating a file `out.txt`.
+
+``` {.jsonc srcname="TARGETS"}
+...
+, "upper":
+  { "type": "generic"
+  , "cmds": ["cat name.txt | tr a-z A-Z > out.txt"]
+  , "outs": ["out.txt"]
+  , "deps": ["name.txt"]
+  }
+...
+```
+
+As we only request targets, no conflicts arise.
+
+``` sh
+$ just build upper -P out.txt
+INFO: Requested target is [["@","","","upper"],{}]
+INFO: Analysed target [["@","","","upper"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 1 actions, 0 trees, 0 blobs
+INFO: Building [["@","","","upper"],{}].
+INFO: Processed 1 actions, 0 cache hits.
+INFO: Artifacts built, logical paths are:
+  out.txt [83cf24cdfb4891a36bee93421930dd220766299a:6:f]
+WORLD
+$ just build greeter -P out.txt
+INFO: Requested target is [["@","","","greeter"],{}]
+INFO: Analysed target [["@","","","greeter"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 1 actions, 0 trees, 0 blobs
+INFO: Building [["@","","","greeter"],{}].
+INFO: Processed 1 actions, 1 cache hits.
+INFO: Artifacts built, logical paths are:
+  out.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f]
+Hello World
+$
+```
+
+While one normally tries to design targets in such a way that they
+don't have conflicting files if they are meant to be used together, it
+is up to the receiving target to decide what to do with those artifacts.
+A built-in rule that allows rearranging artifacts is `"install"`; a
+detailed description of this rule can be found in the documentation. In
+the simple case of a target producing precisely one file, the argument
+`"files"` can be used to map that file to a new location.
+
+``` {.jsonc srcname="TARGETS"}
+...
+, "both":
+  {"type": "install", "files": {"hello.txt": "greeter", "upper.txt": "upper"}}
+...
+```
+
+``` sh
+$ just build both
+INFO: Requested target is [["@","","","both"],{}]
+INFO: Analysed target [["@","","","both"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 2 actions, 0 trees, 0 blobs
+INFO: Building [["@","","","both"],{}].
+INFO: Processed 2 actions, 2 cache hits.
+INFO: Artifacts built, logical paths are:
+  hello.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f]
+  upper.txt [83cf24cdfb4891a36bee93421930dd220766299a:6:f]
+$
+```
diff --git a/doc/tutorial/getting-started.org b/doc/tutorial/getting-started.org
deleted file mode 100644
index 5a041397..00000000
--- a/doc/tutorial/getting-started.org
+++ /dev/null
@@ -1,212 +0,0 @@
-* Getting Started
-
-In order to use /justbuild/, first make sure that ~just~, ~just-mr~, and
-~just-import-git~ are available in your ~PATH~.
-
-** Creating a new project
-
-/justbuild/ needs to know the root of the project worked on. By default, it
-searches upwards from the current directory till it finds a marker. Currently,
-we support three different markers: the files ~ROOT~ and ~WORKSPACE~ or the
-directory ~.git~.
Lets create a new project by creating one of those markers: - -#+BEGIN_SRC sh -$ touch ROOT -#+END_SRC - -** Creating a generic target - -By default, targets are described in ~TARGETS~ files. These files contain a -~JSON~ object with the target name as key and the target description as value. A -target description is an object with at least a single mandatory field: -~"type"~. This field specifies which rule (built-in or user-defined) to apply -for this target. - -A simple target that only executes commands can be created using the built-in -~"generic"~ rule, which requires at least one command and one output file or -directory. To create such a target, create the file ~TARGETS~ with the following -content: - -#+SRCNAME: TARGETS -#+BEGIN_SRC js -{ "greeter": - { "type": "generic" - , "cmds": ["echo -n 'Hello ' > out.txt", "cat name.txt >> out.txt"] - , "outs": ["out.txt"] - , "deps": ["name.txt"] - } -} -#+END_SRC - -In this example, the ~"greeter"~ target will run two commands to produce the -output file ~out.txt~. The second command depends on the input file ~name.txt~ -that we need to create as well: - -#+BEGIN_SRC sh -$ echo World > name.txt -#+END_SRC - -** Building a generic target - -To build a target, we need to run ~just~ with the subcommand ~build~: - -#+BEGIN_SRC sh -$ just build greeter -INFO: Requested target is [["@","","","greeter"],{}] -INFO: Analysed target [["@","","","greeter"],{}] -INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching -INFO: Discovered 1 actions, 0 trees, 0 blobs -INFO: Building [["@","","","greeter"],{}]. -INFO: Processed 1 actions, 0 cache hits. -INFO: Artifacts built, logical paths are: - out.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f] -$ -#+END_SRC - -The subcommand ~build~ just builds the artifact but does not stage it to any -user-defined location on the file system. Instead it reports a description -of the artifact consisting of ~git~ blob identifier, size, and type (in -this case ~f~ for non-executable file). To also stage the produced artifact to -the working directory, use the ~install~ subcommand and specify the output -directory: - -#+BEGIN_SRC sh -$ just install greeter -o . -INFO: Requested target is [["@","","","greeter"],{}] -INFO: Analysed target [["@","","","greeter"],{}] -INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching -INFO: Discovered 1 actions, 0 trees, 0 blobs -INFO: Building [["@","","","greeter"],{}]. -INFO: Processed 1 actions, 1 cache hits. -INFO: Artifacts can be found in: - /tmp/tutorial/out.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f] -$ cat out.txt -Hello World -$ -#+END_SRC - -Note that the ~install~ subcommand initiates the build a second time, without -executing any actions as all actions are being served from cache. The produced -artifact is identical, which is indicated by the same hash/size/type. - -If one is only interested in a single final artifact, one can -also request via the ~-P~ option that this artifact be written to -standard output after the build. As all messages are reported to -standard error, this can be used for both, interactively reading a -text file, as well as for piping the artifact to another program. - -#+BEGIN_SRC sh -$ just build greeter -Pout.txt -INFO: Requested target is [["@","","","greeter"],{}] -INFO: Analysed target [["@","","","greeter"],{}] -INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching -INFO: Discovered 1 actions, 0 trees, 0 blobs -INFO: Building [["@","","","greeter"],{}]. 
-INFO: Processed 1 actions, 1 cache hits. -INFO: Artifacts built, logical paths are: - out.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f] -Hello World -$ -#+END_SRC - -Alternatively, we could also directly request the artifact ~out.txt~ from -/justbuild/'s CAS (content-addressable storage) and print it on the command line -via: - -#+BEGIN_SRC sh -$ just install-cas [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f] -Hello World -$ -#+END_SRC - -The canonical way of requesting an object from the CAS is, as just shown, to -specify the full triple of hash, size, and type, separated by colons and -enclosed in square brackets. To simplify usage, the brackets can be omitted -and the size and type fields have the default values ~0~ and ~f~, respectively. -While the default value for the size is wrong for all but one string, the hash -still determines the content of the file and hence the local CAS is still -able to retrieve the file. So the typical invocation would simply specify the -hash. - -#+BEGIN_SRC sh -$ just install-cas 557db03de997c86a4a028e1ebd3a1ceb225be238 -Hello World -$ -#+END_SRC - -** Targets versus Files: The Stage - -When invoking the ~build~ command, we had to specify the target ~greeter~, -not the output file ~out.txt~. While other build systems allow requests -specifying an output file, for /justbuild/ this would conflict with a -fundamental design principle: staging; each target has its own logical -output space, the "stage", where it can put its artifacts. We can, without -any problem, add a second target also generating a file ~out.txt~. - -#+SRCNAME: TARGETS -#+BEGIN_SRC js -... -, "upper": - { "type": "generic" - , "cmds": ["cat name.txt | tr a-z A-Z > out.txt"] - , "outs": ["out.txt"] - , "deps": ["name.txt"] - } -... -#+END_SRC - -As we only request targets, no conflicts arise. - -#+BEGIN_SRC sh -$ just build upper -P out.txt -INFO: Requested target is [["@","","","upper"],{}] -INFO: Analysed target [["@","","","upper"],{}] -INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching -INFO: Discovered 1 actions, 0 trees, 0 blobs -INFO: Building [["@","","","upper"],{}]. -INFO: Processed 1 actions, 0 cache hits. -INFO: Artifacts built, logical paths are: - out.txt [83cf24cdfb4891a36bee93421930dd220766299a:6:f] -WORLD -$ just build greeter -P out.txt -INFO: Requested target is [["@","","","greeter"],{}] -INFO: Analysed target [["@","","","greeter"],{}] -INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching -INFO: Discovered 1 actions, 0 trees, 0 blobs -INFO: Building [["@","","","greeter"],{}]. -INFO: Processed 1 actions, 1 cache hits. -INFO: Artifacts built, logical paths are: - out.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f] -Hello World -$ -#+END_SRC - -While one normally tries to design targets in such a way that they -don't have conflicting files if they should be used together, it is -up to the receiving target to decide what to do with those artifacts. -A built-in rule allowing to rearrange artifacts is ~"install"~; a -detailed description of this rule can be found in the documentation. -In the simple case of a target producing precisely one file, the -argument ~"files"~ can be used to map that file to a new location. - -#+SRCNAME: TARGETS -#+BEGIN_SRC js -... -, "both": - {"type": "install", "files": {"hello.txt": "greeter", "upper.txt": "upper"}} -... 
-#+END_SRC
-
-#+BEGIN_SRC sh
-$ just build both
-INFO: Requested target is [["@","","","both"],{}]
-INFO: Analysed target [["@","","","both"],{}]
-INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
-INFO: Discovered 2 actions, 0 trees, 0 blobs
-INFO: Building [["@","","","both"],{}].
-INFO: Processed 2 actions, 2 cache hits.
-INFO: Artifacts built, logical paths are:
-  hello.txt [557db03de997c86a4a028e1ebd3a1ceb225be238:12:f]
-  upper.txt [83cf24cdfb4891a36bee93421930dd220766299a:6:f]
-$
-#+END_SRC
diff --git a/doc/tutorial/hello-world.md b/doc/tutorial/hello-world.md
new file mode 100644
index 00000000..9af68f07
--- /dev/null
+++ b/doc/tutorial/hello-world.md
@@ -0,0 +1,379 @@
+Building C++ Hello World
+========================
+
+*justbuild* is a true language-agnostic (there are no more-equal
+languages) and multi-repository build system. As a consequence,
+high-level concepts (e.g., C++ binaries, C++ libraries, etc.) are not
+hardcoded built-ins of the tool, but rather provided via a set of rules.
+These rules can be specified as a true dependency of your project, like
+any other external repository your project might depend on.
+
+Setting up the Multi-Repository Configuration
+---------------------------------------------
+
+To build a project with multi-repository dependencies, we first need to
+provide a configuration that declares the required repositories. Before
+we begin, we need to declare where the root of our workspace is located
+by creating an empty file `ROOT`:
+
+``` sh
+$ touch ROOT
+```
+
+Second, we also need to create the multi-repository configuration
+`repos.json` in the workspace root:
+
+``` {.jsonc srcname="repos.json"}
+{ "main": "tutorial"
+, "repositories":
+  { "rules-cc":
+    { "repository":
+      { "type": "git"
+      , "branch": "master"
+      , "commit": "123d8b03bf2440052626151c14c54abce2726e6f"
+      , "repository": "https://github.com/just-buildsystem/rules-cc.git"
+      , "subdir": "rules"
+      }
+    }
+  , "tutorial":
+    { "repository": {"type": "file", "path": "."}
+    , "bindings": {"rules": "rules-cc"}
+    }
+  }
+}
+```
+
+In that configuration, two repositories are defined:
+
+1.  The `"rules-cc"` repository located in the subdirectory `rules` of
+    [just-buildsystem/rules-cc:123d8b03bf2440052626151c14c54abce2726e6f](https://github.com/just-buildsystem/rules-cc/tree/123d8b03bf2440052626151c14c54abce2726e6f),
+    which contains the high-level concepts for building C/C++ binaries
+    and libraries.
+
+2.  The `"tutorial"` repository located at `.`, which contains the
+    targets that we want to build. It has a single dependency: the
+    *rules* that are needed to build the target. These rules are
+    bound via the open name `"rules"` to the just-created repository
+    `"rules-cc"`. In this way, the entities provided by `"rules-cc"` can
+    be accessed from within the `"tutorial"` repository via the
+    fully-qualified name `["@", "rules", "<module>", "<name>"]`;
+    fully-qualified names (for rules, targets to build (like libraries,
+    binaries), etc.) are given by a repository name, a path specifying a
+    directory within that repository (the "module") where the
+    specification file is located, and a symbolic name (i.e., an
+    arbitrary string that is used as a key in the specification).
+
+The final repository configuration contains a single `JSON` object with
+the key `"repositories"` referring to an object of repository names as
+keys and repository descriptions as values. For convenience, the main
+repository to pick is set to `"tutorial"`.
+
+Description of the helloworld target
+------------------------------------
+
+For this tutorial, we want to create a target `helloworld` that produces
+a binary from the C++ source `main.cpp`. To define such a target, create
+a `TARGETS` file with the following content:
+
+``` {.jsonc srcname="TARGETS"}
+{ "helloworld":
+  { "type": ["@", "rules", "CC", "binary"]
+  , "name": ["helloworld"]
+  , "srcs": ["main.cpp"]
+  }
+}
+```
+
+The `"type"` field refers to the rule `"binary"` from the module `"CC"`
+of the `"rules"` repository. This rule additionally requires the string
+field `"name"`, which specifies the name of the binary to produce; as
+the generic interface of rules is to have fields either take a list of
+strings or a list of targets, we have to specify the name as a list
+(this rule will simply concatenate all strings given in this field).
+Furthermore, at least one input to the binary is required, which can be
+specified via the target fields `"srcs"` or `"deps"`. In our case, the
+former is used, which contains our single source file (files are
+considered targets).
+
+Now, the last file that is missing is the actual source file `main.cpp`:
+
+``` {.cpp srcname="main.cpp"}
+#include <iostream>
+
+int main() {
+    std::cout << "Hello world!\n";
+    return 0;
+}
+```
+
+Building the helloworld target
+------------------------------
+
+To build the `helloworld` target, we need to specify it on the `just-mr`
+command line:
+
+``` sh
+$ just-mr build helloworld
+INFO: Requested target is [["@","tutorial","","helloworld"],{}]
+INFO: Analysed target [["@","tutorial","","helloworld"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 2 actions, 1 trees, 0 blobs
+INFO: Building [["@","tutorial","","helloworld"],{}].
+INFO: Processed 2 actions, 0 cache hits.
+INFO: Artifacts built, logical paths are:
+  helloworld [b5cfca8b810adc4686f5cac00258a137c5d4a3ba:17088:x]
+$
+```
+
+Note that the target is taken from the `tutorial` repository, as it is
+specified as the main repository in `repos.json`. If targets from other
+repositories should be built, the repository to use must be specified
+via the `--main` option.
+
+`just-mr` reads the repository configuration, fetches externals (if
+any), generates the actual build configuration, and stores it in its
+cache directory (by default under `$HOME/.cache/just`). Afterwards, the
+generated configuration is used to call the `just` binary, which
+performs the actual build.
+
+Note that these two programs, `just-mr` and `just`, can also be run
+individually. To do so, first run `just-mr` with `setup` and capture the
+path to the generated build configuration from stdout by assigning it to
+a shell variable (e.g., `CONF`). Afterwards, `just` can be called to
+perform the actual build by explicitly specifying the configuration file
+via `-C`:
+
+``` sh
+$ CONF=$(just-mr setup tutorial)
+$ just build -C $CONF helloworld
+```
+
+Note that `just-mr` only needs to be run the very first time and only
+once again whenever the `repos.json` file is modified.
+
+By default, the BSD-default compiler front-ends (which are also defined
+for most Linux distributions) `cc` and `c++` are used for C and C++
+(variables `"CC"` and `"CXX"`). If you want to temporarily use different
+defaults, you can use `-D` to provide a JSON object that sets different
+default variables.
For instance, to use Clang as the C++ compiler for a
+single build invocation, you can use the following command to provide an
+object that sets `"CXX"` to `"clang++"`:
+
+``` sh
+$ just-mr build helloworld -D'{"CXX":"clang++"}'
+INFO: Requested target is [["@","tutorial","","helloworld"],{"CXX":"clang++"}]
+INFO: Analysed target [["@","tutorial","","helloworld"],{"CXX":"clang++"}]
+INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 2 actions, 1 trees, 0 blobs
+INFO: Building [["@","tutorial","","helloworld"],{"CXX":"clang++"}].
+INFO: Processed 2 actions, 0 cache hits.
+INFO: Artifacts built, logical paths are:
+  helloworld [b8cf7b8579d9dc7172b61660139e2c14521cedae:16944:x]
+$
+```
+
+Defining project defaults
+-------------------------
+
+To define a custom set of defaults (toolchain and compile flags) for
+your project, you need to create a separate file root for providing the
+required `TARGETS` file, which contains the `"defaults"` target that
+should be used by the rules. This file root is then used as the *target
+root* for the rules, i.e., the search path for `TARGETS` files. In this
+way, the description of the `"defaults"` target is provided in a
+separate file root, to keep the rules repository independent of these
+definitions.
+
+We will call the new file root `tutorial-defaults` and need to create a
+module directory `CC` in it:
+
+``` sh
+$ mkdir -p ./tutorial-defaults/CC
+```
+
+In that module, we need to create the file
+`tutorial-defaults/CC/TARGETS` that contains the target `"defaults"` and
+specifies which toolchain and compile flags to use; it has to specify
+the complete toolchain, but can specify a `"base"` toolchain to inherit
+from. In our case, we don't use any base, but specify all the required
+fields directly.
+
+``` {.jsonc srcname="tutorial-defaults/CC/TARGETS"}
+{ "defaults":
+  { "type": ["CC", "defaults"]
+  , "CC": ["cc"]
+  , "CXX": ["c++"]
+  , "CFLAGS": ["-O2", "-Wall"]
+  , "CXXFLAGS": ["-O2", "-Wall"]
+  , "AR": ["ar"]
+  , "PATH": ["/bin", "/usr/bin"]
+  }
+}
+```
+
+To use the project defaults, modify the existing `repos.json` to reflect
+the following content:
+
+``` {.jsonc srcname="repos.json"}
+{ "main": "tutorial"
+, "repositories":
+  { "rules-cc":
+    { "repository":
+      { "type": "git"
+      , "branch": "master"
+      , "commit": "123d8b03bf2440052626151c14c54abce2726e6f"
+      , "repository": "https://github.com/just-buildsystem/rules-cc.git"
+      , "subdir": "rules"
+      }
+    , "target_root": "tutorial-defaults"
+    , "rule_root": "rules-cc"
+    }
+  , "tutorial":
+    { "repository": {"type": "file", "path": "."}
+    , "bindings": {"rules": "rules-cc"}
+    }
+  , "tutorial-defaults":
+    { "repository": {"type": "file", "path": "./tutorial-defaults"}
+    }
+  }
+}
+```
+
+Note that the `"defaults"` target uses the rule `["CC", "defaults"]`
+without specifying any external repository (e.g.,
+`["@", "rules", ...]`). This is because `"tutorial-defaults"` is not a
+full-fledged repository but merely a file root that is considered local
+to the `"rules-cc"` repository. In fact, the `"rules-cc"` repository
+cannot refer to any external repository as it does not have any defined
+bindings.
+
+To rebuild the project, we need to rerun `just-mr` (note that due to
+configuration changes, rerunning only `just` would not suffice):
+
+``` sh
+$ just-mr build helloworld
+INFO: Requested target is [["@","tutorial","","helloworld"],{}]
+INFO: Analysed target [["@","tutorial","","helloworld"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 2 actions, 1 trees, 0 blobs
+INFO: Building [["@","tutorial","","helloworld"],{}].
+INFO: Processed 2 actions, 0 cache hits.
+INFO: Artifacts built, logical paths are:
+  helloworld [487dc9e47b978877ed2f7d80b3395ce84b23be92:16992:x]
+$
+```
+
+Note that the output binary may have changed due to different defaults.
+
+Modeling target dependencies
+----------------------------
+
+For demonstration purposes, we will separate the print statements into a
+static library `greet`, which will become a dependency of our binary.
+Therefore, we create a new subdirectory `greet` with the files
+`greet/greet.hpp`:
+
+``` {.cpp srcname="greet/greet.hpp"}
+#include <string>
+
+void greet(std::string const& s);
+```
+
+and `greet/greet.cpp`:
+
+``` {.cpp srcname="greet/greet.cpp"}
+#include "greet.hpp"
+#include <iostream>
+
+void greet(std::string const& s) {
+    std::cout << "Hello " << s << "!\n";
+}
+```
+
+These files can now be used to create a static library `libgreet.a`. To
+do so, we need to create the following target description in
+`greet/TARGETS`:
+
+``` {.jsonc srcname="greet/TARGETS"}
+{ "greet":
+  { "type": ["@", "rules", "CC", "library"]
+  , "name": ["greet"]
+  , "hdrs": ["greet.hpp"]
+  , "srcs": ["greet.cpp"]
+  , "stage": ["greet"]
+  }
+}
+```
+
+Similar to `"binary"`, we have to provide a name and a source file.
+Additionally, a library has public headers defined via `"hdrs"` and an
+optional staging directory `"stage"` (default value `"."`). The staging
+directory specifies where the consumer of this library can expect to
+find the library's artifacts. Note that this does not need to reflect
+the location on the file system (e.g., a fully-qualified path like
+`["com", "example", "utils", "greet"]` could be used to distinguish it
+from greeting libraries of other projects). The staging directory
+affects not only the main artifact `libgreet.a` but also its
+*runfiles*, a second set of artifacts, usually those a consumer needs to
+make proper use of the actual artifact; in the case of a library, the
+runfiles are its public headers. Hence, the public header will be staged
+to `"greet/greet.hpp"`. With that knowledge, we can now perform the
+necessary modifications to `main.cpp`:
+
+``` {.cpp srcname="main.cpp"}
+#include "greet/greet.hpp"
+
+int main() {
+    greet("Universe");
+    return 0;
+}
+```
+
+The target `"helloworld"` will have a direct dependency on the target
+`"greet"` of the module `"greet"` in the top-level `TARGETS` file:
+
+``` {.jsonc srcname="TARGETS"}
+{ "helloworld":
+  { "type": ["@", "rules", "CC", "binary"]
+  , "name": ["helloworld"]
+  , "srcs": ["main.cpp"]
+  , "private-deps": [["greet", "greet"]]
+  }
+}
+```
+
+Note that there is no need to explicitly specify `"greet"`'s public
+headers here as the appropriate artifacts of dependencies are
+automatically added to the inputs of compile and link actions.
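+
+As an illustration of the staging just described, the input tree of the
+compile action for `main.cpp` looks roughly as follows (a conceptual
+sketch, not literal tool output): the source file itself plus the
+dependency's runfiles, staged under `greet/`, which is why the include
+`"greet/greet.hpp"` resolves.
+
+```
+.
+├── main.cpp
+└── greet
+    └── greet.hpp
+```
+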
The new +binary can be built with the same command as before (no need to rerun +`just-mr`): + +``` sh +$ just-mr build helloworld +INFO: Requested target is [["@","tutorial","","helloworld"],{}] +INFO: Analysed target [["@","tutorial","","helloworld"],{}] +INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching +INFO: Discovered 4 actions, 2 trees, 0 blobs +INFO: Building [["@","tutorial","","helloworld"],{}]. +INFO: Processed 4 actions, 0 cache hits. +INFO: Artifacts built, logical paths are: + helloworld [2b81e3177afc382452a2df9f294d3df90a9ccaf0:17664:x] +$ +``` + +To only build the static library target `"greet"` from module `"greet"`, +run the following command: + +``` sh +$ just-mr build greet greet +INFO: Requested target is [["@","tutorial","greet","greet"],{}] +INFO: Analysed target [["@","tutorial","greet","greet"],{}] +INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching +INFO: Discovered 2 actions, 1 trees, 0 blobs +INFO: Building [["@","tutorial","greet","greet"],{}]. +INFO: Processed 2 actions, 2 cache hits. +INFO: Artifacts built, logical paths are: + greet/libgreet.a [83ed406e21f285337b0c9bd5011f56f656bba683:2992:f] + (1 runfiles omitted.) +$ +``` diff --git a/doc/tutorial/hello-world.org b/doc/tutorial/hello-world.org deleted file mode 100644 index 342eaf82..00000000 --- a/doc/tutorial/hello-world.org +++ /dev/null @@ -1,370 +0,0 @@ -* Building C++ Hello World - -/justbuild/ is a true language-agnostic (there are no more-equal languages) and -multi-repository build system. As a consequence, high-level concepts (e.g., C++ -binaries, C++ libraries, etc.) are not hardcoded built-ins of the tool, but -rather provided via a set of rules. These rules can be specified as a true -dependency to your project like any other external repository your project might -depend on. - -** Setting up the Multi-Repository Configuration - -To build a project with multi-repository dependencies, we first need to provide -a configuration that declares the required repositories. Before we begin, we -need to declare where the root of our workspace is located by creating an empty -file ~ROOT~: - -#+BEGIN_SRC sh -$ touch ROOT -#+END_SRC - -Second, we also need to create the multi-repository configuration ~repos.json~ -in the workspace root: - -#+SRCNAME: repos.json -#+BEGIN_SRC js -{ "main": "tutorial" -, "repositories": - { "rules-cc": - { "repository": - { "type": "git" - , "branch": "master" - , "commit": "123d8b03bf2440052626151c14c54abce2726e6f" - , "repository": "https://github.com/just-buildsystem/rules-cc.git" - , "subdir": "rules" - } - } - , "tutorial": - { "repository": {"type": "file", "path": "."} - , "bindings": {"rules": "rules-cc"} - } - } -} -#+END_SRC - -In that configuration, two repositories are defined: - - 1. The ~"rules-cc"~ repository located in the subdirectory ~rules~ of - [[https://github.com/just-buildsystem/rules-cc/tree/123d8b03bf2440052626151c14c54abce2726e6f][just-buildsystem/rules-cc:123d8b03bf2440052626151c14c54abce2726e6f]], - which contains the high-level concepts for building C/C++ binaries and - libraries. - - 2. The ~"tutorial"~ repository located at ~.~, which contains the targets that - we want to build. It has a single dependency, which is the /rules/ that are - needed to build the target. These rules are bound via the open name - ~"rules"~ to the just created repository ~"rules-cc"~. 
In this way, the - entities provided by ~"rules-cc"~ can be accessed from within the - ~"tutorial"~ repository via the fully-qualified name - ~["@", "rules", "<module>", "<name>"]~; fully-qualified - names (for rules, targets to build (like libraries, binaries), - etc) are given by a repository name, a path specifying a - directory within that repository (the "module") where the - specification file is located, and a symbolic name (i.e., an - arbitrary string that is used as key in the specification). - -The final repository configuration contains a single ~JSON~ object with the key -~"repositories"~ referring to an object of repository names as keys and -repository descriptions as values. For convenience, the main repository to pick -is set to ~"tutorial"~. - -** Description of the helloworld target - -For this tutorial, we want to create a target ~helloworld~ that produces a -binary from the C++ source ~main.cpp~. To define such a target, create a -~TARGETS~ file with the following content: - -#+SRCNAME: TARGETS -#+BEGIN_SRC js -{ "helloworld": - { "type": ["@", "rules", "CC", "binary"] - , "name": ["helloworld"] - , "srcs": ["main.cpp"] - } -} -#+END_SRC - -The ~"type"~ field refers to the rule ~"binary"~ from the module ~"CC"~ of the -~"rules"~ repository. This rule additionally requires the string field ~"name"~, -which specifies the name of the binary to produce; as the generic interface of -rules is to have fields either take a list of strings or a list of targets, -we have to specify the name as a list (this rule will simply concatenate all -strings given in this field). Furthermore, at least one -input to the binary is required, which can be specified via the target fields -~"srcs"~ or ~"deps"~. In our case, the former is used, which contains our single -source file (files are considered targets). - -Now, the last file that is missing is the actual source file ~main.cpp~: - -#+SRCNAME: main.cpp -#+BEGIN_SRC cpp -#include <iostream> - -int main() { - std::cout << "Hello world!\n"; - return 0; -} -#+END_SRC - -** Building the helloworld target - -To build the ~helloworld~ target, we need specify it on the ~just-mr~ command -line: - -#+BEGIN_SRC sh -$ just-mr build helloworld -INFO: Requested target is [["@","tutorial","","helloworld"],{}] -INFO: Analysed target [["@","tutorial","",helloworld"],{}] -INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching -INFO: Discovered 2 actions, 1 trees, 0 blobs -INFO: Building [["@","helloworld","","helloworld"],{}]. -INFO: Processed 2 actions, 0 cache hits. -INFO: Artifacts built, logical paths are: - helloworld [b5cfca8b810adc4686f5cac00258a137c5d4a3ba:17088:x] -$ -#+END_SRC - -Note that the target is taken from the ~tutorial~ repository, as it specified as -the main repository in ~repos.json~. If targets from other repositories should -be build, the repository to use must be specified via the ~--main~ option. - -~just-mr~ reads the repository configuration, fetches externals (if any), -generates the actual build configuration, and stores it in its cache directory -(by default under ~$HOME/.cache/just~). Afterwards, the generated configuration -is used to call the ~just~ binary, which performs the actual build. - -Note that these two programs, ~just-mr~ and ~just~, can also be run -individually. To do so, first run ~just-mr~ with ~setup~ and capture the path to -the generated build configuration from stdout by assigning it to a shell -variable (e.g., ~CONF~). 
Afterwards, ~just~ can be called to perform the actual -build by explicitly specifying the configuration file via ~-C~: - -#+BEGIN_SRC sh -$ CONF=$(just-mr setup tutorial) -$ just build -C $CONF helloworld -#+END_SRC - -Note that ~just-mr~ only needs to be run the very first time and only once again -whenever the ~repos.json~ file is modified. - -By default, the BSD-default compiler front-ends (which are also defined for most -Linux distributions) ~cc~ and ~c++~ are used for C and C++ (variables ~"CC"~ and -~"CXX"~). If you want to temporarily use different defaults, you can use ~-D~ to -provide a JSON object that sets different default variables. For instance, to -use Clang as C++ compiler for a single build invocation, you can use the -following command to provide an object that sets ~"CXX"~ to ~"clang++"~: - -#+BEGIN_SRC sh -$ just-mr build helloworld -D'{"CXX":"clang++"}' -INFO: Requested target is [["@","tutorial","","helloworld"],{"CXX":"clang++"}] -INFO: Analysed target [["@","tutorial","","helloworld"],{"CXX":"clang++"}] -INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching -INFO: Discovered 2 actions, 1 trees, 0 blobs -INFO: Building [["@","tutorial","","helloworld"],{"CXX":"clang++"}]. -INFO: Processed 2 actions, 0 cache hits. -INFO: Artifacts built, logical paths are: - helloworld [b8cf7b8579d9dc7172b61660139e2c14521cedae:16944:x] -$ -#+END_SRC - -** Defining project defaults - -To define a custom set of defaults (toolchain and compile flags) for your -project, you need to create a separate file root for providing required -~TARGETS~ file, which contains the ~"defaults"~ target that should be used by -the rules. This file root is then used as the /target root/ for the rules, i.e., -the search path for ~TARGETS~ files. In this way, the description of the -~"defaults"~ target is provided in a separate file root, to keep the rules -repository independent of these definitions. - -We will call the new file root ~tutorial-defaults~ and need to create a module -directory ~CC~ in it: - -#+BEGIN_SRC sh -$ mkdir -p ./tutorial-defaults/CC -#+END_SRC - -In that module, we need to create the file ~tutorial-defaults/CC/TARGETS~ that -contains the target ~"defaults"~ and specifies which toolchain and compile flags -to use; it has to specify the complete toolchain, but can specify a ~"base"~ -toolchain to inherit from. In our case, we don't use any base, but specify all -the required fields directly. 
- -#+SRCNAME: tutorial-defaults/CC/TARGETS -#+BEGIN_SRC js -{ "defaults": - { "type": ["CC", "defaults"] - , "CC": ["cc"] - , "CXX": ["c++"] - , "CFLAGS": ["-O2", "-Wall"] - , "CXXFLAGS": ["-O2", "-Wall"] - , "AR": ["ar"] - , "PATH": ["/bin", "/usr/bin"] - } -} -#+END_SRC - -To use the project defaults, modify the existing ~repos.json~ to reflect the -following content: - -#+SRCNAME: repos.json -#+BEGIN_SRC js -{ "main": "tutorial" -, "repositories": - { "rules-cc": - { "repository": - { "type": "git" - , "branch": "master" - , "commit": "123d8b03bf2440052626151c14c54abce2726e6f" - , "repository": "https://github.com/just-buildsystem/rules-cc.git" - , "subdir": "rules" - } - , "target_root": "tutorial-defaults" - , "rule_root": "rules-cc" - } - , "tutorial": - { "repository": {"type": "file", "path": "."} - , "bindings": {"rules": "rules-cc"} - } - , "tutorial-defaults": - { "repository": {"type": "file", "path": "./tutorial-defaults"} - } - } -} -#+END_SRC - -Note that the ~"defaults"~ target uses the rule ~["CC", "defaults"]~ without -specifying any external repository (e.g., ~["@", "rules", ...]~). This is -because ~"tutorial-defaults"~ is not a full-fledged repository but merely a file -root that is considered local to the ~"rules-cc"~ repository. In fact, the -~"rules-cc"~ repository cannot refer to any external repository as it does not -have any defined bindings. - -To rebuild the project, we need to rerun ~just-mr~ (note that due to -configuration changes, rerunning only ~just~ would not suffice): - -#+BEGIN_SRC sh -$ just-mr build helloworld -INFO: Requested target is [["@","tutorial","","helloworld"],{}] -INFO: Analysed target [["@","tutorial","","helloworld"],{}] -INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching -INFO: Discovered 2 actions, 1 trees, 0 blobs -INFO: Building [["@","tutorial","","helloworld"],{}]. -INFO: Processed 2 actions, 0 cache hits. -INFO: Artifacts built, logical paths are: - helloworld [487dc9e47b978877ed2f7d80b3395ce84b23be92:16992:x] -$ -#+END_SRC - -Note that the output binary may have changed due to different defaults. - -** Modeling target dependencies - -For demonstration purposes, we will separate the print statements into a static -library ~greet~, which will become a dependency to our binary. Therefore, we -create a new subdirectory ~greet~ with the files ~greet/greet.hpp~: - -#+SRCNAME: greet/greet.hpp -#+BEGIN_SRC cpp -#include <string> - -void greet(std::string const& s); -#+END_SRC - -and ~greet/greet.cpp~: - -#+SRCNAME: greet/greet.cpp -#+BEGIN_SRC cpp -#include "greet.hpp" -#include <iostream> - -void greet(std::string const& s) { - std::cout << "Hello " << s << "!\n"; -} -#+END_SRC - -These files can now be used to create a static library ~libgreet.a~. To do so, -we need to create the following target description in ~greet/TARGETS~: - -#+SRCNAME: greet/TARGETS -#+BEGIN_SRC js -{ "greet": - { "type": ["@", "rules", "CC", "library"] - , "name": ["greet"] - , "hdrs": ["greet.hpp"] - , "srcs": ["greet.cpp"] - , "stage": ["greet"] - } -} -#+END_SRC - -Similar to ~"binary"~, we have to provide a name and source file. Additionally, -a library has public headers defined via ~"hdrs"~ and an optional staging -directory ~"stage"~ (default value ~"."~). The staging directory specifies where -the consumer of this library can expect to find the library's artifacts. 
-
-To use the project defaults, modify the existing ~repos.json~ to reflect the
-following content:
-
-#+SRCNAME: repos.json
-#+BEGIN_SRC js
-{ "main": "tutorial"
-, "repositories":
-  { "rules-cc":
-    { "repository":
-      { "type": "git"
-      , "branch": "master"
-      , "commit": "123d8b03bf2440052626151c14c54abce2726e6f"
-      , "repository": "https://github.com/just-buildsystem/rules-cc.git"
-      , "subdir": "rules"
-      }
-    , "target_root": "tutorial-defaults"
-    , "rule_root": "rules-cc"
-    }
-  , "tutorial":
-    { "repository": {"type": "file", "path": "."}
-    , "bindings": {"rules": "rules-cc"}
-    }
-  , "tutorial-defaults":
-    { "repository": {"type": "file", "path": "./tutorial-defaults"}
-    }
-  }
-}
-#+END_SRC
-
-Note that the ~"defaults"~ target uses the rule ~["CC", "defaults"]~ without
-specifying any external repository (e.g., ~["@", "rules", ...]~). This is
-because ~"tutorial-defaults"~ is not a full-fledged repository but merely a file
-root that is considered local to the ~"rules-cc"~ repository. In fact, the
-~"rules-cc"~ repository cannot refer to any external repository as it does not
-have any defined bindings.
-
-To rebuild the project, we need to rerun ~just-mr~ (note that due to
-configuration changes, rerunning only ~just~ would not suffice):
-
-#+BEGIN_SRC sh
-$ just-mr build helloworld
-INFO: Requested target is [["@","tutorial","","helloworld"],{}]
-INFO: Analysed target [["@","tutorial","","helloworld"],{}]
-INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
-INFO: Discovered 2 actions, 1 trees, 0 blobs
-INFO: Building [["@","tutorial","","helloworld"],{}].
-INFO: Processed 2 actions, 0 cache hits.
-INFO: Artifacts built, logical paths are:
-        helloworld [487dc9e47b978877ed2f7d80b3395ce84b23be92:16992:x]
-$
-#+END_SRC
-
-Note that the output binary may have changed due to different defaults.
-
-** Modeling target dependencies
-
-For demonstration purposes, we will separate the print statements into a static
-library ~greet~, which will become a dependency to our binary. Therefore, we
-create a new subdirectory ~greet~ with the files ~greet/greet.hpp~:
-
-#+SRCNAME: greet/greet.hpp
-#+BEGIN_SRC cpp
-#include <string>
-
-void greet(std::string const& s);
-#+END_SRC
-
-and ~greet/greet.cpp~:
-
-#+SRCNAME: greet/greet.cpp
-#+BEGIN_SRC cpp
-#include "greet.hpp"
-#include <iostream>
-
-void greet(std::string const& s) {
-    std::cout << "Hello " << s << "!\n";
-}
-#+END_SRC
-
-These files can now be used to create a static library ~libgreet.a~. To do so,
-we need to create the following target description in ~greet/TARGETS~:
-
-#+SRCNAME: greet/TARGETS
-#+BEGIN_SRC js
-{ "greet":
-  { "type": ["@", "rules", "CC", "library"]
-  , "name": ["greet"]
-  , "hdrs": ["greet.hpp"]
-  , "srcs": ["greet.cpp"]
-  , "stage": ["greet"]
-  }
-}
-#+END_SRC
-
-Similar to ~"binary"~, we have to provide a name and source file. Additionally,
-a library has public headers defined via ~"hdrs"~ and an optional staging
-directory ~"stage"~ (default value ~"."~). The staging directory specifies where
-the consumer of this library can expect to find the library's artifacts. Note
-that this does not need to reflect the location on the file system (i.e., a
-full-qualified path like ~["com", "example", "utils", "greet"]~ could be used to
-distinguish it from greeting libraries of other projects). The staging directory
-does not only affect the main artifact ~libgreet.a~ but also it's /runfiles/,
-a second set of artifacts, usually those a consumer needs to make proper use the
-actual artifact; in the case of a library, the runfiles are its public headers.
-Hence, the public header will be staged to ~"greet/greet.hpp"~. With that
-knowledge, we can now perform the necessary modifications to ~main.cpp~:
-
-#+SRCNAME: main.cpp
-#+BEGIN_SRC cpp
-#include "greet/greet.hpp"
-
-int main() {
-  greet("Universe");
-  return 0;
-}
-#+END_SRC
-
-The target ~"helloworld"~ will have a direct dependency to the target ~"greet"~
-of the module ~"greet"~ in the top-level ~TARGETS~ file:
-
-#+SRCNAME: TARGETS
-#+BEGIN_SRC js
-{ "helloworld":
-  { "type": ["@", "rules", "CC", "binary"]
-  , "name": ["helloworld"]
-  , "srcs": ["main.cpp"]
-  , "private-deps": [["greet", "greet"]]
-  }
-}
-#+END_SRC
-
-Note that there is no need to explicitly specify ~"greet"~'s public headers here
-as the appropriate artifacts of dependencies are automatically added to the
-inputs of compile and link actions. The new binary can be built with the same
-command as before (no need to rerun ~just-mr~):
-
-#+BEGIN_SRC sh
-$ just-mr build helloworld
-INFO: Requested target is [["@","tutorial","","helloworld"],{}]
-INFO: Analysed target [["@","tutorial","","helloworld"],{}]
-INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
-INFO: Discovered 4 actions, 2 trees, 0 blobs
-INFO: Building [["@","tutorial","","helloworld"],{}].
-INFO: Processed 4 actions, 0 cache hits.
-INFO: Artifacts built, logical paths are:
-        helloworld [2b81e3177afc382452a2df9f294d3df90a9ccaf0:17664:x]
-$
-#+END_SRC
-
-To only build the static library target ~"greet"~ from module ~"greet"~, run the
-following command:
-
-#+BEGIN_SRC sh
-$ just-mr build greet greet
-INFO: Requested target is [["@","tutorial","greet","greet"],{}]
-INFO: Analysed target [["@","tutorial","greet","greet"],{}]
-INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
-INFO: Discovered 2 actions, 1 trees, 0 blobs
-INFO: Building [["@","tutorial","greet","greet"],{}].
-INFO: Processed 2 actions, 2 cache hits.
-INFO: Artifacts built, logical paths are:
-        greet/libgreet.a [83ed406e21f285337b0c9bd5011f56f656bba683:2992:f]
-      (1 runfiles omitted.)
-$
-#+END_SRC
diff --git a/doc/tutorial/proto.org b/doc/tutorial/proto.md
index b4a02d48..8a04e373 100644
--- a/doc/tutorial/proto.org
+++ b/doc/tutorial/proto.md
@@ -1,27 +1,28 @@
-* Using protocol buffers
+Using protocol buffers
+======================

-The rules /justbuild/ uses for itself also support protocol
-buffers. This tutorial shows how to use those rules and the targets
-associated with them. It is not a tutorial on protocol buffers
-itself; rather, it is assumed that the reader has some knowledge on
-[[https://developers.google.com/protocol-buffers/][protocol buffers]].
+The rules *justbuild* uses for itself also support protocol buffers.
+This tutorial shows how to use those rules and the targets associated
+with them. It is not a tutorial on protocol buffers itself; rather, it
+is assumed that the reader has some knowledge of [protocol
+buffers](https://developers.google.com/protocol-buffers/).
-** Setting up the repository configuration
+Setting up the repository configuration
+---------------------------------------

-Before we begin, we first need to declare where the root of our workspace is
-located by creating the empty file ~ROOT~:
+Before we begin, we first need to declare where the root of our
+workspace is located by creating the empty file `ROOT`:

-#+BEGIN_SRC sh
+``` sh
$ touch ROOT
-#+END_SRC
+```

-The ~protobuf~ repository conveniently contains an
-[[https://github.com/protocolbuffers/protobuf/tree/v3.12.4/examples][example]],
-so we can use this and just add our own target files. We create
-file ~repos.template.json~ as follows.
+The `protobuf` repository conveniently contains an
+[example](https://github.com/protocolbuffers/protobuf/tree/v3.12.4/examples),
+so we can use this and just add our own target files. We create the file
+`repos.template.json` as follows.

-#+SRCNAME: repos.template.json
-#+BEGIN_SRC js
+``` {.jsonc srcname="repos.template.json"}
{ "repositories":
  { "":
    { "repository"
@@ -36,45 +37,45 @@ file ~repos.template.json~ as follows.
  , "tutorial": {"repository": {"type": "file", "path": "."}}
  }
}
-#+END_SRC
+```

-The missing entry ~"rules-cc"~ refers to our C/C++ build rules provided
-[[https://github.com/just-buildsystem/rules-cc][online]]. These rules support
-protobuf if the dependency ~"protoc"~ is provided. To import this rule
-repository including the required transitive dependencies for protobuf, the
-~bin/just-import-git~ script with option ~--as rules-cc~ can be used to
-generate the actual ~repos.json~:
+The missing entry `"rules-cc"` refers to our C/C++ build rules provided
+[online](https://github.com/just-buildsystem/rules-cc). These rules
+support protobuf if the dependency `"protoc"` is provided. To import
+this rule repository including the required transitive dependencies for
+protobuf, the `bin/just-import-git` script with option `--as rules-cc`
+can be used to generate the actual `repos.json`:

-#+BEGIN_SRC sh
+``` sh
$ just-import-git -C repos.template.json -b master --as rules-cc https://github.com/just-buildsystem/rules-cc > repos.json
-#+END_SRC
+```

-To build the example with ~just~, the only task is to write targets files. As
-that contains a couple of new concepts, we will do this step by step.
+To build the example with `just`, the only task left is to write the
+targets files. As these introduce a couple of new concepts, we will do
+this step by step.

-** The proto library
+The proto library
+-----------------

First, we have to declare the proto library. In this case, it only
-contains the file ~addressbook.proto~ and has no dependencies. To
-declare the library, create a ~TARGETS~ file with the following
-content:
+contains the file `addressbook.proto` and has no dependencies. To
+declare the library, create a `TARGETS` file with the following content:

-#+SRCNAME: TARGETS
-#+BEGIN_SRC js
+``` {.jsonc srcname="TARGETS"}
{ "address":
  { "type": ["@", "rules", "proto", "library"]
  , "name": ["addressbook"]
  , "srcs": ["addressbook.proto"]
  }
}
-#+END_SRC
+```

In general, proto libraries could also depend on other proto libraries;
-those would be added to the ~"deps"~ field.
+those would be added to the `"deps"` field.
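A quick sketch of what such a dependency would look like; the file
`person.proto` and the target name `"person"` are invented for illustration and
are not part of the example repository:

``` sh
$ cat TARGETS    # hypothetical: one proto library depending on another
{ "person":
  { "type": ["@", "rules", "proto", "library"]
  , "name": ["person"]
  , "srcs": ["person.proto"]
  }
, "address":
  { "type": ["@", "rules", "proto", "library"]
  , "name": ["addressbook"]
  , "srcs": ["addressbook.proto"]
  , "deps": ["person"]
  }
}
```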
When building the library, there's very little to do.

-#+BEGIN_SRC sh
+``` sh
$ just-mr build address
INFO: Requested target is [["@","","","address"],{}]
INFO: Analysed target [["@","","","address"],{}]
@@ -84,20 +85,21 @@ INFO: Building [["@","","","address"],{}].
INFO: Processed 0 actions, 0 cache hits.
INFO: Artifacts built, logical paths are:
$
-#+END_SRC
+```

On the other hand, what did we expect? A proto library is an abstract
description of a protocol, so, as long as we don't specify for which
language we want to have bindings, there is nothing to generate.

-Nevertheless, a proto library target is not empty. In fact, it can't be empty,
-as other targets can only access the values of a target and have no
-insights into its definitions. We already relied on this design principle
-implicitly, when we exploited target-level caching for our external dependencies
-and did not even construct the dependency graph for that target. A proto
-library simply provides the dependency structure of the ~.proto~ files.
+Nevertheless, a proto library target is not empty. In fact, it can't be
+empty, as other targets can only access the values of a target and have
+no insight into its definitions. We already relied on this design
+principle implicitly, when we exploited target-level caching for our
+external dependencies and did not even construct the dependency graph
+for that target. A proto library simply provides the dependency
+structure of the `.proto` files.

-#+BEGIN_SRC sh
+``` sh
$ just-mr analyse --dump-nodes - address
INFO: Requested target is [["@","","","address"],{}]
INFO: Result of target [["@","","","address"],{}]: {
@@ -146,36 +148,35 @@ INFO: Target nodes of target [["@","","","address"],{}]:
  }
}
$
-#+END_SRC
+```

-The target has one provider ~"proto"~, which is a node. Nodes are
-an abstract representation of a target graph. More precisely, there
-are two kind of nodes, and our example contains one of each.
+The target has one provider `"proto"`, which is a node. Nodes are an
+abstract representation of a target graph. More precisely, there are two
+kinds of nodes, and our example contains one of each.

-The simple kind of nodes are the value nodes; they represent a
-target that has a fixed value, and hence are given by artifacts,
-runfiles, and provided data. In our case, we have one value node,
-the one for the ~.proto~ file.
+The simple kind of nodes are the value nodes; they represent a target
+that has a fixed value, and hence are given by artifacts, runfiles, and
+provided data. In our case, we have one value node, the one for the
+`.proto` file.

The other kind of nodes are the abstract nodes. They describe the
-arguments for a target, but only have an abstract name (i.e., a
-string) for the rule. Combining such an abstract target with a
-binding for the abstract rule names gives a concrete "anonymous"
-target that, in our case, will generate the library with the bindings
-for the concrete language. In this example, the abstract name is
-~"library"~. The alternative in our proto rules would have been
-~"service library"~, for proto libraries that also contain ~rpc~
-definitions (which is used by [[https://grpc.io/][gRPC]]).
-
-** Using proto libraries
-
-Using proto libraries requires, as discussed, bindings for the
-abstract names. Fortunately, our ~CC~ rules are aware of proto
-libraries, so we can simply use them. Our target file hence
-continues as follows.
-
-#+SRCNAME: TARGETS
-#+BEGIN_SRC js
+arguments for a target, but only have an abstract name (i.e., a string)
+for the rule. Combining such an abstract target with a binding for the
+abstract rule names gives a concrete "anonymous" target that, in our
+case, will generate the library with the bindings for the concrete
+language. In this example, the abstract name is `"library"`.
+The alternative in our proto rules would have been `"service library"`,
+for proto libraries that also contain `rpc` definitions (which is used
+by [gRPC](https://grpc.io/)).
+
+Using proto libraries
+---------------------
+
+Using proto libraries requires, as discussed, bindings for the abstract
+names. Fortunately, our `CC` rules are aware of proto libraries, so we
+can simply use them. Our target file hence continues as follows.
+
+``` {.jsonc srcname="TARGETS"}
...
, "add_person":
  { "type": ["@", "rules", "CC", "binary"]
@@ -190,14 +191,14 @@ continues as follows.
  , "private-proto": ["address"]
  }
...
-#+END_SRC
+```

-The first time, we build a target that requires the proto compiler
-(in that particular version, built in that particular way), it takes
-a bit of time, as the proto compiler has to be built. But in follow-up
-builds, also in different projects, the target-level cache is filled already.
+The first time we build a target that requires the proto compiler (in
+that particular version, built in that particular way), it takes a bit
+of time, as the proto compiler has to be built. But in follow-up builds,
+also in different projects, the target-level cache is already filled.

-#+BEGIN_SRC sh
+``` sh
$ just-mr build add_person
...
$ just-mr build add_person
@@ -210,12 +211,12 @@
INFO: Processed 5 actions, 5 cache hits.
INFO: Artifacts built, logical paths are:
        add_person [bcbb3deabfe0d77e6d3ea35615336a2f59a1b0aa:2285928:x]
$
-#+END_SRC
+```

If we look at the actions associated with the binary, we find that
those are still the two actions we expect: a compile action and a link
action.

-#+BEGIN_SRC sh
+``` sh
$ just-mr analyse add_person --dump-actions -
INFO: Requested target is [["@","","","add_person"],{}]
INFO: Result of target [["@","","","add_person"],{}]: {
@@ -251,17 +252,17 @@ INFO: Actions for target [["@","","","add_person"],{}]:
  }
]
$
-#+END_SRC
+```

-As discussed, the ~libaddressbook.a~ that is conveniently available
-during the linking of the binary (as well as the ~addressbook.pb.h~
-available in the ~include~ tree for the compile action) are generated
-by an anonymous target. Using that during the build we already
-filled the target-level cache, we can have a look at all targets
-still analysed. In the one anonymous target, we find again the
-abstract node we discussed earlier.
+As discussed, the `libaddressbook.a` that is conveniently available
+during the linking of the binary (as well as the `addressbook.pb.h`
+available in the `include` tree for the compile action) are generated by
+an anonymous target. As the build already filled the target-level cache,
+we can have a look at all targets still analysed. In the one anonymous
+target, we find again the abstract node we discussed earlier.

-#+BEGIN_SRC sh
+``` sh
$ just-mr analyse add_person --dump-targets -
INFO: Requested target is [["@","","","add_person"],{}]
INFO: Result of target [["@","","","add_person"],{}]: {
@@ -302,25 +303,24 @@
INFO: List of analysed targets:
  }
}
$
-#+END_SRC
-
-It should be noted, however, that this tight integration of proto
-into our ~C++~ rules is just convenience of our code base. If we had
-to cooperate with rules not aware of proto, we could have created
-a separate rule delegating the library creation to the anonymous
-target and then simply reflecting the values of that target.
-In fact, we could simply use an empty library with a public ~proto~
-dependency for this purpose.
-
-#+SRCNAME: TARGETS
-#+BEGIN_SRC js
+```
+
+It should be noted, however, that this tight integration of proto into
+our `C++` rules is just a convenience of our code base. If we had to
+cooperate with rules not aware of proto, we could have created a
+separate rule delegating the library creation to the anonymous target
+and then simply reflecting the values of that target. In fact, we could
+simply use an empty library with a public `proto` dependency for this
+purpose.
+
+``` {.jsonc srcname="TARGETS"}
...
, "address proto library":
  {"type": ["@", "rules", "CC", "library"], "proto": ["address"]}
...
-#+END_SRC
+```

-#+BEGIN_SRC sh
+``` sh
$ just-mr analyse 'address proto library'
...
INFO: Requested target is [["@","","","address proto library"],{}]
@@ -347,18 +347,18 @@ INFO: Result of target [["@","","","address proto library"],{}]: {
  }
}
$
-#+END_SRC
+```

-** Adding a test
+Adding a test
+-------------

-Finally, let's add a test. As we use the ~protobuf~ repository as
-workspace root, we add the test script ad hoc into a targets file,
-using the ~"file_gen"~ rule. For debugging a potentially failing
-test, we also keep the intermediate files the test generates.
-Create a top-level ~TARGETS~ file with the following content:
+Finally, let's add a test. As we use the `protobuf` repository as
+workspace root, we add the test script ad hoc into a targets file, using
+the `"file_gen"` rule. For debugging a potentially failing test, we also
+keep the intermediate files the test generates. Create a top-level
+`TARGETS` file with the following content:

-#+SRCNAME: TARGETS
-#+BEGIN_SRC js
+``` {.jsonc srcname="TARGETS"}
...
, "test.sh":
  { "type": "file_gen"
@@ -382,17 +382,16 @@ Create a top-level ~TARGETS~ file with the following content:
  , "keep": ["addressbook.data", "out.txt"]
  }
...
-#+END_SRC
+```

-That example also shows why it is important that the generation
-of the language bindings is delegated to an anonymous target: we
-want to analyse only once how the ~C++~ bindings are generated.
-Nevertheless, many targets can depend (directly or indirectly) on
-the same proto library. And, indeed, analysing the test, we get
-the expected additional targets and the one anonymous target is
-reused by both binaries.
+That example also shows why it is important that the generation of the
+language bindings is delegated to an anonymous target: we want to
+analyse only once how the `C++` bindings are generated. Nevertheless,
+many targets can depend (directly or indirectly) on the same proto
+library. And, indeed, analysing the test, we get the expected additional
+targets, and the one anonymous target is reused by both binaries.

-#+BEGIN_SRC sh
+``` sh
$ just-mr analyse test --dump-targets -
INFO: Requested target is [["@","","","test"],{}]
INFO: Result of target [["@","","","test"],{}]: {
@@ -444,11 +443,11 @@ INFO: List of analysed targets:
}
INFO: Target tainted ["test"].
$
-#+END_SRC
+```

Finally, the test passes and the output is as expected.

-#+BEGIN_SRC sh
+``` sh
$ just-mr build test -Pwork/out.txt
INFO: Requested target is [["@","","","test"],{}]
INFO: Analysed target [["@","","","test"],{}]
@@ -472,4 +471,4 @@
Person ID: 12345
        Updated: 2022-12-14T18:08:36Z
INFO: Target tainted ["test"].
$
-#+END_SRC
+```
diff --git a/doc/tutorial/rebuild.org b/doc/tutorial/rebuild.md
index 80aafb6f..3f1ddd88 100644
--- a/doc/tutorial/rebuild.org
+++ b/doc/tutorial/rebuild.md
@@ -1,15 +1,17 @@
-* Ensuring reproducibility of the build
-
-Software builds should be [[https://reproducible-builds.org/][reproducible]].
-The ~just~ tool, supports this goal in local builds by isolating
-individual actions, setting permissions and file time stamps to
-canonical values, etc; most remote execution systems take even further
-measures to ensure the environment always looks the same to every
-action. Nevertheless, it is always possible to break reproducibility
-by bad actions, both coming from rules not carefully written, as
-well as from ad-hoc actions added by the ~generic~ target.
-
-#+BEGIN_SRC js
+Ensuring reproducibility of the build
+=====================================
+
+Software builds should be
+[reproducible](https://reproducible-builds.org/). The `just` tool
+supports this goal in local builds by isolating individual actions,
+setting permissions and file time stamps to canonical values, etc.; most
+remote execution systems take even further measures to ensure the
+environment always looks the same to every action. Nevertheless, it is
+always possible to break reproducibility by bad actions, both coming
+from rules not carefully written, as well as from ad-hoc actions added
+by the `generic` target.
+
+``` jsonc
...
, "version.h":
  { "type": "generic"
@@ -18,29 +20,29 @@ well as from ad-hoc actions added by the ~generic~ target.
  , "outs": ["version.h"]
  }
...
-#+END_SRC
-
-Besides time stamps there are many other sources of nondeterminism,
-like properties of the build machine (name, number of CPUs available,
-etc), but also subtle ones like ~readdir~ order. Often, those
-non-reproducible parts get buried deeply in a final artifact (like
-the version string embedded in a binary contained in a compressed
-installation archive); and, as long as the non-reproducible action
-stays in cache, it does not even result in bad incrementality.
-Still, others won't be able to reproduce the exact artifact.
-
-There are tools like [[https://diffoscope.org/][diffoscope]] to deeply
+```
+
+Besides time stamps there are many other sources of nondeterminism, like
+properties of the build machine (name, number of CPUs available, etc.),
+but also subtle ones like `readdir` order. Often, those non-reproducible
+parts get buried deeply in a final artifact (like the version string
+embedded in a binary contained in a compressed installation archive);
+and, as long as the non-reproducible action stays in cache, it does not
+even result in bad incrementality. Still, others won't be able to
+reproduce the exact artifact.
+
+There are tools like [diffoscope](https://diffoscope.org/) to deeply
compare archives and other container formats. Nevertheless, it is
desirable to find the root causes, i.e., the first (in topological
order) actions that yield a different output.
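A possible workflow for such a deep comparison, sketched here under the
assumptions that `diffoscope` is installed and that the project has an
archive-producing target called `"archive"` (both assumptions; neither is part
of this example), is to install the artifacts of two builds to separate
directories and compare them:

``` sh
# Install the same target built twice (e.g., on two machines), then compare.
$ just-mr install -o out-first archive
$ just-mr install -o out-second archive
$ diffoscope out-first/release.tar.gz out-second/release.tar.gz
```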
-** Rebuilding
+Rebuilding
+----------

-For the remainder of this section, we will consider the following example
-project with the C++ source file ~hello.cpp~:
+For the remainder of this section, we will consider the following
+example project with the C++ source file `hello.cpp`:

-#+SRCNAME: hello.cpp
-#+BEGIN_SRC cpp
+``` {.cpp srcname="hello.cpp"}
#include <iostream>
#include "version.h"
@@ -50,12 +52,11 @@ int main(int argc, const char* argv[]) {
  }
  return 0;
}
-#+END_SRC
+```

-and the following ~TARGETS~ file:
+and the following `TARGETS` file:

-#+SRCNAME: TARGETS
-#+BEGIN_SRC js
+``` {.jsonc srcname="TARGETS"}
{ "":
  { "type": "install"
  , "files":
@@ -95,17 +96,17 @@ and the following ~TARGETS~ file:
  , "deps": ["out.txt"]
  }
}
-#+END_SRC
+```

-To search for the root cause of non-reproducibility, ~just~ has
-a subcommand ~rebuild~. It builds the specified target again, requesting
+To search for the root cause of non-reproducibility, `just` has a
+subcommand `rebuild`. It builds the specified target again, requesting
that every action be executed again (but target-level cache is still
active); then the result of every action is compared to the one in the
action cache, if an entry with the same inputs is present. So, you
typically would
-first ~build~ and then ~rebuild~. Note that a repeated ~build~ simply
+first `build` and then `rebuild`. Note that a repeated `build` simply
takes the action result from cache.

-#+BEGIN_SRC sh
+``` sh
$ just-mr build
INFO: Requested target is [["@","tutorial","",""],{}]
INFO: Analysed target [["@","tutorial","",""],{}]
@@ -135,30 +136,31 @@
INFO: Export targets found: 0 cached, 0 uncached, 0 not eligible for caching
INFO: Discovered 6 actions, 1 trees, 0 blobs
INFO: Rebuilding [["@","tutorial","",""],{}].
WARN: Found flaky action:
-  - id: c854a382ea26628e1a5b8d4af00d6d0cef433436
-  - cmd: ["sh","-c","echo '#define VERSION \"0.0.0.'`date +%Y%m%d%H%M%S`'\"' > version.h\n"]
-  - output 'version.h' differs:
-    - [6aac3477e22cd57e8c98ded78562d3c017e5d611:39:f] (rebuilt)
-    - [789a29f39b6aa966f91776bfe092e247614e6acd:39:f] (cached)
+   - id: c854a382ea26628e1a5b8d4af00d6d0cef433436
+   - cmd: ["sh","-c","echo '#define VERSION \"0.0.0.'`date +%Y%m%d%H%M%S`'\"' > version.h\n"]
+   - output 'version.h' differs:
+     - [6aac3477e22cd57e8c98ded78562d3c017e5d611:39:f] (rebuilt)
+     - [789a29f39b6aa966f91776bfe092e247614e6acd:39:f] (cached)
INFO: 2 actions compared with cache, 1 flaky actions found (0 of which tainted), no cache entry found for 4 actions.
INFO: Artifacts built, logical paths are:
        bin/hello [73994ff43ec1161aba96708f277e8c88feab0386:16608:x]
        share/hello/OUT.txt [428b97b82b6c59cad7488b24e6b618ebbcd819bc:13:f]
        share/hello/version.txt [8dd65747395c0feab30891eab9e11d4a9dd0c715:39:f]
$
-#+END_SRC
+```

-In the example, the second action compared to cache is the upper
-casing of the output. Even though the generation of ~out.txt~ depends
-on the non-reproducible ~hello~, the file itself is reproducible.
-Therefore, the follow-up actions are checked as well.
+In the example, the second action compared to cache is the upper casing
+of the output. Even though the generation of `out.txt` depends on the
+non-reproducible `hello`, the file itself is reproducible. Therefore,
+the follow-up actions are checked as well.

-For this simple example, reading the console output is enough to understand
-what's going on. However, checking for reproducibility usually is part
-of a larger, quality-assurance process. To support the automation of such
-processes, the findings can also be reported in machine-readable form.
+For this simple example, reading the console output is enough to
+understand what's going on. However, checking for reproducibility
+usually is part of a larger, quality-assurance process. To support the
+automation of such processes, the findings can also be reported in
+machine-readable form.

-#+BEGIN_SRC sh
+``` sh
$ just-mr rebuild --dump-flaky flakes.json --dump-graph actions.json
[...]
$ cat flakes.json
@@ -186,40 +188,40 @@ $ cat flakes.json
    }
  }
}$
-#+END_SRC
+```

The file reports the flaky actions together with the non-reproducible
artifacts they generated, reporting both the cached and the newly
-generated output. The files themselves can be obtained via ~just
-install-cas~ as usual, allowing deeper comparison of the outputs.
-The full definitions of the actions can be found in the action graph,
-in the example dumped as well as ~actions.json~; this definition
-also includes the origins for each action, i.e., the configured
-targets that requested the respective action.
+generated output. The files themselves can be obtained via `just
+install-cas` as usual, allowing deeper comparison of the outputs. The
+full definitions of the actions can be found in the action graph, in the
+example dumped to `actions.json` as well; this definition also includes
+the origins for each action, i.e., the configured targets that requested
+the respective action.

-
-** Comparing build environments
+Comparing build environments
+----------------------------

Simply rebuilding on the same machine is a good way to detect embedded
time stamps of sufficiently small granularity; for other sources of
-non-reproducibility, however, more modifications of the environment
-are necessary.
-
-A simple, but effective, way for modifying the build environment
-is the option ~-L~ to set the local launcher, a list of
-strings the argument vector is prefixed with before the action is
-executed. The default ~["env", "--"]~ simply resolves the program
-to be executed in the current value of ~PATH~, but a different
-value for the launcher can obviously be used to set environment
-variables like ~LD_PRELOAD~. Relevant libraries and tools
-include [[https://github.com/wolfcw/libfaketime][libfaketime]],
-[[https://github.com/dtcooper/fakehostname][fakehostname]],
-and [[https://salsa.debian.org/reproducible-builds/disorderfs][disorderfs]].
+non-reproducibility, however, more modifications of the environment are
+necessary.
+
+A simple but effective way of modifying the build environment is the
+option `-L` to set the local launcher, a list of strings the argument
+vector is prefixed with before the action is executed. The default
+`["env", "--"]` simply resolves the program to be executed in the
+current value of `PATH`, but a different value for the launcher can
+obviously be used to set environment variables like `LD_PRELOAD`.
+Relevant libraries and tools include
+[libfaketime](https://github.com/wolfcw/libfaketime),
+[fakehostname](https://github.com/dtcooper/fakehostname), and
+[disorderfs](https://salsa.debian.org/reproducible-builds/disorderfs).

More variation can be achieved by comparing remote execution builds,
-either for two different remote-execution end points or comparing
-one remote-execution end point to the local build. The latter is
-also a good way to find out where a build that "works on my machine"
-differs.
-The endpoint on which the rebuild is executed can be set,
-in the same way as for build with the ~-r~ option; the cache end
-point to compare against can be set via the ~--vs~ option.
+either for two different remote-execution end points or comparing one
+remote-execution end point to the local build. The latter is also a good
+way to find out where a build that "works on my machine" differs. The
+endpoint on which the rebuild is executed can be set, in the same way as
+for `build`, with the `-r` option; the cache endpoint to compare against
+can be set via the `--vs` option.
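To illustrate the launcher mechanism described above, a rebuild under a faked
wall-clock time could look like the following sketch; the `libfaketime` library
path varies between distributions and the chosen date is arbitrary, so both are
assumptions:

``` sh
# Rebuild with every action seeing a fixed time via LD_PRELOAD.
$ just-mr rebuild -L '["env", "LD_PRELOAD=/usr/lib/faketime/libfaketime.so.1", "FAKETIME=2020-01-01 00:00:00"]'
```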
diff --git a/doc/tutorial/target-file-glob-tree.org b/doc/tutorial/target-file-glob-tree.md
index 58e9c725..524cf358 100644
--- a/doc/tutorial/target-file-glob-tree.org
+++ b/doc/tutorial/target-file-glob-tree.md
@@ -1,34 +1,35 @@
-* Target versus ~FILE~, ~GLOB~, and ~TREE~
+Target versus `FILE`, `GLOB`, and `TREE`
+========================================

-So far, we referred to defined targets as well as source files
-by their name and it just worked. When considering third-party
-software we already saw the ~TREE~ reference. In this section, we
-will highlight in more detail the ways to refer to sources, as well
-as the difference between defined and source targets. The latter
-is used, e.g., when third-party software has to be patched.
+So far, we referred to defined targets as well as source files by their
+name and it just worked. When considering third-party software we
+already saw the `TREE` reference. In this section, we will highlight in
+more detail the ways to refer to sources, as well as the difference
+between defined and source targets. The latter is used, e.g., when
+third-party software has to be patched.

-As example for this section we use gnu ~units~ where we want to
-patch into the standard units definition add two units of area
-popular in German news.
+As example for this section we use GNU `units`, where we want to patch
+the standard units definitions to add two units of area popular in
+German news.

-** Repository Config for ~units~ with patches
+Repository Config for `units` with patches
+------------------------------------------

-Before we begin, we first need to declare where the root of our workspace is
-located by creating the empty file ~ROOT~:
+Before we begin, we first need to declare where the root of our
+workspace is located by creating the empty file `ROOT`:

-#+BEGIN_SRC sh
+``` sh
$ touch ROOT
-#+END_SRC
+```

The sources are an archive available on the web. As upstream uses a
-different build system, we have to provide our own build description;
-we take the top-level directory as layer for this. As we also want
-to patch the definition file, we add the subdirectory ~files~ as
-logical repository for the patches. Hence we create a file ~repos.json~
-with the following content.
-
-#+SRCNAME: repos.json
-#+BEGIN_SRC js
+different build system, we have to provide our own build description; we
+take the top-level directory as layer for this. As we also want to patch
+the definition file, we add the subdirectory `files` as logical
+repository for the patches. Hence we create a file `repos.json` with the
+following content.
+
+``` {.jsonc srcname="repos.json"}
{ "main": "units"
, "repositories":
  { "rules-cc":
@@ -55,31 +56,33 @@ with the following content.
    }
  }
}
-#+END_SRC
+```

-The repository to set up is ~units~ and, as usual, we can use ~just-mr~ to
-fetch the archive and obtain the resulting multi-repository configuration.
+The repository to set up is `units` and, as usual, we can use `just-mr`
+to fetch the archive and obtain the resulting multi-repository
+configuration.

-#+BEGIN_SRC sh
+``` sh
$ just-mr setup units
-#+END_SRC
+```

-** Patching a file: targets versus ~FILE~
+Patching a file: targets versus `FILE`
+--------------------------------------

-Let's start by patching the source file ~definitions.units~. While,
-conceptionally, we want to patch a third-party source file, we do /not/
+Let's start by patching the source file `definitions.units`. While,
+conceptually, we want to patch a third-party source file, we do *not*
modify the sources. The workspace root is a git tree and will stay like
this.
-Instead, we remember that we specify /targets/ and the definition of a
+Instead, we remember that we specify *targets* and the definition of a
target is looked up in the targets file; only if not defined there, it
is implicitly considered a source target and taken from the target root.
-So we will define a /target/ named ~definitions.units~ to replace the
+So we will define a *target* named `definitions.units` to replace the
original source file.

-Let's first generate the patch. As we're already referring to source files
-as targets, we have to provide a targets file already; we start with the
-empty object and refine it later.
+Let's first generate the patch. As we're already referring to source
+files as targets, we have to provide a targets file already; we start
+with the empty object and refine it later.

-#+BEGIN_SRC sh
+``` sh
$ echo {} > TARGETS.units
$ just-mr install -o . definitions.units
INFO: Requested target is [["@","units","","definitions.units"],{}]
@@ -100,41 +103,39 @@
$ mkdir files
$ echo {} > files/TARGETS
$ diff -u definitions.units.orig definitions.units > files/definitions.units.diff
$ rm definitions.units*
-#+END_SRC
+```

-Our rules conveniently contain a rule ~["patch", "file"]~ to patch
-a single file, and we already created the patch. The only other
-input missing is the source file. So far, we could refer to it as
-~"definitions.units"~ because there was no target of that name, but
-now we're about to define a target with that very name. Fortunately,
-in target files, we can use a special syntax to explicitly refer to
-a source file of the current module, even if there is a target with
-the same name: ~["FILE", null, "definition.units"]~. The syntax
-requires the explicit ~null~ value for the current module, despite
-the fact that explicit file references are only allowed for the
-current module; in this way, the name is a list of length more than
-two and cannot be confused with a top-level module called ~FILE~.
-So we add this target and obtain as ~TARGETS.units~ the following.
+Our rules conveniently contain a rule `["patch", "file"]` to patch a
+single file, and we already created the patch. The only other input
+missing is the source file. So far, we could refer to it as
+`"definitions.units"` because there was no target of that name, but now
+we're about to define a target with that very name. Fortunately, in
+target files, we can use a special syntax to explicitly refer to a
+source file of the current module, even if there is a target with the
+same name: `["FILE", null, "definitions.units"]`.
+The syntax requires the explicit `null` value for the current module,
+despite the fact that explicit file references are only allowed for the
+current module; in this way, the name is a list of length more than two
+and cannot be confused with a top-level module called `FILE`. So we add
+this target and obtain as `TARGETS.units` the following.
+
+``` {.jsonc srcname="TARGETS.units"}
{ "definitions.units":
  { "type": ["@", "rules", "patch", "file"]
  , "src": [["FILE", ".", "definitions.units"]]
  , "patch": [["@", "patches", "", "definitions.units.diff"]]
  }
}
-#+END_SRC
+```

-Analysing ~"definitions.units"~ we find our defined target which
-contains an action output. Still, it looks like a patched source
-file; the new artifact is staged to the original location. Staging
-is also used in the action definition, to avoid magic names (like
-file names starting with ~-~), in-place operations (all actions
-must not modify their inputs) and, in fact, have a
-fixed command line.
+Analysing `"definitions.units"` we find our defined target, which
+contains an action output. Still, it looks like a patched source file;
+the new artifact is staged to the original location. Staging is also
+used in the action definition, to avoid magic names (like file names
+starting with `-`) and in-place operations (actions must not modify
+their inputs) and, in fact, to have a fixed command line.

-#+BEGIN_SRC sh
+``` sh
$ just-mr analyse definitions.units --dump-actions -
INFO: Requested target is [["@","units","","definitions.units"],{}]
INFO: Result of target [["@","units","","definitions.units"],{}]: {
@@ -172,11 +173,11 @@ INFO: Actions for target [["@","units","","definitions.units"],{}]:
  }
]
$
-#+END_SRC
+```

-Building ~"definitions.units"~ we find out patch applied correctly.
+Building `"definitions.units"` we find our patch applied correctly.

-#+BEGIN_SRC sh
+``` sh
$ just-mr build definitions.units -P definitions.units | grep -A 5 'German units'
INFO: Requested target is [["@","units","","definitions.units"],{}]
INFO: Analysed target [["@","units","","definitions.units"],{}]
@@ -193,24 +194,24 @@
area_soccerfield        105 m * 68 m
area_saarland           2570 km^2
zentner                 50 kg
$
-#+END_SRC
+```

-** Globbing source files: ~"GLOB"~
+Globbing source files: `"GLOB"`
+-------------------------------

-Next, we collect all ~.units~ files. We could simply do this by enumerating
-them in a target.
+Next, we collect all `.units` files. We could simply do this by
+enumerating them in a target.

-#+SRCNAME: TARGETS.units
-#+BEGIN_SRC js
+``` {.jsonc srcname="TARGETS.units"}
...
, "data-draft":
  { "type": "install", "deps": ["definitions.units", "currency.units"]}
...
-#+END_SRC
+```

-In this way, we get the desired collection of one unmodified source file and
-the output of the patch action.
+In this way, we get the desired collection of one unmodified source file
+and the output of the patch action.

-#+BEGIN_SRC sh
+``` sh
$ just-mr analyse data-draft
INFO: Requested target is [["@","units","","data-draft"],{}]
INFO: Result of target [["@","units","","data-draft"],{}]: {
@@ -226,77 +227,76 @@ INFO: Result of target [["@","units","","data-draft"],{}]: {
  }
}
$
-#+END_SRC
+```

-The disadvantage, however, that we might miss newly added ~.units~
-files if we update and upstream added new files. So we want all
-source files that have the respective ending. The corresponding
-source reference is ~"GLOB"~.
-A glob expands to the /collection/
-of all /sources/ that are /files/ in the /top-level/ directory of
-the current module and that match the given pattern. It is important
-to understand this in detail and the rational behind it.
-- First of all, the artifact (and runfiles) map has an entry for
-  each file that matches. In particular, targets have the option to
-  define individual actions for each file, like ~["CC", "binary"]~
-  does for the source files. This is different from ~"TREE"~ where
-  the artifact map contains a single artifact that happens to be a
-  directory. The tree behaviour is preferable when the internals
-  of the directory only matter for the execution of actions and not
-  for analysis; then there are less entries to carry around during
-  analysis and action-key computation, and the whole directory
-  is "reserved" for that tree avoid staging conflicts when latter
-  adding entries there.
-- As a source reference, a glob expands to explicit source files;
-  targets having the same name as a source file are not taken into
-  account. In our example, ~["GLOB", null, "*.units"]~ therefore
-  contains the unpatched source file ~definitions.units~. In this
-  way, we avoid any surprises in the expansion of a glob when a new
-  source file is added with a name equal to an already existing target.
-- Only files are considered for matching the glob. Directories
-  are ignored.
-- Matches are only considered at the top-level directory. In this
-  way, only one directory has to be read during analysis; allowing
-  deeper globs would require traversal of subdirectories requiring
-  larger cost. While the explicit ~"TREE"~ reference allows recursive
-  traversal, in the typical use case of the respective workspace root
-  being a ~git~ root, it is actually cheap; we can look up the
-  ~git~ tree identifier without traversing the tree. Such a quick
-  look up would not be possible if matches had to be selected.
-
-So, ~["GLOB", null, "*.units"]~ expands to all the relevant source
-files; but we still want to keep the patching. Most rules, like ~"install"~,
-disallow staging conflicts to avoid accidentally ignoring a file due
-to conflicting name. In our case, however, the dropping of the source
-file in favour of the patched one is deliberate. For this, there is
-the rule ~["data", "overlay"]~ taking the union of the artifacts of
+```
+
+The disadvantage, however, is that we might miss newly added `.units`
+files if we update and upstream has added new files. So we want all
+source files that have the respective ending. The corresponding source
+reference is `"GLOB"`. A glob expands to the *collection* of all
+*sources* that are *files* in the *top-level* directory of the current
+module and that match the given pattern. It is important to understand
+this in detail and the rationale behind it.
+
+ - First of all, the artifact (and runfiles) map has an entry for each
+   file that matches. In particular, targets have the option to define
+   individual actions for each file, like `["CC", "binary"]` does for
+   the source files. This is different from `"TREE"` where the artifact
+   map contains a single artifact that happens to be a directory. The
+   tree behaviour is preferable when the internals of the directory
+   only matter for the execution of actions and not for analysis; then
+   there are fewer entries to carry around during analysis and
+   action-key computation, and the whole directory is "reserved" for
+   that tree, avoiding staging conflicts when later adding entries
+   there.
+ - As a source reference, a glob expands to explicit source files;
+   targets having the same name as a source file are not taken into
+   account. In our example, `["GLOB", null, "*.units"]` therefore
+   contains the unpatched source file `definitions.units`. In this way,
+   we avoid any surprises in the expansion of a glob when a new source
+   file is added with a name equal to an already existing target.
+ - Only files are considered for matching the glob. Directories are
+   ignored.
+ - Matches are only considered at the top-level directory. In this way,
+   only one directory has to be read during analysis; allowing deeper
+   globs would require traversal of subdirectories, at larger cost.
+   While the explicit `"TREE"` reference allows recursive traversal, in
+   the typical use case of the respective workspace root being a `git`
+   root, it is actually cheap; we can look up the `git` tree identifier
+   without traversing the tree. Such a quick look-up would not be
+   possible if matches had to be selected.
+
+So, `["GLOB", null, "*.units"]` expands to all the relevant source
+files; but we still want to keep the patching. Most rules, like
+`"install"`, disallow staging conflicts to avoid accidentally ignoring a
+file due to a conflicting name. In our case, however, the dropping of
+the source file in favour of the patched one is deliberate. For this,
+there is the rule `["data", "overlay"]` taking the union of the
+artifacts of
the specified targets, accepting conflicts and resolving them in a
-latest-wins fashion. Keep in mind, that our target fields are list,
-not sets. Looking at the definition of the rule, one finds that
-it is simply a ~"map_union"~. Hence we refine our ~"data"~ target.
+latest-wins fashion. Keep in mind that our target fields are lists, not
+sets. Looking at the definition of the rule, one finds that it is simply
+a `"map_union"`. Hence we refine our `"data"` target.

-#+SRCNAME: TARGETS.units
-#+BEGIN_SRC js
+``` {.jsonc srcname="TARGETS.units"}
...
, "data":
  { "type": ["@", "rules", "data", "overlay"]
  , "deps": [["GLOB", null, "*.units"], "definitions.units"]
  }
...
-#+END_SRC
+```

The result of the analysis, of course, still is the same.

-** Finishing the example: binaries from globbed sources
+Finishing the example: binaries from globbed sources
+----------------------------------------------------

-The source-code organisation of units is pretty simple. All source
-and header files are in the top-level directory. As the header files
-are not in a directory of their own, we can't use a tree, so we use
-a glob, which is fine for the private headers of a binary. For the
-source files, we have to have them individually anyway. So our first
-attempt of defining the binary is as follows.
+The source-code organisation of units is pretty simple. All source and
+header files are in the top-level directory. As the header files are not
+in a directory of their own, we can't use a tree, so we use a glob,
+which is fine for the private headers of a binary. The source files we
+need individually anyway. So our first attempt of defining the binary is
+as follows.

-#+SRCNAME: TARGETS.units
-#+BEGIN_SRC js
+``` {.jsonc srcname="TARGETS.units"}
...
, "units-draft":
  { "type": ["@", "rules", "CC", "binary"]
@@ -307,12 +307,12 @@ attempt of defining the binary is as follows.
  , "private-hdrs": [["GLOB", null, "*.h"]]
  }
...
-#+END_SRC
+```

-The result basically work and shows that we have 5 source files in total,
-giving 5 compile and one link action.
+The result basically works and shows that we have 5 source files in
+total, giving 5 compile actions and one link action.

-#+BEGIN_SRC sh
+``` sh
$ just-mr build units-draft
INFO: Requested target is [["@","units","","units-draft"],{}]
INFO: Analysed target [["@","units","","units-draft"],{}]
@@ -328,13 +328,14 @@
INFO: Processed 6 actions, 0 cache hits.
INFO: Artifacts built, logical paths are:
        units [718cb1489bd006082f966ea73e3fba3dd072d084:124488:x]
$
-#+END_SRC
+```

-To keep the build clean, we want to get rid of the warning. Of course, we could
-simply set an appropriate compiler flag, but let's do things properly and patch
-away the underlying reason. To do so, we first create a patch.
+To keep the build clean, we want to get rid of the warning. Of course,
+we could simply set an appropriate compiler flag, but let's do things
+properly and patch away the underlying reason. To do so, we first create
+a patch.

-#+BEGIN_SRC sh
+``` sh
$ just-mr install -o . strfunc.c
INFO: Requested target is [["@","units","","strfunc.c"],{}]
INFO: Analysed target [["@","units","","strfunc.c"],{}]
@@ -353,12 +354,11 @@
$ echo -e "109\ns|N|// N\nw\nq" | ed strfunc.c
$ diff strfunc.c.orig strfunc.c > files/strfunc.c.diff
$ rm strfunc.c*
$
-#+END_SRC
+```

-Then we amend our ~"units"~ target.
+Then we amend our `"units"` target.

-#+SRCNAME: TARGETS.units
-#+BEGIN_SRC js
+``` {.jsonc srcname="TARGETS.units"}
...
, "units":
  { "type": ["@", "rules", "CC", "binary"]
@@ -378,14 +378,15 @@
  , "patch": [["@", "patches", "", "strfunc.c.diff"]]
  }
...
-#+END_SRC
+```

-Building the new target, 2 actions have to be executed: the patching, and
-the compiling of the patched source file. As the patched file still generates
-the same object file as the unpatched file (after all, we only wanted to get
-rid of a warning), the linking step can be taken from cache.
+Building the new target, 2 actions have to be executed: the patching,
+and the compiling of the patched source file. As the patched file still
+generates the same object file as the unpatched file (after all, we only
+wanted to get rid of a warning), the linking step can be taken from
+cache.

-#+BEGIN_SRC sh
+``` sh
$ just-mr build units
INFO: Requested target is [["@","units","","units"],{}]
INFO: Analysed target [["@","units","","units"],{}]
@@ -396,22 +397,21 @@
INFO: Processed 7 actions, 5 cache hits.
INFO: Artifacts built, logical paths are:
        units [718cb1489bd006082f966ea73e3fba3dd072d084:124488:x]
$
-#+END_SRC
+```

-To finish the example, we also add a default target (using that, if no target
-is specified, ~just~ builds the lexicographically first target), staging
-artifacts according to the usual conventions.
+To finish the example, we also add a default target (exploiting the fact
+that, if no target is specified, `just` builds the lexicographically
+first target), staging artifacts according to the usual conventions.

-#+SRCNAME: TARGETS.units
-#+BEGIN_SRC js
+``` {.jsonc srcname="TARGETS.units"}
...
, "": {"type": "install", "dirs": [["units", "bin"], ["data", "share/units"]]}
...
-#+END_SRC
+```

Then things work as expected:

-#+BEGIN_SRC sh
+``` sh
$ just-mr install -o /tmp/testinstall
INFO: Requested target is [["@","units","",""],{}]
INFO: Analysed target [["@","units","",""],{}]
@@ -427,4 +427,4 @@
$ /tmp/testinstall/bin/units 'area_saarland' 'area_soccerfield'
        * 359943.98
        / 2.7782101e-06
$
-#+END_SRC
+```
diff --git a/doc/tutorial/tests.org b/doc/tutorial/tests.md
index d6842ab2..138769b1 100644
--- a/doc/tutorial/tests.org
+++ b/doc/tutorial/tests.md
@@ -1,38 +1,41 @@
-* Creating Tests
+Creating Tests
+==============

-To run tests with justbuild, we do /not/ have a dedicated ~test~
+To run tests with justbuild, we do *not* have a dedicated `test`
subcommand. Instead, we consider tests to be a specific action that
-generates a test report. Consequently, we use the ~build~ subcommand
-to build the test report, and thereby run the test action. Test
-actions, however, are slightly different from normal actions in
-that we don't want the build of the test report to be aborted if
-a test action fails (but still, we want only successfully actions
-taken from cache). Rules defining targets containing such special
-actions have to identify themselves as /tainted/ by specifying
-a string explaining why such special actions are justified; in
-our case, the string is ~"test"~. Besides the implicit marking by
-using a tainted rule, those tainting strings can also be explicitly
-assigned by the user in the definition of a target, e.g., to mark
-test data. Any target has to be tainted with (at least) all the
-strings any of its dependencies is tainted with. In this way, it
-is ensured that no test target will end up in a production build.
-
-For the remainder of this section, we expect to have the project files available
-resulting from successfully completing the tutorial section on /Building C++
-Hello World/. We will demonstrate how to write a test binary for the ~greet~
-library and a shell test for the ~helloworld~ binary.
-
-** Creating a C++ test binary
-
-First, we will create a C++ test binary for testing the correct functionality of
-the ~greet~ library. Therefore, we need to provide a C++ source file that performs
-the actual testing and returns non-~0~ on failure. For simplicity reasons, we do
-not use a testing framework for this tutorial. A simple test that captures
-standard output and verifies it with the expected output should be provided in
-the file ~tests/greet.test.cpp~:
-
-#+SRCNAME: tests/greet.test.cpp
-#+BEGIN_SRC cpp
+generates a test report. Consequently, we use the `build` subcommand to
+build the test report, and thereby run the test action. Test actions,
+however, are slightly different from normal actions in that we don't
+want the build of the test report to be aborted if a test action fails
+(but still, we want only successful actions taken from cache). Rules
+defining targets containing such special actions have to identify
+themselves as *tainted* by specifying a string explaining why such
+special actions are justified; in our case, the string is `"test"`.
+Besides the implicit marking by using a tainted rule, those tainting
+strings can also be explicitly assigned by the user in the definition of
+a target, e.g., to mark test data. Any target has to be tainted with (at
+least) all the strings any of its dependencies is tainted with. In this
+way, it is ensured that no test target will end up in a production
+build.
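For instance, explicitly marking a target that provides test fixtures could
look like the following sketch; the module `tests/data` and the file
`input.txt` are invented for illustration, and the `"tainted"` field shown is
assumed to carry the user-assigned tainting strings mentioned above:

``` sh
$ cat tests/data/TARGETS    # hypothetical test-data target, explicitly tainted
{ "data":
  { "type": "install"
  , "tainted": ["test"]
  , "deps": ["input.txt"]
  }
}
```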
+
+For the remainder of this section, we expect to have the project files
+available resulting from successfully completing the tutorial section on
+*Building C++ Hello World*. We will demonstrate how to write a test
+binary for the `greet` library and a shell test for the `helloworld`
+binary.
+
+Creating a C++ test binary
+--------------------------
+
+First, we will create a C++ test binary for testing the correct
+functionality of the `greet` library. Therefore, we need to provide a
+C++ source file that performs the actual testing and returns non-`0` on
+failure. For simplicity, we do not use a testing framework for this
+tutorial. A simple test that captures standard output and verifies it
+against the expected output should be provided in the file
+`tests/greet.test.cpp`:
+
+``` {.cpp srcname="tests/greet.test.cpp"}
#include <functional>
#include <iostream>
#include <string>
@@ -68,15 +71,14 @@ auto test_greet(std::string const& name) -> bool {
int main() {
  return test_greet("World") && test_greet("Universe") ? 0 : 1;
}
-#+END_SRC
+```

-Next, a new test target needs to be created in module ~greet~. This target uses
-the rule ~["@", "rules", "CC/test", "test"]~ and needs to depend on the
-~["greet", "greet"]~ target. To create the test target, add the following to
-~tests/TARGETS~:
+Next, a new test target needs to be created in module `greet`. This
+target uses the rule `["@", "rules", "CC/test", "test"]` and needs to
+depend on the `["greet", "greet"]` target. To create the test target,
+add the following to `tests/TARGETS`:

-#+SRCNAME: tests/TARGETS
-#+BEGIN_SRC js
+``` {.jsonc srcname="tests/TARGETS"}
{ "greet":
  { "type": ["@", "rules", "CC/test", "test"]
  , "name": ["test_greet"]
@@ -84,35 +86,33 @@ the rule ~["@", "rules", "CC/test", "test"]~ and needs to depend on the
  , "private-deps": [["greet", "greet"]]
  }
}
-#+END_SRC
+```

-Before we can run the test, a proper default module for ~CC/test~ must be
-provided. By specifying the appropriate target in this module the default test
-runner can be overwritten by a different test runner fom the rule's workspace
-root. Moreover, all test targets share runner infrastructure from ~shell/test~,
-e.g., summarizing multiple runs per test (to detect flakyness) if the configuration
-variable ~RUNS_PER_TEST~ is set.
+Before we can run the test, a proper default module for `CC/test` must
+be provided. By specifying the appropriate target in this module, the
+default test runner can be overridden by a different test runner from
+the rule's workspace root. Moreover, all test targets share runner
+infrastructure from `shell/test`, e.g., summarizing multiple runs per
+test (to detect flakiness) if the configuration variable `RUNS_PER_TEST`
+is set.
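A quick sketch of how `RUNS_PER_TEST` can be set from the command line, via the
same `-D` configuration-override mechanism used earlier (the count 5 is an
arbitrary choice):

``` sh
# Run the test action 5 times within one build to check for flakiness.
$ just-mr build tests greet -D '{"RUNS_PER_TEST": 5}'
```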
However, in our case, we want to use the default runner and therefore
it is sufficient to create an empty module. To do so, create the file
-~tutorial-defaults/CC/test/TARGETS~ with content
+`tutorial-defaults/CC/test/TARGETS` with content

-#+SRCNAME: tutorial-defaults/CC/test/TARGETS
-#+BEGIN_SRC js
+``` {.jsonc srcname="tutorial-defaults/CC/test/TARGETS"}
{}
-#+END_SRC
+```

-as well as the file ~tutorial-defaults/shell/test/TARGETS~ with content
+as well as the file `tutorial-defaults/shell/test/TARGETS` with content

-#+SRCNAME: tutorial-defaults/shell/test/TARGETS
-#+BEGIN_SRC js
+``` {.jsonc srcname="tutorial-defaults/shell/test/TARGETS"}
{}
-#+END_SRC
-
+```

Now we can run the test (i.e., build the test result):

-#+BEGIN_SRC sh
+``` sh
$ just-mr build tests greet
INFO: Requested target is [["@","tutorial","tests","greet"],{}]
INFO: Analysed target [["@","tutorial","tests","greet"],{}]
@@ -130,41 +130,45 @@
INFO: Artifacts built, logical paths are:
      (1 runfiles omitted.)
INFO: Target tainted ["test"].
$
-#+END_SRC
+```

-Note that the target is correctly reported as tainted with ~"test"~. It will
-produce 3 additional actions for compiling, linking and running the test binary.
-
-The result of the test target are 5 artifacts: ~result~ (containing ~UNKNOWN~,
-~PASS~, or ~FAIL~), ~stderr~, ~stdout~, ~time-start~, and ~time-stop~, and a
-single runfile (omitted in the output above), which is a tree artifact with the
-name ~test_greet~ that contains all of the above artifacts. The test was run
-successfully as otherwise all reported artifacts would have been reported as
-~FAILED~ in the output, and justbuild would have returned the exit code ~2~.
-
-To immediately print the standard output produced by the test binary on the
-command line, the ~-P~ option can be used. Argument to this option is the name
-of the artifact that should be printed on the command line, in our case
-~stdout~:
+Note that the target is correctly reported as tainted with `"test"`. It
+will produce 3 additional actions for compiling, linking and running the
+test binary.
+
+The result of the test target consists of 5 artifacts: `result`
+(containing `UNKNOWN`, `PASS`, or `FAIL`), `stderr`, `stdout`,
+`time-start`, and `time-stop`, and a single runfile (omitted in the
+output above), which is a tree artifact with the name `test_greet` that
+contains all of the above artifacts. The test ran successfully, as
+otherwise all reported artifacts would have been reported as `FAILED` in
+the output, and justbuild would have returned the exit code `2`.
+
+To immediately print the standard output produced by the test binary on
+the command line, the `-P` option can be used. The argument to this
+option is the name of the artifact that should be printed on the command
+line, in our case `stdout`:

-#+BEGIN_SRC sh
+``` sh
$ just-mr build tests greet --log-limit 1 -P stdout
greet output: Hello World!
greet output: Hello Universe!
$
-#+END_SRC
+```

-Note that ~--log-limit 1~ was just added to omit justbuild's ~INFO:~ prints.
+Note that `--log-limit 1` was just added to omit justbuild's `INFO:`
+prints.

-Our test binary does not have any useful options for directly interacting
-with it. When working with test frameworks, it sometimes can be desirable to
-get hold of the test binary itself for manual interaction. The running of
-the test binary is the last action associated with the test and the test
-binary is, of course, one of its inputs.
-Our test binary does not have any useful options for directly interacting
-with it. When working with test frameworks, it sometimes can be desirable to
-get hold of the test binary itself for manual interaction. The running of
-the test binary is the last action associated with the test and the test
-binary is, of course, one of its inputs.
+Our test binary does not have any useful options for directly
+interacting with it. When working with test frameworks, it can sometimes
+be desirable to get hold of the test binary itself for manual
+interaction. The running of the test binary is the last action
+associated with the test and the test binary is, of course, one of its
+inputs.

-#+BEGIN_SRC sh
+``` sh
$ just-mr analyse --request-action-input -1 tests greet
INFO: Requested target is [["@","tutorial","tests","greet"],{}]
INFO: Request is input of action #-1
@@ -197,15 +201,15 @@ INFO: Result of input of action #-1 of target [["@","tutorial","tests","greet"],
 }
INFO: Target tainted ["test"].
$
-#+END_SRC
+```

The provided data also shows us the precise description of the action
-for which we request the input. This allows us to manually rerun
-the action. Or we can simply interact with the test binary manually
-after installing the inputs to this action. Requesting the inputs
-of an action can also be useful when debugging a build failure.
+for which we request the input. This allows us to manually rerun the
+action. Alternatively, we can simply interact with the test binary after
+installing the inputs to this action. Requesting the inputs of an action
+can also be useful when debugging a build failure.

-#+BEGIN_SRC sh
+``` sh
$ just-mr install -o work --request-action-input -1 tests greet
INFO: Requested target is [["@","tutorial","tests","greet"],{}]
INFO: Request is input of action #-1
@@ -231,26 +235,25 @@ $ echo $?
 0
$ cd ..
$ rm -rf work
-#+END_SRC
+```

-** Creating a shell test
+Creating a shell test
+---------------------

-Similarly, to create a shell test for testing the ~helloworld~ binary, a test
-script ~tests/test_helloworld.sh~ must be provided:
+Similarly, to create a shell test for testing the `helloworld` binary, a
+test script `tests/test_helloworld.sh` must be provided:

-#+SRCNAME: tests/test_helloworld.sh
-#+BEGIN_SRC sh
+``` {.sh srcname="tests/test_helloworld.sh"}
 set -e
 [ "$(./helloworld)" = "Hello Universe!" ]
-#+END_SRC
+```

The test target for this shell test uses the rule
-~["@", "rules", "shell/test", "script"]~ and must depend on the ~"helloworld"~
-target. To create the test target, add the following to the ~tests/TARGETS~
-file:
+`["@", "rules", "shell/test", "script"]` and must depend on the
+`"helloworld"` target. To create the test target, add the following to
+the `tests/TARGETS` file:

-#+SRCNAME: tests/TARGETS
-#+BEGIN_SRC js
+``` {.jsonc srcname="tests/TARGETS"}
 ...
, "helloworld":
  { "type": ["@", "rules", "shell/test", "script"]
@@ -259,11 +262,11 @@ file:
  , "deps": [["", "helloworld"]]
  }
...
-#+END_SRC
+```

Now we can run the shell test (i.e., build the test result):

-#+BEGIN_SRC sh
+``` sh
$ just-mr build tests helloworld
INFO: Requested target is [["@","tutorial","tests","helloworld"],{}]
INFO: Analysed target [["@","tutorial","tests","helloworld"],{}]
@@ -281,29 +284,28 @@ INFO: Artifacts built, logical paths are:
 (1 runfiles omitted.)
INFO: Target tainted ["test"].
$
-#+END_SRC
-
-The result is also similar, containing also the 5 artifacts and a single runfile
-(omitted in the output above), which is a tree artifact with the name
-~test_helloworld~ that contains all of the above artifacts.
-
-** Creating a compound test target
-
-As most people probably do not want to call every test target by hand, it is
-desirable to compound test target that triggers the build of multiple test
-reports. To do so, an ~"install"~ target can be used. The field ~"deps"~ of
-an install target is a list of targets for which the runfiles are collected.
-As for the tests the runfiles happen to be
-tree artifacts named the same way as the test and containing all test results,
-this is precisely what we need.
-Furthermore, as the dependent test targets are tainted by ~"test"~, also the
-compound test target must be tainted by the same string. To create the compound
-test target combining the two tests above (the tests ~"greet"~ and
-~"helloworld"~ from module ~"tests"~), add the following to the ~tests/TARGETS~
-file:
-
-#+SRCNAME: tests/TARGETS
-#+BEGIN_SRC js
+```
+
+The result is similar, also containing the 5 artifacts and a single
+runfile (omitted in the output above), which is a tree artifact with the
+name `test_helloworld` that contains all of the above artifacts.
+
+Creating a compound test target
+-------------------------------
+
+As most people probably do not want to call every test target by hand,
+it is desirable to have a compound test target that triggers the build
+of multiple test reports. To do so, an `"install"` target can be used.
+The field `"deps"` of an install target is a list of targets for which
+the runfiles are collected. As, for tests, the runfiles happen to be
+tree artifacts named the same way as the test and containing all test
+results, this is precisely what we need. Furthermore, as the dependent
+test targets are tainted by `"test"`, the compound test target must also
+be tainted by the same string. To create the compound test target
+combining the two tests above (the tests `"greet"` and `"helloworld"`
+from module `"tests"`), add the following to the `tests/TARGETS` file:
+
+``` {.jsonc srcname="tests/TARGETS"}
 ...
, "ALL":
  { "type": "install"
@@ -311,12 +313,12 @@ file:
  , "deps": ["greet", "helloworld"]
  }
...
-#+END_SRC
+```

-Now we can run all tests at once by just building the compound test target
-~"ALL"~:
+Now we can run all tests at once by just building the compound test
+target `"ALL"`:

-#+BEGIN_SRC sh
+``` sh
$ just-mr build tests ALL
INFO: Requested target is [["@","tutorial","tests","ALL"],{}]
INFO: Analysed target [["@","tutorial","tests","ALL"],{}]
@@ -330,8 +332,8 @@ INFO: Artifacts built, logical paths are:
 test_helloworld [63fa5954161b52b275b05c270e1626feaa8e178b:177:t]
INFO: Target tainted ["test"].
$
-#+END_SRC
+```

-As a result it reports the runfiles (result directories) of both tests as
-artifacts. Both tests ran successfully as none of those artifacts in this output
-above are tagged as ~FAILED~.
+As a result, it reports the runfiles (result directories) of both tests
+as artifacts. Both tests ran successfully, as none of the artifacts in
+the output above are tagged as `FAILED`.
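+
+To inspect the reports of both tests in the file system, the compound
+target can also be installed. A brief sketch (with `test-results` as an
+arbitrarily chosen output directory; the listed contents follow from the
+artifact description above, and the `INFO:` prints are omitted):
+
+``` sh
+$ just-mr install -o test-results tests ALL
+$ ls test-results
+test_greet  test_helloworld
+$ cat test-results/test_greet/result
+PASS
+```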
diff --git a/doc/tutorial/third-party-software.md b/doc/tutorial/third-party-software.md
new file mode 100644
index 00000000..daaf5b2d
--- /dev/null
+++ b/doc/tutorial/third-party-software.md
@@ -0,0 +1,473 @@
+Building Third-party Software
+=============================
+
+Third-party projects usually ship with their own build description,
+which often happens to be incompatible with justbuild. Nevertheless, it
+is highly desirable to include external projects via their source code
+base, instead of relying on the integration of out-of-band binary
+distributions. justbuild offers a flexible approach to provide the
+required build description via an overlay layer without the need to
+touch the original code base.
+
+For the remainder of this section, we expect to have the project files
+available resulting from successfully completing the tutorial section on
+*Building C++ Hello World*. We will demonstrate how to use the
+open-source project [fmtlib](https://github.com/fmtlib/fmt) as an
+example for integrating third-party software into a justbuild project.
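+
+To follow along, it can be helpful to have the upstream sources at hand
+for inspection. This is purely optional (the build itself does not need
+it, as `just-mr` will fetch the sources); one way is a plain Git
+checkout of the tag used below:
+
+``` sh
+$ git clone --branch 8.1.1 --depth 1 https://github.com/fmtlib/fmt.git
+```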
+
+Creating the target overlay layer for fmtlib
+--------------------------------------------
+
+Before we construct the overlay layer for fmtlib, we need to determine
+its file structure ([tag
+8.1.1](https://github.com/fmtlib/fmt/tree/8.1.1)). The relevant header
+and source files are structured as follows:
+
+    fmt
+    |
+    +--include
+    |  +--fmt
+    |     +--*.h
+    |
+    +--src
+       +--format.cc
+       +--os.cc
+
+The public headers can be found in `include/fmt`, while the library's
+source files are located in `src`. For the overlay layer, the `TARGETS`
+files should be placed in a tree structure that resembles the original
+code base's structure. It is also good practice to provide a top-level
+`TARGETS` file, leading to the following structure for the overlay:
+
+    fmt-layer
+    |
+    +--TARGETS
+    +--include
+    |  +--fmt
+    |     +--TARGETS
+    |
+    +--src
+       +--TARGETS
+
+Let's create the overlay structure:
+
+``` sh
+$ mkdir -p fmt-layer/include/fmt
+$ mkdir -p fmt-layer/src
+```
+
+The directory `include/fmt` contains only header files. As we want all
+files in this directory to be included in the `"hdrs"` target, we can
+safely use the explicit `TREE` reference[^1], which collects, in a
+single artifact (describing a directory), *all* directory contents from
+`"."` of the workspace root. Note that the `TARGETS` file is only part
+of the overlay, and therefore will not be part of this tree.
+Furthermore, this tree should be staged to `"fmt"`, so that any consumer
+can include those headers via `<fmt/...>`. The resulting header
+directory target `"hdrs"` in `include/fmt/TARGETS` should be described
+as:
+
+``` {.jsonc srcname="fmt-layer/include/fmt/TARGETS"}
+{ "hdrs":
+  { "type": ["@", "rules", "data", "staged"]
+  , "srcs": [["TREE", null, "."]]
+  , "stage": ["fmt"]
+  }
+}
+```
+
+The actual library target is defined in the directory `src`. For the
+public headers, it refers to the previously created `"hdrs"` target via
+its fully-qualified target name (`["include/fmt", "hdrs"]`). Source
+files are the two local files `format.cc` and `os.cc`. The final target
+description in `src/TARGETS` will look like this:
+
+``` {.jsonc srcname="fmt-layer/src/TARGETS"}
+{ "fmt":
+  { "type": ["@", "rules", "CC", "library"]
+  , "name": ["fmt"]
+  , "hdrs": [["include/fmt", "hdrs"]]
+  , "srcs": ["format.cc", "os.cc"]
+  }
+}
+```
+
+Finally, the top-level `TARGETS` file can be created. While technically
+not strictly required, it is considered good practice to *export* every
+target that may be used by another project. Exported targets are subject
+to high-level target caching, which allows skipping the analysis and
+traversal of entire subgraphs in the action graph. Therefore, we create
+an export target that exports the target `["src", "fmt"]`, with only the
+variables in the field `"flexible_config"` being propagated. The
+top-level `TARGETS` file contains the following content:
+
+``` {.jsonc srcname="fmt-layer/TARGETS"}
+{ "fmt":
+  { "type": "export"
+  , "target": ["src", "fmt"]
+  , "flexible_config": ["CXX", "CXXFLAGS", "ADD_CXXFLAGS", "AR", "ENV"]
+  }
+}
+```
+
+After adding the library to the multi-repository configuration (next
+step), the list of configuration variables a target, like
+`["src", "fmt"]`, actually depends on can be obtained using the
+`--dump-vars` option of the `analyse` subcommand. In this way, an
+informed decision can be taken when deciding which variables of the
+export target to make tunable for the consumer.
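+
+For example, once the repository configuration of the next step is in
+place, such an analysis could look as follows. This is only a sketch: it
+assumes that `-` is accepted as the file argument of `--dump-vars` to
+request writing to stdout:
+
+``` sh
+$ just-mr --main fmtlib analyse fmt --dump-vars -
+```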
+
+Adding fmtlib to the Multi-Repository Configuration
+---------------------------------------------------
+
+Based on the *hello world* tutorial, we can extend the existing
+`repos.json` by the layer definition `"fmt-targets-layer"` and the
+repository `"fmtlib"`, which is based on the Git repository with its
+target root being overlaid. Furthermore, we want to use `"fmtlib"` in
+the repository `"tutorial"`, and therefore need to introduce an
+additional binding `"format"` for it:
+
+``` {.jsonc srcname="repos.json"}
+{ "main": "tutorial"
+, "repositories":
+  { "rules-cc":
+    { "repository":
+      { "type": "git"
+      , "branch": "master"
+      , "commit": "123d8b03bf2440052626151c14c54abce2726e6f"
+      , "repository": "https://github.com/just-buildsystem/rules-cc.git"
+      , "subdir": "rules"
+      }
+    , "target_root": "tutorial-defaults"
+    , "rule_root": "rules-cc"
+    }
+  , "tutorial":
+    { "repository": {"type": "file", "path": "."}
+    , "bindings": {"rules": "rules-cc", "format": "fmtlib"}
+    }
+  , "tutorial-defaults":
+    { "repository": {"type": "file", "path": "./tutorial-defaults"}
+    }
+  , "fmt-targets-layer":
+    { "repository": {"type": "file", "path": "./fmt-layer"}
+    }
+  , "fmtlib":
+    { "repository":
+      { "type": "git"
+      , "branch": "8.1.1"
+      , "commit": "b6f4ceaed0a0a24ccf575fab6c56dd50ccf6f1a9"
+      , "repository": "https://github.com/fmtlib/fmt.git"
+      }
+    , "target_root": "fmt-targets-layer"
+    , "bindings": {"rules": "rules-cc"}
+    }
+  }
+}
+```
+
+This `"format"` binding can now be used to add a new private dependency
+in `greet/TARGETS`:
+
+``` {.jsonc srcname="greet/TARGETS"}
+{ "greet":
+  { "type": ["@", "rules", "CC", "library"]
+  , "name": ["greet"]
+  , "hdrs": ["greet.hpp"]
+  , "srcs": ["greet.cpp"]
+  , "stage": ["greet"]
+  , "private-deps": [["@", "format", "", "fmt"]]
+  }
+}
+```
+
+Consequently, the `fmtlib` library can now be used by `greet/greet.cpp`:
+
+``` {.cpp srcname="greet/greet.cpp"}
+#include "greet.hpp"
+#include <fmt/format.h>
+
+void greet(std::string const& s) {
+    fmt::print("Hello {}!\n", s);
+}
+```
+
+Due to changes made to `repos.json`, building this tutorial requires
+rerunning `just-mr`, which will fetch the necessary sources for the
+external repositories:
+
+``` sh
+$ just-mr build helloworld
+INFO: Requested target is [["@","tutorial","","helloworld"],{}]
+INFO: Analysed target [["@","tutorial","","helloworld"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 1 not eligible for caching
+INFO: Discovered 7 actions, 3 trees, 0 blobs
+INFO: Building [["@","tutorial","","helloworld"],{}].
+INFO: Processed 7 actions, 1 cache hits.
+INFO: Artifacts built, logical paths are:
+        helloworld [0ec4e36cfb5f2c3efa0fff789349a46694a6d303:132736:x]
+$
+```
+
+Note that to build the `fmt` target alone, its containing repository
+`fmtlib` must be specified via the `--main` option:
+
+``` sh
+$ just-mr --main fmtlib build fmt
+INFO: Requested target is [["@","fmtlib","","fmt"],{}]
+INFO: Analysed target [["@","fmtlib","","fmt"],{}]
+INFO: Export targets found: 0 cached, 0 uncached, 1 not eligible for caching
+INFO: Discovered 3 actions, 1 trees, 0 blobs
+INFO: Building [["@","fmtlib","","fmt"],{}].
+INFO: Processed 3 actions, 3 cache hits.
+INFO: Artifacts built, logical paths are:
+        libfmt.a [513b2ac17c557675fc841f3ebf279003ff5a73ae:240914:f]
+        (1 runfiles omitted.)
+$
+```
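+
+At this point the resulting binary can also be tried out directly. A
+quick sketch, installing `helloworld` to an arbitrarily named output
+directory `out`; the expected output is the one the shell test in the
+*Tests* section asserts:
+
+``` sh
+$ just-mr install -o out helloworld
+$ ./out/helloworld
+Hello Universe!
+```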
+
+Employing high-level target caching
+-----------------------------------
+
+To make use of high-level target caching for exported targets, we need
+to ensure that all inputs to an export target are transitively
+content-fixed. This is automatically the case for `"type":"git"`
+repositories. However, the `"fmtlib"` repository also depends on
+`"rules-cc"`, `"tutorial-defaults"`, and `"fmt-targets-layer"`. As the
+latter two are `"type":"file"` repositories, they must be put under Git
+versioning first:
+
+``` sh
+$ git init .
+$ git add tutorial-defaults fmt-layer
+$ git commit -m"fix compile flags and fmt targets layer"
+```
+
+Note that `rules-cc` already is under Git versioning.
+
+Now, to instruct `just-mr` to use the content-fixed, committed source
+trees of those `"type":"file"` repositories, the pragma `"to_git"` must
+be set for them in `repos.json`:
+
+``` {.jsonc srcname="repos.json"}
+{ "main": "tutorial"
+, "repositories":
+  { "rules-cc":
+    { "repository":
+      { "type": "git"
+      , "branch": "master"
+      , "commit": "123d8b03bf2440052626151c14c54abce2726e6f"
+      , "repository": "https://github.com/just-buildsystem/rules-cc.git"
+      , "subdir": "rules"
+      }
+    , "target_root": "tutorial-defaults"
+    , "rule_root": "rules-cc"
+    }
+  , "tutorial":
+    { "repository": {"type": "file", "path": "."}
+    , "bindings": {"rules": "rules-cc", "format": "fmtlib"}
+    }
+  , "tutorial-defaults":
+    { "repository":
+      { "type": "file"
+      , "path": "./tutorial-defaults"
+      , "pragma": {"to_git": true}
+      }
+    }
+  , "fmt-targets-layer":
+    { "repository":
+      { "type": "file"
+      , "path": "./fmt-layer"
+      , "pragma": {"to_git": true}
+      }
+    }
+  , "fmtlib":
+    { "repository":
+      { "type": "git"
+      , "branch": "master"
+      , "commit": "b6f4ceaed0a0a24ccf575fab6c56dd50ccf6f1a9"
+      , "repository": "https://github.com/fmtlib/fmt.git"
+      }
+    , "target_root": "fmt-targets-layer"
+    , "bindings": {"rules": "rules-cc"}
+    }
+  }
+}
+```
+
+Due to changes in the repository configuration, we need to rebuild; the
+benefits of the target cache should be visible on the second build:
+
+``` sh
+$ just-mr build helloworld
+INFO: Requested target is [["@","tutorial","","helloworld"],{}]
+INFO: Analysed target [["@","tutorial","","helloworld"],{}]
+INFO: Export targets found: 0 cached, 1 uncached, 0 not eligible for caching
+INFO: Discovered 7 actions, 3 trees, 0 blobs
+INFO: Building [["@","tutorial","","helloworld"],{}].
+INFO: Processed 7 actions, 7 cache hits.
+INFO: Artifacts built, logical paths are:
+        helloworld [0ec4e36cfb5f2c3efa0fff789349a46694a6d303:132736:x]
+$
+$ just-mr build helloworld
+INFO: Requested target is [["@","tutorial","","helloworld"],{}]
+INFO: Analysed target [["@","tutorial","","helloworld"],{}]
+INFO: Export targets found: 1 cached, 0 uncached, 0 not eligible for caching
+INFO: Discovered 4 actions, 2 trees, 0 blobs
+INFO: Building [["@","tutorial","","helloworld"],{}].
+INFO: Processed 4 actions, 4 cache hits.
+INFO: Artifacts built, logical paths are:
+        helloworld [0ec4e36cfb5f2c3efa0fff789349a46694a6d303:132736:x]
+$
+```
+
+Note that in the second run the export target `"fmt"` was taken from
+cache and its 3 actions were eliminated, as their result had been
+recorded in the high-level target cache during the first run.
+
+Combining overlay layers for multiple projects
+----------------------------------------------
+
+Projects typically depend on multiple external repositories.
+Creating an overlay layer for each external repository might
+unnecessarily clutter up the repository configuration and the file
+structure of your repository. One solution to mitigate this issue is to
+combine the `TARGETS` files of multiple external repositories in a
+single overlay layer. To avoid conflicts, the `TARGETS` files can be
+assigned different file names per repository. As an example, imagine a
+common overlay layer with the files `TARGETS.fmt` and `TARGETS.gsl` for
+the repositories `"fmtlib"` and `"gsl-lite"`, respectively:
+
+    common-layer
+    |
+    +--TARGETS.fmt
+    +--TARGETS.gsl
+    +--include
+    |  +--fmt
+    |  |  +--TARGETS.fmt
+    |  +--gsl
+    |     +--TARGETS.gsl
+    |
+    +--src
+       +--TARGETS.fmt
+
+Such a common overlay layer can be used as the target root for both
+repositories with only one difference: the `"target_file_name"` field.
+This field implements the dispatch that determines where to find the
+respective target description for each repository. For the given
+example, the following `repos.json` defines the overlay
+`"common-targets-layer"`, which is used by `"fmtlib"` and `"gsl-lite"`:
+
+``` {.jsonc srcname="repos.json"}
+{ "main": "tutorial"
+, "repositories":
+  { "rules-cc":
+    { "repository":
+      { "type": "git"
+      , "branch": "master"
+      , "commit": "123d8b03bf2440052626151c14c54abce2726e6f"
+      , "repository": "https://github.com/just-buildsystem/rules-cc.git"
+      , "subdir": "rules"
+      }
+    , "target_root": "tutorial-defaults"
+    , "rule_root": "rules-cc"
+    }
+  , "tutorial":
+    { "repository": {"type": "file", "path": "."}
+    , "bindings": {"rules": "rules-cc", "format": "fmtlib"}
+    }
+  , "tutorial-defaults":
+    { "repository":
+      { "type": "file"
+      , "path": "./tutorial-defaults"
+      , "pragma": {"to_git": true}
+      }
+    }
+  , "common-targets-layer":
+    { "repository":
+      { "type": "file"
+      , "path": "./common-layer"
+      , "pragma": {"to_git": true}
+      }
+    }
+  , "fmtlib":
+    { "repository":
+      { "type": "git"
+      , "branch": "8.1.1"
+      , "commit": "b6f4ceaed0a0a24ccf575fab6c56dd50ccf6f1a9"
+      , "repository": "https://github.com/fmtlib/fmt.git"
+      }
+    , "target_root": "common-targets-layer"
+    , "target_file_name": "TARGETS.fmt"
+    , "bindings": {"rules": "rules-cc"}
+    }
+  , "gsl-lite":
+    { "repository":
+      { "type": "git"
+      , "branch": "v0.40.0"
+      , "commit": "d6c8af99a1d95b3db36f26b4f22dc3bad89952de"
+      , "repository": "https://github.com/gsl-lite/gsl-lite.git"
+      }
+    , "target_root": "common-targets-layer"
+    , "target_file_name": "TARGETS.gsl"
+    , "bindings": {"rules": "rules-cc"}
+    }
+  }
+}
+```
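+
+The individual target files themselves look just as before. As a purely
+hypothetical sketch (the actual structure and target names for
+`"gsl-lite"` would have to be worked out the same way as for fmtlib
+above), the top-level `TARGETS.gsl` could export a library target
+defined in `include/gsl/TARGETS.gsl`:
+
+``` {.jsonc srcname="common-layer/TARGETS.gsl"}
+{ "gsl":
+  { "type": "export"
+  , "target": ["include/gsl", "gsl"]
+  , "flexible_config": ["CXX", "CXXFLAGS", "ADD_CXXFLAGS", "ENV"]
+  }
+}
+```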
+
+Using pre-built dependencies
+----------------------------
+
+While building external dependencies from source brings advantages, most
+prominently the flexibility to quickly and seamlessly switch to a
+different build configuration (production, debug, instrumented for
+performance analysis; cross-compiling for a different target
+architecture), there are also legitimate reasons to use pre-built
+dependencies. The most prominent one is if your project is packaged as
+part of a larger distribution. For that reason, just also has (in
+`etc/import.prebuilt`) target files for all its dependencies, assuming
+they are pre-installed. The reason why target files are used at all for
+this situation is twofold.
+
+ - On the one hand, having a target allows the remaining targets to not
+   care about where their dependencies come from, or whether it is a
+   build against pre-installed dependencies or not. Also, the top-level
+   binary does not have to know the linking requirements of its
+   transitive dependencies. In other words, information stays where it
+   belongs, and if one target acquires a new dependency, the
+   information is automatically propagated to all targets using it.
+ - Still, some information is needed to use a pre-installed library and,
+   as explained, a target describing the pre-installed library is the
+   right place to collect this information.
+   - The public header files of the library. By having this explicit,
+     we do not accumulate directories in the include search path and
+     hence also properly detect include conflicts.
+   - The information on how to link the library itself (i.e.,
+     basically its base name).
+   - Any dependencies on other libraries that the library might have.
+     This information is used to obtain the correct linking order and
+     complete transitive linking arguments while keeping the
+     description maintainable, as each target still only declares its
+     direct dependencies.
+
+The target description for a pre-built version of the format library
+that was used as an example in this section is shown next; with our
+staging mechanism, the logical repository it belongs to is rooted in the
+`fmt` subdirectory of the `include` directory of the ambient system.
+
+``` {.jsonc srcname="etc/import.prebuilt/TARGETS.fmt"}
+{ "fmt":
+  { "type": ["@", "rules", "CC", "library"]
+  , "name": ["fmt"]
+  , "stage": ["fmt"]
+  , "hdrs": [["TREE", null, "."]]
+  , "private-ldflags": ["-lfmt"]
+  }
+}
+```
+
+[^1]: Explicit `TREE` references are always a list of length 3, to
+    distinguish them from target references of length 2 (module and
+    target name). Furthermore, the second list element is always `null`
+    as we only want to allow tree references from the current module.
diff --git a/doc/tutorial/third-party-software.org b/doc/tutorial/third-party-software.org
deleted file mode 100644
index d1712cc8..00000000
--- a/doc/tutorial/third-party-software.org
+++ /dev/null
@@ -1,475 +0,0 @@
-* Building Third-party Software
-
-Third-party projects usually ship with their own build description, which often
-happens to be not compatible with justbuild. Nevertheless, it is highly
-desireable to include external projects via their source code base, instead of
-relying on the integration of out-of-band binary distributions. justbuild offers
-a flexible approach to provide the required build description via an overlay
-layer without the need to touch the original code base.
-
-For the remainder of this section, we expect to have the project files available
-resulting from successfully completing the tutorial section on /Building C++
-Hello World/. We will demonstrate how to use the open-source project
-[[https://github.com/fmtlib/fmt][fmtlib]] as an example for integrating
-third-party software to a justbuild project.
-
-** Creating the target overlay layer for fmtlib
-
-Before we construct the overlay layer for fmtlib, we need to determine its file
-structure ([[https://github.com/fmtlib/fmt/tree/8.1.1][tag 8.1.1]]). The
-relevant header and source files are structured as follows:
-
-#+BEGIN_SRC
- fmt
- |
- +--include
- |  +--fmt
- |     +--*.h
- |
- +--src
-    +--format.cc
-    +--os.cc
-#+END_SRC
-
-The public headers can be found in ~include/fmt~, while the library's source
-files are located in ~src~. For the overlay layer, the ~TARGETS~ files should be
-placed in a tree structure that resembles the original code base's structure.
-It is also good practice to provide a top-level ~TARGETS~ file, leading to the -following structure for the overlay: - -#+BEGIN_SRC - fmt-layer - | - +--TARGETS - +--include - | +--fmt - | +--TARGETS - | - +--src - +--TARGETS -#+END_SRC - -Let's create the overlay structure: - -#+BEGIN_SRC sh -$ mkdir -p fmt-layer/include/fmt -$ mkdir -p fmt-layer/src -#+END_SRC - -The directory ~include/fmt~ contains only header files. As we want all files in -this directory to be included in the ~"hdrs"~ target, we can safely -use the explicit ~TREE~ reference[fn:1], which collects, in a single -artifact (describing a directory) /all/ directory contents -from ~"."~ of the workspace root. Note that the ~TARGETS~ file is only part of -the overlay, and -therefore will not be part of this tree. Furthermore, this tree should be staged -to ~"fmt"~, so that any consumer can include those headers via ~<fmt/...>~. The -resulting header directory target ~"hdrs"~ in ~include/fmt/TARGETS~ should be -described as: - -[fn:1] Explicit ~TREE~ references are always a list of length 3, to distinguish -them from target references of length 2 (module and target name). Furthermore, -the second list element is always ~null~ as we only want to allow tree -references from the current module. - - -#+SRCNAME: fmt-layer/include/fmt/TARGETS -#+BEGIN_SRC js -{ "hdrs": - { "type": ["@", "rules", "data", "staged"] - , "srcs": [["TREE", null, "."]] - , "stage": ["fmt"] - } -} -#+END_SRC - -The actual library target is defined in the directory ~src~. For the public -headers, it refers to the previously created ~"hdrs"~ target via its -fully-qualified target name (~["include/fmt", "hdrs"]~). Source files are the -two local files ~format.cc~, and ~os.cc~. The final target description in -~src/TARGETS~ will look like this: - -#+SRCNAME: fmt-layer/src/TARGETS -#+BEGIN_SRC js -{ "fmt": - { "type": ["@", "rules", "CC", "library"] - , "name": ["fmt"] - , "hdrs": [["include/fmt", "hdrs"]] - , "srcs": ["format.cc", "os.cc"] - } -} -#+END_SRC - -Finally, the top-level ~TARGETS~ file can be created. While it is technically -not strictly required, it is considered good practice to /export/ every target -that may be used by another project. Exported targets are subject to high-level -target caching, which allows to skip the analysis and traversal of entire -subgraphs in the action graph. Therefore, we create an export target that -exports the target ~["src", "fmt"]~, with only the variables in the field -~"flexible_config"~ being propagated. The top-level ~TARGETS~ file contains the -following content: - -#+SRCNAME: fmt-layer/TARGETS -#+BEGIN_SRC js -{ "fmt": - { "type": "export" - , "target": ["src", "fmt"] - , "flexible_config": ["CXX", "CXXFLAGS", "ADD_CXXFLAGS", "AR", "ENV"] - } -} -#+END_SRC - -After adding the library to the multi-repository configuration (next -step), the list of configuration variables a target, like ~["src", -"fmt"]~, actually depends on can be obtained using the ~--dump-vars~ -option of the ~analyse~ subcommand. In this way, an informed decision -can be taken when deciding which variables of the export target to -make tunable for the consumer. - -** Adding fmtlib to the Multi-Repository Configuration - -Based on the /hello world/ tutorial, we can extend the existing ~repos.json~ by -the layer definition ~"fmt-targets-layer"~ and the repository ~"fmtlib"~, which -is based on the Git repository with its target root being overlayed. 
-Furthermore, we want to use ~"fmtlib"~ in the repository ~"tutorial"~, and -therefore need to introduce an additional binding ~"format"~ for it: - -#+SRCNAME: repos.json -#+BEGIN_SRC js -{ "main": "tutorial" -, "repositories": - { "rules-cc": - { "repository": - { "type": "git" - , "branch": "master" - , "commit": "123d8b03bf2440052626151c14c54abce2726e6f" - , "repository": "https://github.com/just-buildsystem/rules-cc.git" - , "subdir": "rules" - } - , "target_root": "tutorial-defaults" - , "rule_root": "rules-cc" - } - , "tutorial": - { "repository": {"type": "file", "path": "."} - , "bindings": {"rules": "rules-cc", "format": "fmtlib"} - } - , "tutorial-defaults": - { "repository": {"type": "file", "path": "./tutorial-defaults"} - } - , "fmt-targets-layer": - { "repository": {"type": "file", "path": "./fmt-layer"} - } - , "fmtlib": - { "repository": - { "type": "git" - , "branch": "8.1.1" - , "commit": "b6f4ceaed0a0a24ccf575fab6c56dd50ccf6f1a9" - , "repository": "https://github.com/fmtlib/fmt.git" - } - , "target_root": "fmt-targets-layer" - , "bindings": {"rules": "rules-cc"} - } - } -} -#+END_SRC - -This ~"format"~ binding can you be used to add a new private dependency in -~greet/TARGETS~: - -#+SRCNAME: greet/TARGETS -#+BEGIN_SRC js -{ "greet": - { "type": ["@", "rules", "CC", "library"] - , "name": ["greet"] - , "hdrs": ["greet.hpp"] - , "srcs": ["greet.cpp"] - , "stage": ["greet"] - , "private-deps": [["@", "format", "", "fmt"]] - } -} -#+END_SRC - -Consequently, the ~fmtlib~ library can now be used by ~greet/greet.cpp~: - -#+SRCNAME: greet/greet.cpp -#+BEGIN_SRC cpp -#include "greet.hpp" -#include <fmt/format.h> - -void greet(std::string const& s) { - fmt::print("Hello {}!\n", s); -} -#+END_SRC - -Due to changes made to ~repos.json~, building this tutorial requires to rerun -~just-mr~, which will fetch the necessary sources for the external repositories: - -#+BEGIN_SRC sh -$ just-mr build helloworld -INFO: Requested target is [["@","tutorial","","helloworld"],{}] -INFO: Analysed target [["@","tutorial","","helloworld"],{}] -INFO: Export targets found: 0 cached, 0 uncached, 1 not eligible for caching -INFO: Discovered 7 actions, 3 trees, 0 blobs -INFO: Building [["@","tutorial","","helloworld"],{}]. -INFO: Processed 7 actions, 1 cache hits. -INFO: Artifacts built, logical paths are: - helloworld [0ec4e36cfb5f2c3efa0fff789349a46694a6d303:132736:x] -$ -#+END_SRC - -Note to build the ~fmt~ target alone, its containing repository ~fmtlib~ must be -specified via the ~--main~ option: -#+BEGIN_SRC sh -$ just-mr --main fmtlib build fmt -INFO: Requested target is [["@","fmtlib","","fmt"],{}] -INFO: Analysed target [["@","fmtlib","","fmt"],{}] -INFO: Export targets found: 0 cached, 0 uncached, 1 not eligible for caching -INFO: Discovered 3 actions, 1 trees, 0 blobs -INFO: Building [["@","fmtlib","","fmt"],{}]. -INFO: Processed 3 actions, 3 cache hits. -INFO: Artifacts built, logical paths are: - libfmt.a [513b2ac17c557675fc841f3ebf279003ff5a73ae:240914:f] - (1 runfiles omitted.) -$ -#+END_SRC - -** Employing high-level target caching - -The make use of high-level target caching for exported targets, we need to -ensure that all inputs to an export target are transitively content-fixed. This -is automatically the case for ~"type":"git"~ repositories. However, the ~libfmt~ -repository also depends on ~"rules-cc"~, ~"tutorial-defaults"~, and -~"fmt-target-layer"~. As the latter two are ~"type":"file"~ repositories, they -must be put under Git versioning first: - -#+BEGIN_SRC sh -$ git init . 
-$ git add tutorial-defaults fmt-layer -$ git commit -m"fix compile flags and fmt targets layer" -#+END_SRC - -Note that ~rules-cc~ already is under Git versioning. - -Now, to instruct ~just-mr~ to use the content-fixed, committed source trees of -those ~"type":"file"~ repositories the pragma ~"to_git"~ must be set for them in -~repos.json~: - -#+SRCNAME: repos.json -#+BEGIN_SRC js -{ "main": "tutorial" -, "repositories": - { "rules-cc": - { "repository": - { "type": "git" - , "branch": "master" - , "commit": "123d8b03bf2440052626151c14c54abce2726e6f" - , "repository": "https://github.com/just-buildsystem/rules-cc.git" - , "subdir": "rules" - } - , "target_root": "tutorial-defaults" - , "rule_root": "rules-cc" - } - , "tutorial": - { "repository": {"type": "file", "path": "."} - , "bindings": {"rules": "rules-cc", "format": "fmtlib"} - } - , "tutorial-defaults": - { "repository": - { "type": "file" - , "path": "./tutorial-defaults" - , "pragma": {"to_git": true} - } - } - , "fmt-targets-layer": - { "repository": - { "type": "file" - , "path": "./fmt-layer" - , "pragma": {"to_git": true} - } - } - , "fmtlib": - { "repository": - { "type": "git" - , "branch": "master" - , "commit": "b6f4ceaed0a0a24ccf575fab6c56dd50ccf6f1a9" - , "repository": "https://github.com/fmtlib/fmt.git" - } - , "target_root": "fmt-targets-layer" - , "bindings": {"rules": "rules-cc"} - } - } -} -#+END_SRC - -Due to changes in the repository configuration, we need to rebuild and the -benefits of the target cache should be visible on the second build: - -#+BEGIN_SRC sh -$ just-mr build helloworld -INFO: Requested target is [["@","tutorial","","helloworld"],{}] -INFO: Analysed target [["@","tutorial","","helloworld"],{}] -INFO: Export targets found: 0 cached, 1 uncached, 0 not eligible for caching -INFO: Discovered 7 actions, 3 trees, 0 blobs -INFO: Building [["@","tutorial","","helloworld"],{}]. -INFO: Processed 7 actions, 7 cache hits. -INFO: Artifacts built, logical paths are: - helloworld [0ec4e36cfb5f2c3efa0fff789349a46694a6d303:132736:x] -$ -$ just-mr build helloworld -INFO: Requested target is [["@","tutorial","","helloworld"],{}] -INFO: Analysed target [["@","tutorial","","helloworld"],{}] -INFO: Export targets found: 1 cached, 0 uncached, 0 not eligible for caching -INFO: Discovered 4 actions, 2 trees, 0 blobs -INFO: Building [["@","tutorial","","helloworld"],{}]. -INFO: Processed 4 actions, 4 cache hits. -INFO: Artifacts built, logical paths are: - helloworld [0ec4e36cfb5f2c3efa0fff789349a46694a6d303:132736:x] -$ -#+END_SRC - -Note that in the second run the export target ~"fmt"~ was taken from cache and -its 3 actions were eliminated, as their result has been recorded to the -high-level target cache during the first run. - -** Combining overlay layers for multiple projects - -Projects typically depend on multiple external repositories. Creating an overlay -layer for each external repository might unnecessarily clutter up the repository -configuration and the file structure of your repository. One solution to -mitigate this issue is to combine the ~TARGETS~ files of multiple external -repositories in a single overlay layer. To avoid conflicts, the ~TARGETS~ files -can be assigned different file names per repository. 
As an example, imagine a -common overlay layer with the files ~TARGETS.fmt~ and ~TARGETS.gsl~ for the -repositories ~"fmtlib"~ and ~"gsl-lite"~, respectively: - -#+BEGIN_SRC - common-layer - | - +--TARGETS.fmt - +--TARGETS.gsl - +--include - | +--fmt - | | +--TARGETS.fmt - | +--gsl - | +--TARGETS.gsl - | - +--src - +--TARGETS.fmt -#+END_SRC - -Such a common overlay layer can be used as the target root for both repositories -with only one difference: the ~"target_file_name"~ field. By specifying this -field, the dispatch where to find the respective target description for each -repository is implemented. For the given example, the following ~repos.json~ -defines the overlay ~"common-targets-layer"~, which is used by ~"fmtlib"~ and -~"gsl-lite"~: - -#+SRCNAME: repos.json -#+BEGIN_SRC js -{ "main": "tutorial" -, "repositories": - { "rules-cc": - { "repository": - { "type": "git" - , "branch": "master" - , "commit": "123d8b03bf2440052626151c14c54abce2726e6f" - , "repository": "https://github.com/just-buildsystem/rules-cc.git" - , "subdir": "rules" - } - , "target_root": "tutorial-defaults" - , "rule_root": "rules-cc" - } - , "tutorial": - { "repository": {"type": "file", "path": "."} - , "bindings": {"rules": "rules-cc", "format": "fmtlib"} - } - , "tutorial-defaults": - { "repository": - { "type": "file" - , "path": "./tutorial-defaults" - , "pragma": {"to_git": true} - } - } - , "common-targets-layer": - { "repository": - { "type": "file" - , "path": "./common-layer" - , "pragma": {"to_git": true} - } - } - , "fmtlib": - { "repository": - { "type": "git" - , "branch": "8.1.1" - , "commit": "b6f4ceaed0a0a24ccf575fab6c56dd50ccf6f1a9" - , "repository": "https://github.com/fmtlib/fmt.git" - } - , "target_root": "common-targets-layer" - , "target_file_name": "TARGETS.fmt" - , "bindings": {"rules": "rules-cc"} - } - , "gsl-lite": - { "repository": - { "type": "git" - , "branch": "v0.40.0" - , "commit": "d6c8af99a1d95b3db36f26b4f22dc3bad89952de" - , "repository": "https://github.com/gsl-lite/gsl-lite.git" - } - , "target_root": "common-targets-layer" - , "target_file_name": "TARGETS.gsl" - , "bindings": {"rules": "rules-cc"} - } - } -} -#+END_SRC - -** Using pre-built dependencies - -While building external dependencies from source brings advantages, -most prominently the flexibility to quickly and seamlessly switch -to a different build configuration (production, debug, instrumented -for performance analysis; cross-compiling for a different target -architecture), there are also legitimate reasons to use pre-built -dependencies. The most prominent one is if your project is packaged -as part of a larger distribution. For that reason, just also has (in -~etc/import.prebuilt~) target files for all its dependencies assuming -they are pre-installed. The reason why target files are used at -all for this situation is twofold. -- On the one hand, having a target allows the remaining targets - to not care about where their dependencies come from, or if it - is a build against pre-installed dependencies or not. Also, the - top-level binary does not have to know the linking requirements - of its transitive dependencies. In other words, information stays - where it belongs to and if one target acquires a new dependency, - the information is automatically propagated to all targets using it. -- Still some information is needed to use a pre-installed library - and, as explained, a target describing the pre-installed library - is the right place to collect this information. 
- - The public header files of the library. By having this explicit, - we do not accumulate directories in the include search path - and hence also properly detect include conflicts. - - The information on how to link the library itself (i.e., - basically its base name). - - Any dependencies on other libraries that the library might have. - This information is used to obtain the correct linking order - and complete transitive linking arguments while keeping the - description maintainable, as each target still only declares - its direct dependencies. - -The target description for a pre-built version of the format -library that was used as an example in this section is shown next; -with our staging mechanism the logical repository it belongs to is -rooted in the ~fmt~ subdirectory of the ~include~ directory of the -ambient system. - -#+SRCNAME: etc/import.prebuilt/TARGETS.fmt -#+BEGIN_SRC js -{ "fmt": - { "type": ["@", "rules", "CC", "library"] - , "name": ["fmt"] - , "stage": ["fmt"] - , "hdrs": [["TREE", null, "."]] - , "private-ldflags": ["-lfmt"] - } -} -#+END_SRC |