author     Oliver Reiche <oliver.reiche@huawei.com>  2023-06-01 13:36:32 +0200
committer  Oliver Reiche <oliver.reiche@huawei.com>  2023-06-12 16:29:05 +0200
commit     b66a7359fbbff35af630c88c56598bbc06b393e1 (patch)
tree       d866802c4b44c13cbd90f9919cc7fc472091be0c /doc/concepts
parent     144b2c619f28c91663936cd445251ca28af45f88 (diff)
download   justbuild-b66a7359fbbff35af630c88c56598bbc06b393e1.tar.gz
doc: Convert orgmode files to markdown
Diffstat (limited to 'doc/concepts')
-rw-r--r--  doc/concepts/anonymous-targets.md   345
-rw-r--r--  doc/concepts/anonymous-targets.org  336
-rw-r--r--  doc/concepts/built-in-rules.md      172
-rw-r--r--  doc/concepts/built-in-rules.org     167
-rw-r--r--  doc/concepts/cache-pragma.md        134
-rw-r--r--  doc/concepts/cache-pragma.org       130
-rw-r--r--  doc/concepts/configuration.md       115
-rw-r--r--  doc/concepts/configuration.org      107
-rw-r--r--  doc/concepts/doc-strings.md         152
-rw-r--r--  doc/concepts/doc-strings.org        145
-rw-r--r--  doc/concepts/expressions.md         368
-rw-r--r--  doc/concepts/expressions.org        344
-rw-r--r--  doc/concepts/garbage.md              86
-rw-r--r--  doc/concepts/garbage.org             82
-rw-r--r--  doc/concepts/multi-repo.md          170
-rw-r--r--  doc/concepts/multi-repo.org         167
-rw-r--r--  doc/concepts/overview.md            210
-rw-r--r--  doc/concepts/overview.org           206
-rw-r--r--  doc/concepts/rules.md               567
-rw-r--r--  doc/concepts/rules.org              551
-rw-r--r--  doc/concepts/target-cache.md        231
-rw-r--r--  doc/concepts/target-cache.org       219
22 files changed, 2550 insertions(+), 2454 deletions(-)
diff --git a/doc/concepts/anonymous-targets.md b/doc/concepts/anonymous-targets.md
new file mode 100644
index 00000000..6692d0ae
--- /dev/null
+++ b/doc/concepts/anonymous-targets.md
@@ -0,0 +1,345 @@
+Anonymous targets
+=================
+
+Motivation
+----------
+
+Using [Protocol buffers](https://github.com/protocolbuffers/protobuf)
+allows one to specify, in a language-independent way, a wire format for
+structured data. This is done by using description files from which APIs
+for various languages can be generated. As protocol buffers can contain
+other protocol buffers, the description files themselves have a
+dependency structure.
+
+From a software-engineering point of view, the challenge is to ensure
+that the author of the description files does not have to be aware of
+the languages for which APIs will be generated later. In fact, the main
+benefit of the language-independent description is that clients in
+various languages can be implemented using the same wire protocol (and
+thus capable of communicating with the same server).
+
+For a build system that means that we have to expect language bindings
+to be requested at places far away from the protocol definition, and
+potentially several times. Such a duplication can also occur implicitly
+if two buffers, for which language bindings are generated, both use a
+common buffer for which bindings are never requested explicitly. Still,
+we want to avoid duplicate work for common parts and we have to avoid
+conflicts with duplicate symbols and staging conflicts for the libraries
+for the common part.
+
+Our approach is that a "proto" target only provides the description
+files together with their dependency structure. From those, a consuming
+target generates "anonymous targets" as additional dependencies; as
+those targets will have an appropriate notion of equality, no duplicate
+work is done and hence, as a side effect, staging or symbol conflicts
+are avoided as well.
+
+Preliminary remark: action identifiers
+--------------------------------------
+
+Actions are identified by the Merkle-tree hash of their contents. As all
+components (input tree, list of output strings, command vector,
+environment, and cache pragma) are given by expressions, that hash can
+be computed quickly. This identifier also defines the notion of equality
+for actions, and hence action artifacts. Recall that equality of
+artifacts is also (implicitly) used in our notion of disjoint map union
+(where the set of keys does not have to be disjoint, as long as the
+values for all duplicate keys are equal).
+
+When constructing the action graph for traversal, we can drop duplicates
+(i.e., actions with the same identifier, and hence the same
+description). For the serialization of the graph as part of the analyse
+command, we can afford the preparatory step to compute a map from action
+id to list of origins.
+
+Equality
+--------
+
+### Notions of equality
+
+In the context of builds, there are different concepts of equality to
+consider. We recall the definitions, as well as their use in our build
+tool.
+
+#### Locational equality ("Defined at the same place")
+
+Names (for targets and rules) are given by repository name, module
+name, and target name (inside the module); additionally, for target
+names, there's a bit specifying that we explicitly refer to a file.
+Names are equal if and only if the respective strings (and the file
+bit) are equal.
+
+For targets, we use locational equality, i.e., we consider targets
+equal precisely if their names are equal; targets defined at
+different places are considered different, even if they're defined
+in the same way. The reason we use this notion of equality is that we
+have to refer to targets (and also check if we already have a
+pending task to analyse them) before we have fully explored them
+with all the targets referred to in their definition.
+
+#### Intensional equality ("Defined in the same way")
+
+In our expression language we handle definitions; in particular, we
+treat artifacts by their definition: a particular source file, the
+output of a particular action, etc. Hence we use intensional
+equality in our expression language; two objects are equal precisely
+if they are defined in the same way. This notion of equality is easy
+to determine without the need of reading a source file or running an
+action. We implement quick tests by keeping a Merkle-tree hash of
+all expression values.
+
+#### Extensional equality ("Defining the same object")
+
+For built artifacts, we use extensional equality, i.e., we consider
+two files equal if they are bit-by-bit identical.
+Implementation-wise, we compare an appropriate cryptographic hash.
+Before running an action, we build its inputs. In particular (as
+inputs are considered extensionally) an action might cause a cache
+hit with an intensionally different one.
+
+#### Observable equality ("The defined objects behave in the same way")
+
+Finally, there is the notion of observable equality, i.e., the
+property that two binaries behave the same way in all situations.
+As this notion is undecidable, it is never used directly by any
+build tool. However, it is often the motivation for a build in the
+first place: we want a binary that behaves in a particular way.
+
+### Relation between these notions
+
+The notions of equality were introduced in order from most fine-grained
+to most coarse. Targets defined at the same place are obviously defined
+in the same way. Intensionally equal artifacts create equal action
+graphs; here we can confidently say "equal" and not only isomorphic:
+due to our preliminary clean up, even the node names are equal. Making
+sure that equal actions produce bit-by-bit equal outputs is the realm of
+[reproducible builds](https://reproducible-builds.org/). The tool can
+support this by appropriate sandboxing, etc., but the rules still have to
+define actions that don't pick up non-input information like the
+current time, user id, readdir order, etc. Files that are bit-by-bit
+identical will behave in the same way.
+
+### Example
+
+Consider the following target file.
+
+```jsonc
+{ "foo":
+ { "type": "generic"
+ , "outs": ["out.txt"]
+ , "cmds": ["echo Hello World > out.txt"]
+ }
+, "bar":
+ { "type": "generic"
+ , "outs": ["out.txt"]
+ , "cmds": ["echo Hello World > out.txt"]
+ }
+, "baz":
+ { "type": "generic"
+ , "outs": ["out.txt"]
+ , "cmds": ["echo -n Hello > out.txt && echo ' World' >> out.txt"]
+ }
+, "foo upper":
+ { "type": "generic"
+ , "deps": ["foo"]
+ , "outs": ["upper.txt"]
+ , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
+ }
+, "bar upper":
+ { "type": "generic"
+ , "deps": ["bar"]
+ , "outs": ["upper.txt"]
+ , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
+ }
+, "baz upper":
+ { "type": "generic"
+ , "deps": ["baz"]
+ , "outs": ["upper.txt"]
+ , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
+ }
+, "ALL":
+ { "type": "install"
+ , "files":
+ {"foo.txt": "foo upper", "bar.txt": "bar upper", "baz.txt": "baz upper"}
+ }
+}
+```
+
+Assume we build the target `"ALL"`. Then we will analyse 7 targets, all
+the locationally different ones (`"foo"`, `"bar"`, `"baz"`, `"foo upper"`,
+`"bar upper"`, `"baz upper"`, and `"ALL"` itself). For the targets `"foo"`
+and `"bar"`, we immediately see that the definition is equal; their
+intensional equality also renders `"foo upper"` and `"bar upper"`
+intensionally equal. Our action graph will contain 4 actions: one with
+origins `["foo", "bar"]`, one with origins `["baz"]`, one with origins
+`["foo upper", "bar upper"]`, and one with origins `["baz
+upper"]`. The `"install"` target will, of course, not create any
+actions. Building sequentially (`-J 1`), we will get one cache hit. Even
+though the artifacts of `"foo"` (and `"bar"`) and of `"baz"` are defined
+differently, they are extensionally equal; both definitions yield a file
+with contents `"Hello World\n"`.
+
+Anonymous targets
+-----------------
+
+Besides named targets we also have additional targets (and hence also
+configured targets) that are not associated with a location they are
+defined at. Due to the absence of definition location, their notion of
+equality will take care of the necessary deduplication (implicitly, by
+the way our dependency exploration works). We will call them "anonymous
+targets", even though, technically, they're not fully anonymous as the
+rules that are part of their structure will be given by name, i.e.,
+defining rule location.
+
+### Value type: target graph node
+
+In order to allow targets to adequately describe a dependency structure,
+we have a value type in our expression language, that of a (target)
+graph node. As with all value types, equality is intensional, i.e.,
+nodes defined in the same way are equal even if defined at different
+places. This can be achieved by our usual approach for expressions of
+having cached Merkle-tree hashes and comparing them when an equality
+test is required. This efficient test for equality also allows using
+graph nodes as part of a map key, e.g., for our asynchronous map
+consumers.
+
+As a graph node can only be defined with all data given, the defined
+dependency structure is cycle-free by construction. However, the tree
+unfolding will usually be exponentially larger. For internal handling,
+this is not a problem: our shared-pointer implementation can efficiently
+represent a directed acyclic graph and since we cache hashes in
+expressions, we can compute the overall hash without folding the
+structure to a tree. When presenting nodes to the user, we only show the
+map of identifier to definition, to avoid that exponential unfolding.
+
+We have two kinds of nodes.
+
+#### Value nodes
+
+These represent a target that, in any configuration, returns a fixed
+value. Source files would typically be represented this way. The
+constructor function `"VALUE_NODE"` takes a single argument `"$1"`
+that has to be a result value.
+
+#### Abstract nodes
+
+These represent internal nodes in the DAG. Their constructor
+`"ABSTRACT_NODE"` takes the following arguments (all evaluated).
+
+ - `"node_type"`. An arbitrary string, not interpreted in any way,
+ to indicate the role that the node has in the dependency
+ structure. When we create an anonymous target from a node, this
+ will serve as the key into the rule mapping to be applied.
+ - `"string_fields"`. This has to be a map of strings.
+ - `"target_fields"`. These have to be a map of lists of graph
+ nodes.
+
+Moreover, we require that the keys for maps provided as
+`"string_fields"` and `"target_fields"` be disjoint.
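+
+As an illustration, a node structure describing a single proto file
+might be constructed as follows (a sketch only: the node type, field
+names, and values are hypothetical, and each argument is shown as the
+value its expression evaluates to):
+
+```jsonc
+{ "type": "ABSTRACT_NODE"
+, "node_type": "proto_library"
+, "string_fields": {"name": "person"}
+, "target_fields":
+  {"deps": [{"type": "VALUE_NODE", "$1": {...}}]}
+}
+```
+
+Here, the elided `"$1"` argument would be a result value, e.g., one
+whose artifacts consist of a single proto source file.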
+
+### Graph nodes in `export` targets
+
+Graph nodes are completely free of names and hence are eligible for
+exporting. As with other values, in the cache the intensional definition
+of artifacts implicit in them will be replaced by the corresponding,
+extensionally equal, known value.
+
+However, some care has to be taken in the serialisation that is part of
+the caching, as we do not want to unfold the DAG to a tree. Therefore,
+we take as JSON serialisation a simple dict with `"type"` set to
+`"NODE"`, and `"value"` set to the Merkle-tree hash. That serialisation
+respects intensional equality. To allow deserialisation, we add an
+additional map to the serialisation from node hash to its definition.
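+
+For illustration, the serialisation of a single node might look as
+follows (the hash is shortened and purely illustrative):
+
+```jsonc
+{"type": "NODE", "value": "e4ab8f2a9c..."}
+```
+
+The auxiliary map would then contain an entry taking `"e4ab8f2a9c..."`
+to the full definition of that node, so that deserialisation can
+proceed without unfolding the DAG.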
+
+### Depending on anonymous targets
+
+#### Parts of an anonymous target
+
+An anonymous target is given by a pair of a node and a map taking
+the node-type strings of abstract nodes to rule names. So, in the
+implementation these are just two expression pointers (with their
+defined notion of equality, i.e., equality of the respective
+Merkle-tree hashes). Such a pair of pointers also forms an
+additional variant of a name value, referring to such an anonymous
+target.
+
+It should be noted that such an anonymous target contains all the
+information needed to evaluate it in the same way as a regular
+(named) target defined by a user-defined rule. It is an analysis
+error to analyse an anonymous target for which the rules map has no
+entry for the string given as `"node_type"` of the corresponding
+node.
+
+#### Anonymous targets as additional dependencies
+
+We keep the property that a user can only request named targets. So
+anonymous targets have to be requested by other targets. We also
+keep the property that other targets are only requested at certain
+fixed steps in the evaluation of a target. To still achieve a
+meaningful use of anonymous targets, our rule language handles
+them in the following way.
+
+##### Rules parameter `"anonymous"`
+
+In the rule definition a parameter `"anonymous"` (with empty map
+as default) is allowed. It is used to define an additional
+dependency on anonymous targets. The value has to be a map whose
+keys are the additional, implicitly defined field names. It is hence
+a requirement that the set of keys be disjoint from all other
+field names (the values of `"config_fields"`, `"string_fields"`,
+and `"target_fields"`, as well as the keys of the `"implicit"`
+parameter). Another consequence is that the `"config_transitions"`
+map may now also have meaningful entries for the keys of the
+`"anonymous"` map. Each value in the map has to be itself a map,
+with entries `"target"`, `"provider"`, and `"rule_map"`.
+
+For `"target"`, a single string has to be specified, and the
+value has to be a member of the `"target_fields"` list. For
+`"provider"`, a single string has to be specified as well. The idea
+is that the nodes are collected from that provider of the
+targets in the specified target field. For `"rule_map"` a map
+has to be specified from strings to rule names; the latter are
+evaluated in the context of the rule definition.
+
+###### Example
+
+For generating language bindings for protocol buffers, a
+rule might look as follows.
+
+```jsonc
+{ "cc_proto_bindings":
+ { "target_fields": ["proto_deps"]
+ , "anonymous":
+ { "protos":
+ { "target": "proto_deps"
+ , "provider": "proto"
+ , "rule_map": {"proto_library": "cc_proto_library"}
+ }
+ }
+ , "expression": {...}
+ }
+}
+```
+
+##### Evaluation mechanism
+
+The evaluation of a target defined by a user-defined rule is
+handled as follows. After the target fields are evaluated as
+usual, an additional step is carried out.
+
+For each anonymous-target field, i.e., for each key in the
+`"anonymous"` map, a list of anonymous targets is generated from
+the corresponding value: take all targets from the specified
+`"target"` field in all their specified configuration
+transitions (they have already been evaluated) and take the
+values provided for the specified `"provider"` key (using the
+empty list as default). That value has to be a list of nodes.
+All the node lists obtained that way are concatenated. The
+configuration transition for the respective field name is
+evaluated. Those targets are then evaluated for all the
+transitioned configurations requested.
+
+In the final evaluation of the defining expression, the
+anonymous-target fields are available in the same way as any
+other target field. Also, they contribute to the effective
+configuration in the same way as regular target fields.
diff --git a/doc/concepts/anonymous-targets.org b/doc/concepts/anonymous-targets.org
deleted file mode 100644
index 98d194c7..00000000
--- a/doc/concepts/anonymous-targets.org
+++ /dev/null
@@ -1,336 +0,0 @@
-* Anonymous targets
-** Motivation
-
-Using [[https://github.com/protocolbuffers/protobuf][Protocol
-buffers]] allows to specify, in a language-independent way, a wire
-format for structured data. This is done by using description files
-from which APIs for various languages can be generated. As protocol
-buffers can contain other protocol buffers, the description files
-themselves have a dependency structure.
-
-From a software-engineering point of view, the challenge is to
-ensure that the author of the description files does not have to
-be aware of the languages for which APIs will be generated later.
-In fact, the main benefit of the language-independent description
-is that clients in various languages can be implemented using the
-same wire protocol (and thus capable of communicating with the
-same server).
-
-For a build system that means that we have to expect that language
-bindings at places far away from the protocol definition, and
-potentially several times. Such a duplication can also occur
-implicitly if two buffers, for which language bindings are generated
-both use a common buffer for which bindings are never requested
-explicitly. Still, we want to avoid duplicate work for common parts
-and we have to avoid conflicts with duplicate symbols and staging
-conflicts for the libraries for the common part.
-
-Our approach is that a "proto" target only provides the description
-files together with their dependency structure. From those, a
-consuming target generates "anonymous targets" as additional
-dependencies; as those targets will have an appropriate notion of
-equality, no duplicate work is done and hence, as a side effect,
-staging or symbol conflicts are avoided as well.
-
-** Preliminary remark: action identifiers
-
-Actions are defined as Merkle-tree hash of the contents. As all
-components (input tree, list of output strings, command vector,
-environment, and cache pragma) are given by expressions, that can
-quickly be computed. This identifier also defines the notion of
-equality for actions, and hence action artifacts. Recall that equality
-of artifacts is also (implicitly) used in our notion of disjoint
-map union (where the set of keys does not have to be disjoint, as
-long as the values for all duplicate keys are equal).
-
-When constructing the action graph for traversal, we can drop
-duplicates (i.e., actions with the same identifier, and hence the
-same description). For the serialization of the graph as part of
-the analyse command, we can afford the preparatory step to compute
-a map from action id to list of origins.
-
-** Equality
-
-*** Notions of equality
-
-In the context of builds, there are different concepts of equality
-to consider. We recall the definitions, as well as their use in
-our build tool.
-
-**** Locational equality ("Defined at the same place")
-
-Names (for targets and rules) are given by repository name, module
-name, and target name (inside the module); additionally, for target
-names, there's a bit specifying that we explicitly refer to a file.
-Names are equal if and only if the respective strings (and the file
-bit) are equal.
-
-For targets, we use locational equality, i.e., we consider targets
-equal precisely if their names are equal; targets defined at different
-places are considered different, even if they're defined in the
-same way. The reason we use notion of equality is that we have to
-refer to targets (and also check if we already have a pending task
-to analyse them) before we have fully explored them with all the
-targets referred to in their definition.
-
-**** Intensional equality ("Defined in the same way")
-
-In our expression language we handle definitions; in particular,
-we treat artifacts by their definition: a particular source file,
-the output of a particular action, etc. Hence we use intensional
-equality in our expression language; two objects are equal precisely
-if they are defined in the same way. This notion of equality is easy
-to determine without the need of reading a source file or running
-an action. We implement quick tests by keeping a Merkle-tree hash
-of all expression values.
-
-**** Extensional equality ("Defining the same object")
-
-For built artifacts, we use extensional equality, i.e., we consider
-two files equal, if they are bit-by-bit identical. Implementation-wise,
-we compare an appropriate cryptographic hash. Before running an
-action, we built its inputs. In particular (as inputs are considered
-extensionally) an action might cause a cache hit with an intensionally
-different one.
-
-**** Observable equality ("The defined objects behave in the same way")
-
-Finally, there is the notion of observable equality, i.e., the
-property that two binaries behaving the same way in all situations.
-As this notion is undecidable, it is never used directly by any
-build tool. However, it is often the motivation for a build in the
-first place: we want a binary that behaves in a particular way.
-
-*** Relation between these notions
-
-The notions of equality were introduced in order from most fine grained
-to most coarse. Targets defined at the same place are obviously defined
-in the same way. Intensionally equal artifacts create equal action
-graphs; here we can confidently say "equal" and not only isomorphic:
-due to our preliminary clean up, even the node names are equal.
-Making sure that equal actions produce bit-by-bit equal outputs
-is the realm of [[https://reproducible-builds.org/][reproducibe
-builds]]. The tool can support this by appropriate sandboxing,
-etc, but the rules still have to define actions that don't pick
-up non-input information like the current time, user id, readdir
-order, etc. Files that are bit-by-bit identical will behave in
-the same way.
-
-*** Example
-
-Consider the following target file.
-
-#+BEGIN_SRC
-{ "foo":
- { "type": "generic"
- , "outs": ["out.txt"]
- , "cmds": ["echo Hello World > out.txt"]
- }
-, "bar":
- { "type": "generic"
- , "outs": ["out.txt"]
- , "cmds": ["echo Hello World > out.txt"]
- }
-, "baz":
- { "type": "generic"
- , "outs": ["out.txt"]
- , "cmds": ["echo -n Hello > out.txt && echo ' World' >> out.txt"]
- }
-, "foo upper":
- { "type": "generic"
- , "deps": ["foo"]
- , "outs": ["upper.txt"]
- , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
- }
-, "bar upper":
- { "type": "generic"
- , "deps": ["bar"]
- , "outs": ["upper.txt"]
- , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
- }
-, "baz upper":
- { "type": "generic"
- , "deps": ["baz"]
- , "outs": ["upper.txt"]
- , "cmds": ["cat out.txt | tr a-z A-Z > upper.txt"]
- }
-, "ALL":
- { "type": "install"
- , "files":
- {"foo.txt": "foo upper", "bar.txt": "bar upper", "baz.txt": "baz upper"}
- }
-}
-#+END_SRC
-
-Assume we build the target ~"ALL"~. Then we will analyse 7 targets,
-all the locationally different ones (~"foo"~, ~"bar"~, ~"baz"~,
-~"foo upper"~, ~"bar upper"~, ~"baz upper"~). For the targets ~"foo"~
-and ~"bar"~, we immediately see that the definition is equal; their
-intensional equality also renders ~"foo upper"~ and ~"bar upper"~
-intensionally equal. Our action graph will contain 4 actions: one
-with origins ~["foo", "bar"]~, one with origins ~["baz"]~, one with
-origins ~["foo upper", "bar upper"]~, and one with origins ~["baz
-upper"]~. The ~"install"~ target will, of course, not create any
-actions. Building sequentially (~-J 1~), we will get one cache hit.
-Even though the artifacts of ~"foo"~ and ~"bar"~ and of ~"baz~"
-are defined differently, they are extensionally equal; both define
-a file with contents ~"Hello World\n"~.
-
-** Anonymous targets
-
-Besides named targets we also have additional targets (and hence also
-configured targets) that are not associated with a location they are
-defined at. Due to the absence of definition location, their notion
-of equality will take care of the necessary deduplication (implicitly,
-by the way our dependency exploration works). We will call them
-"anonymous targets", even though, technically, they're not fully
-anonymous as the rules that are part of their structure will be
-given by name, i.e., defining rule location.
-
-*** Value type: target graph node
-
-In order to allow targets to adequately describe a dependency
-structure, we have a value type in our expression language, that
-of a (target) graph node. As with all value types, equality is
-intensional, i.e., nodes defined in the same way are equal even
-if defined at different places. This can be achieved by our usual
-approach for expressions of having cached Merkle-tree hashes and
-comparing them when an equality test is required. This efficient
-test for equality also allows using graph nodes as part of a map
-key, e.g., for our asynchronous map consumers.
-
-As a graph node can only be defined with all data given, the defined
-dependency structure is cycle-free by construction. However, the
-tree unfolding will usually be exponentially larger. For internal
-handling, this is not a problem: our shared-pointer implementation
-can efficiently represent a directed acyclic graph and since we
-cache hashes in expressions, we can compute the overall hash without
-folding the structure to a tree. When presenting nodes to the user,
-we only show the map of identifier to definition, to avoid that
-exponential unfolding.
-
-We have two kinds of nodes.
-
-**** Value nodes
-
-These represent a target that, in any configuration, returns a fixed
-value. Source files would typically be represented this way. The
-constructor function ~"VALUE_NODE"~ takes a single argument ~"$1"~
-that has to be a result value.
-
-**** Abstract nodes
-
-These represent internal nodes in the dag. Their constructor
-~"ABSTRACT_NODE"~ takes the following arguments (all evaluated).
-- ~"node_type"~. An arbitrary string, not interpreted in any way, to
- indicate the role that the node has in the dependency structure.
- When we create an anonymous target from a node, this will serve
- as the key into the rule mapping to be applied.
-- ~"string_fields"~. This has to be a map of strings.
-- ~"target_fields"~. These have to be a map of lists of graph nodes.
-Moreover, we require that the keys for maps provided as ~"string_fields"~
-and ~"target_fields"~ be disjoint.
-
-*** Graph nodes in ~export~ targets
-
-Graph nodes are completely free of names and hence are eligible
-for exporting. As with other values, in the cache the intensional
-definition of artifacts implicit in them will be replaced by the
-corresponding, extensionally equal, known value.
-
-However, some care has to be taken in the serialisation that is
-part of the caching, as we do not want to unfold the dag to
-a tree. Therefore, we take as JSON serialisation a simple dict
-with ~"type"~ set to ~"NODE"~, and ~"value"~ set to the Merkle-tree
-hash. That serialisation respects intensional equality. To allow
-deserialisation, we add an additional map to the serialisation from
-node hash to its definition.
-
-*** Dependings on anonymous targets
-
-**** Parts of an anonymous target
-
-An anonymous target is given by a pair of a node and a map mapping
-the abstract node-type specifying strings to rule names. So, in
-the implementation these are just two expression pointers (with
-their defined notion of equality, i.e., equality of the respective
-Merkle-tree hashes). Such a pair of pointers also forms an additional
-variant of a name value, referring to such an anonymous target.
-
-It should be noted that such an anonymous target contains all the
-information needed to evaluate it in the same way as a regular (named)
-target defined by a user-defined rule. It is an analysis error
-analysing an anonymous target where there is no entry in the rules
-map for the string given as ~"node_type"~ for the corresponding node.
-
-**** Anonymous targets as additional dependencies
-
-We keep the property that a user can only request named targets.
-So anonymous targets have to be requested by other targets. We
-also keep the property that other targets are only requested at
-certain fixed steps in the evaluation of a target. To still achieve
-a meaningful use of anonymous targets our rule language handles
-anonymous targets in the following way.
-
-***** Rules parameter ~"anonymous"~
-
-In the rule definition a parameter ~"anonymous"~ (with empty map as
-default) is allowed. It is used to define an additional dependency on
-anonymous targets. The value has to be a map with keys the additional
-implicitly defined field names. It is hence a requirement that the
-set of keys be disjoint from all other field names (the values of
-~"config_fields"~, ~"string_fields"~, and ~"target_fields"~, as well as
-the keys of the ~"implict"~ parameter). Another consequence is that
-~"config_transitions"~ map may now also have meaningful entries for
-the keys of the ~"anonymous"~ map. Each value in the map has to be
-itself a map, with entries ~"target"~, ~"provider"~, and ~"rule_map"~.
-
-For ~"target"~, a single string has to be specifed, and the value has
-to be a member of the ~"target_fields"~ list. For provider, a single
-string has to be specified as well. The idea is that the nodes are
-collected from that provider of the targets in the specified target
-field. For ~"rule_map"~ a map has to be specified from strings to
-rule names; the latter are evaluated in the context of the rule
-definition.
-
-****** Example
-
-For generating language bindings for protocol buffers, a rule might
-look as follows.
-
-#+BEGIN_SRC
-{ "cc_proto_bindings":
- { "target_fields": ["proto_deps"]
- , "anonymous":
- { "protos":
- { "target": "proto_deps"
- , "provider": "proto"
- , "rule_map": {"proto_library": "cc_proto_library"}
- }
- }
- , "expression": {...}
- }
-}
-#+END_SRC
-
-***** Evaluation mechanism
-
-The evaluation of a target defined by a user-defined rule is handled
-as follows. After the target fields are evaluated as usual, an
-additional step is carried out.
-
-For each anonymous-target field, i.e., for each key in the ~"anonymous"~
-map, a list of anonymous targets is generated from the corresponding
-value: take all targets from the specified ~"target"~ field in all
-their specified configuration transitions (they have already been
-evaluated) and take the values provided for the specified ~"provider"~
-key (using the empty list as default). That value has to be a list
-of nodes. All the node lists obtained that way are concatenated.
-The configuration transition for the respective field name is
-evaluated. Those targets are then evaluated for all the transitioned
-configurations requested.
-
-In the final evaluation of the defining expression, the anonymous-target
-fields are available in the same way as any other target field.
-Also, they contribute to the effective configuration in the same
-way as regular target fields.
diff --git a/doc/concepts/built-in-rules.md b/doc/concepts/built-in-rules.md
new file mode 100644
index 00000000..3672df36
--- /dev/null
+++ b/doc/concepts/built-in-rules.md
@@ -0,0 +1,172 @@
+Built-in rules
+==============
+
+Targets are defined in `TARGETS` files. Each target file is a single
+`JSON` object. If the target name is contained as a key in that object,
+the corresponding value defines the target; otherwise it is implicitly
+considered a source file. The target definition itself is a `JSON`
+object as well. The mandatory key `"type"` specifies the rule defining
+the target; the meaning of the remaining keys depends on the rule
+defining the target.
+
+There are a couple of rules built in, all named by a single string. The
+user can define additional rules (and, in fact, we expect the majority
+of targets to be defined by user-defined rules); referring to them in a
+qualified way (with module) will always refer to those even if new
+built-in rules are added later (as built-in rules will always be only
+named by a single string).
+
+The following rules are built in. Built-in rules can have a special
+syntax.
+
+`"export"`
+----------
+
+The `"export"` rule evaluates a given target in a specified
+configuration. More precisely, the field `"target"` has to name a single
+target (not a list of targets), the field `"flexible_config"` a list of
+strings, treated as variable names, and the field `"fixed_config"` has
+to be a map that is taken unevaluated. It is a requirement that the
+domain of the `"fixed_config"` and the `"flexible_config"` be disjoint.
+The optional fields `"doc"` and `"config_doc"` can be used to describe
+the target and the `"flexible_config"`, respectively.
+
+To evaluate an `"export"` target, first the configuration is restricted
+to the `"flexible_config"` and then the union with the `"fixed_config"`
+is built. The target specified in `"target"` is then evaluated. It is a
+requirement that this target be untainted. The result is the result of
+this evaluation; artifacts, runfiles, and provides map are forwarded
+unchanged.
+
+The main point of the `"export"` rule is that the relevant part of the
+configuration can be determined without having to analyze the target
+itself. This makes such rules eligible for target-level caching
+(provided the content of the repository as well as all reachable ones
+can be determined cheaply). This eligibility is also the reason why it
+is good practice to only depend on `"export"` targets of other
+repositories.
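+
+For illustration, an `"export"` target might look as follows (a sketch;
+the target and variable names are made up):
+
+```jsonc
+{ "exported-lib":
+  { "type": "export"
+  , "target": "libfoo"
+  , "flexible_config": ["CXX", "DEBUG"]
+  , "fixed_config": {"USE_FEATURE_X": true}
+  , "doc": ["The public library of this repository."]
+  }
+}
+```
+
+A consumer building `"exported-lib"` will see `libfoo` analysed in a
+configuration containing at most `CXX`, `DEBUG`, and the fixed
+`USE_FEATURE_X`, regardless of what else the consumer's configuration
+contains.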
+
+`"install"`
+-----------
+
+The `"install"` rule allows one to stage artifacts (and runfiles) of other
+targets in a different way. More precisely, a new stage (i.e., map of
+artifacts with keys treated as file names) is constructed in the
+following way.
+
+The runfiles from all targets in the `"deps"` field are taken; the
+`"deps"` field is an evaluated field and has to evaluate to a list of
+targets. It is an error if those runfiles conflict.
+
+The `"files"` argument is a special form. It has to be a map, and the
+keys are taken as paths. The values are evaluated and have to evaluate
+to a single target. That target has to have a single artifact, or no
+artifacts and a single runfile. In this way, `"files"` defines a stage;
+this stage overlays the runfiles of the `"deps"` and conflicts are
+ignored.
+
+Finally, the `"dirs"` argument has to evaluate to a list of pairs (i.e.,
+lists of length two) with the first argument a target name and the
+second argument a string, taken as directory name. For each entry, both
+runfiles and artifacts of the specified target are staged to the
+specified directory. It is an error if a conflict with the stage
+constructed so far occurs.
+
+Both runfiles and artifacts of the `"install"` target are the stage
+just described. An `"install"` target always has an empty provides map.
+Any provided information of the dependencies is discarded.
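+
+A small example (with hypothetical target names): the following stages
+the artifacts and runfiles of `"docs"` under `share/doc`, the single
+artifact of `"app"` as `bin/app`, and overlays the runfiles of
+`"libfoo"`.
+
+```jsonc
+{ "dist":
+  { "type": "install"
+  , "deps": ["libfoo"]
+  , "files": {"bin/app": "app"}
+  , "dirs": [["docs", "share/doc"]]
+  }
+}
+```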
+
+`"generic"`
+-----------
+
+The `"generic"` rule allows one to define artifacts as the output of an
+action. This is mainly useful for ad-hoc constructions; for anything
+occurring more often, a proper user-defined rule is usually the better
+choice.
+
+The `"deps"` argument is evaluated and has to evaluate to a list of
+target names. The runfiles and artifacts of these targets form the
+inputs of the action. Conflicts are not an error and resolved by giving
+precedence to the artifacts over the runfiles; conflicts within
+artifacts or runfiles are resolved in a latest-wins fashion using the
+order of the targets in the evaluated `"deps"` argument.
+
+The fields `"cmds"`, `"out_dirs"`, `"outs"`, and `"env"` are evaluated
+fields where `"cmds"`, `"out_dirs"`, and `"outs"` have to evaluate to a
+list of strings, and `"env"` has to evaluate to a map of strings. During
+their evaluation, the functions `"out_dirs"`, `"outs"` and `"runfiles"`
+can be used to access the logical paths of the directories, artifacts
+and runfiles, respectively, of a target specified in `"deps"`. Here,
+`"env"` specifies the environment in which the action is carried out.
+`"out_dirs"` and `"outs"` define the output directories and files,
+respectively, the action has to produce. Since some artifacts are to be
+produced, at least one of `"out_dirs"` or `"outs"` must be a non-empty
+list of strings. It is an error if one or more paths are present in both
+the `"out_dirs"` and `"outs"`. Finally, the strings in `"cmds"` are
+extended by a newline character and joined, and the command of the
+action is the interpretation of this string by `sh`.
+
+The artifacts of this target are the outputs (as declared by
+`"out_dirs"` and `"outs"`) of this action. Runfiles and provider map are
+empty.
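+
+As a sketch (assuming `gen.sh` is a source file of the module), the
+following runs a script and collects everything it writes into the
+declared output directory:
+
+```jsonc
+{ "generated":
+  { "type": "generic"
+  , "deps": ["gen.sh"]
+  , "cmds": ["mkdir -p out", "sh gen.sh > out/data.txt"]
+  , "out_dirs": ["out"]
+  , "env": {"LC_ALL": "C"}
+  }
+}
+```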
+
+`"file_gen"`
+------------
+
+The `"file_gen"` rule allows one to specify a file with a given content. To
+be able to accurately report about file names of artifacts or runfiles
+of other targets, they can be specified in the field `"deps"` which has
+to evaluate to a list of targets. The names of the artifacts and
+runfiles of a target specified in `"deps"` can be accessed through the
+functions `"outs"` and `"runfiles"`, respectively, during the evaluation
+of the arguments `"name"` and `"data"` which have to evaluate to a
+single string.
+
+Artifacts and runfiles of a `"file_gen"` target are a singleton map with
+key the result of evaluating `"name"` and value a (non-executable) file
+with content the result of evaluating `"data"`. The provides map is
+empty.
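+
+For example, a header with a fixed version string might be defined as
+follows (names, of course, made up):
+
+```jsonc
+{ "version-header":
+  { "type": "file_gen"
+  , "name": "version.h"
+  , "data": "#define VERSION \"1.2.3\"\n"
+  }
+}
+```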
+
+`"tree"`
+--------
+
+The `"tree"` rule allows one to combine the artifact stages of given
+targets into a single tree. More precisely, the field `"deps"` has to
+evaluate to a list of targets. For each target, runfiles and artifacts
+are overlaid in an artifacts-win fashion and the union of the resulting
+stages is taken; it is an error if conflicts arise in this way. The
+resulting stage is transformed into a tree. Both artifacts and runfiles
+of the `"tree"` target are a singleton map with the key the result of
+evaluating `"name"` (which has to evaluate to a single string) and value
+that tree.
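+
+For example, the following collects the stages of two (hypothetical)
+header targets into a single tree artifact named `include`:
+
+```jsonc
+{ "include-tree":
+  { "type": "tree"
+  , "name": "include"
+  , "deps": ["public-headers", "generated-headers"]
+  }
+}
+```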
+
+`"configure"`
+-------------
+
+The `"configure"` rule allows one to configure a target with a given
+configuration. The field `"target"` is evaluated and the result of the
+evaluation must name a single target (not a list). The `"config"` field
+is evaluated and must result in a map, which is used as configuration
+for the given target.
+
+This rule uses the given configuration to overlay the current
+environment for evaluating the given target, and thereby performs a
+configuration transition. It forwards all results
+(artifacts/runfiles/provides map) of the configured target to the upper
+context. The result of a target that uses this rule is the result of the
+target given in the `"target"` field (the configured target).
+
+As a full configuration transition is performed, the same care has to be
+taken when using this rule as when writing a configuration transition in
+a rule. Typically, this rule is used only at a top-level target of a
+project and configures only variables internal to the project. In any
+case, when using non-internal targets as dependencies (i.e., targets
+that a caller of the `"configure"` target might potentially use as well), care
+should be taken that those are only used in the initial configuration.
+Such preservation of the configuration is necessary to avoid conflicts,
+if the targets depended upon are visible in the `"configure"` target
+itself, e.g., as link dependency (which almost always happens when
+depending on a library). Even if a non-internal target depended upon is
+not visible in the `"configure"` target itself, requesting it in a
+modified configuration causes additional overhead by increasing the
+target graph and potentially the action graph.
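+
+As an illustration (with hypothetical names, and assuming the
+expression-language function `"singleton_map"` for constructing the
+map), a debug variant of a top-level target might be defined as
+follows.
+
+```jsonc
+{ "app-debug":
+  { "type": "configure"
+  , "target": "app"
+  , "config": {"type": "singleton_map", "key": "DEBUG", "value": true}
+  }
+}
+```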
diff --git a/doc/concepts/built-in-rules.org b/doc/concepts/built-in-rules.org
deleted file mode 100644
index 9463b10c..00000000
--- a/doc/concepts/built-in-rules.org
+++ /dev/null
@@ -1,167 +0,0 @@
-* Built-in rules
-
-Targets are defined in ~TARGETS~ files. Each target file is a single
-~JSON~ object. If the target name is contained as a key in that
-object, the corresponding value defines the target; otherwise it is
-implicitly considered a source file. The target definition itself
-is a ~JSON~ object as well. The mandatory key ~"type"~ specifies
-the rule defining the target; the meaning of the remaining keys
-depends on the rule defining the target.
-
-There are a couple of rules built in, all named by a single string.
-The user can define additional rules (and, in fact, we expect the
-majority of targets to be defined by user-defined rules); referring
-to them in a qualified way (with module) will always refer to those
-even if new built-in rules are added later (as built-in rules will
-always be only named by a single string).
-
-The following rules are built in. Built-in rules can have a
-special syntax.
-
-** ~"export"~
-
-The ~"export"~ rule evaluates a given target in a specified
-configuration. More precisely, the field ~"target"~ has to name a single
-target (not a list of targets), the field ~"flexible_config"~ a list
-of strings, treated as variable names, and the field ~"fixed_config"~
-has to be a map that is taken unevaluated. It is a requirement that
-the domain of the ~"fixed_config"~ and the ~"flexible_config"~ be
-disjoint. The optional fields ~"doc"~ and ~"config_doc"~ can be used
-to describe the target and the ~"flexible_config"~, respectively.
-
-To evaluate an ~"export"~ target, first the configuration is
-restricted to the ~"flexible_config"~ and then the union with the
-~"fixed_config"~ is built. The target specified in ~"target"~ is
-then evaluated. It is a requirement that this target be untainted.
-The result is the result of this evaluation; artifacts, runfiles,
-and provides map are forwarded unchanged.
-
-The main point of the ~"export"~ rule is, that the relevant part
-of the configuration can be determined without having to analyze
-the target itself. This makes such rules eligible for target-level
-caching (provided the content of the repository as well as all
-reachable ones can be determined cheaply). This eligibility is also
-the reason why it is good practice to only depend on ~"export"~
-targets of other repositories.
-
-** ~"install"~
-
-The ~"install"~ rules allows to stage artifacts (and runfiles) of
-other targets in a different way. More precisely, a new stage (i.e.,
-map of artifacts with keys treated as file names) is constructed
-in the following way.
-
-The runfiles from all targets in the ~"deps"~ field are taken; the
-~"deps"~ field is an evaluated field and has to evaluate to a list
-of targets. It is an error, if those runfiles conflict.
-
-The ~"files"~ argument is a special form. It has to be a map, and
-the keys are taken as paths. The values are evaluated and have
-to evaluate to a single target. That target has to have a single
-artifact or no artifacts and a single run file. In this way, ~"files"~
-defines a stage; this stage overlays the runfiles of the ~"deps"~
-and conflicts are ignored.
-
-Finally, the ~"dirs"~ argument has to evaluate to a list of
-pairs (i.e., lists of length two) with the first argument a target
-name and the second argument a string, taken as directory name. For
-each entry, both, runfiles and artifacts of the specified target
-are staged to the specified directory. It is an error if a conflict
-with the stage constructed so far occurs.
-
-Both, runfiles and artifacts of the ~"install"~ target are the stage
-just described. An ~"install"~ target always has an empty provides
-map. Any provided information of the dependencies is discarded.
-
-** ~"generic"~
-
-The ~"generic"~ rules allows to define artifacts as the output
-of an action. This is mainly useful for ad-hoc constructions; for
-anything occurring more often, a proper user-defined rule is usually
-the better choice.
-
-The ~"deps"~ argument is evaluated and has to evaluate to a list
-of target names. The runfiles and artifacts of these targets form
-the inputs of the action. Conflicts are not an error and resolved
-by giving precedence to the artifacts over the runfiles; conflicts
-within artifacts or runfiles are resolved in a latest-wins fashion
-using the order of the targets in the evaluated ~"deps"~ argument.
-
-The fields ~"cmds"~, ~"out_dirs"~, ~"outs"~, and ~"env"~ are evaluated
-fields where ~"cmds"~, ~"out_dirs"~, and ~"outs"~ have to evaluate to
-a list of strings, and ~"env"~ has to evaluate to a map of
-strings. During their evaluation, the functions ~"out_dirs"~, ~"outs"~
-and ~"runfiles"~ can be used to access the logical paths of the
-directories, artifacts and runfiles, respectively, of a target
-specified in ~"deps"~. Here, ~"env"~ specifies the environment in
-which the action is carried out. ~"out_dirs"~ and ~"outs"~ define the
-output directories and files, respectively, the action has to
-produce. Since some artifacts are to be produced, at least one of
-~"out_dirs"~ or ~"outs"~ must be a non-empty list of strings. It is an
-error if one or more paths are present in both the ~"out_dirs"~ and
-~"outs"~. Finally, the strings in ~"cmds"~ are extended by a newline
-character and joined, and command of the action is interpreting this
-string by ~sh~.
-
-The artifacts of this target are the outputs (as declared by
-~"out_dirs"~ and ~"outs"~) of this action. Runfiles and provider map
-are empty.
-
-** ~"file_gen"~
-
-The ~"file_gen"~ rule allows to specify a file with a given content.
-To be able to accurately report about file names of artifacts
-or runfiles of other targets, they can be specified in the field
-~"deps"~ which has to evaluate to a list of targets. The names
-of the artifacts and runfiles of a target specified in ~"deps"~
-can be accessed through the functions ~"outs"~ and ~"runfiles"~,
-respectively, during the evaluation of the arguments ~"name"~ and
-~"data"~ which have to evaluate to a single string.
-
-Artifacts and runfiles of a ~"file_gen"~ target are a singleton map
-with key the result of evaluating ~"name"~ and value a (non-executable)
-file with content the result of evaluating ~"data"~. The provides
-map is empty.
-
-** ~"tree"~
-
-The ~"tree"~ rule allows to specify a tree out of the artifact
-stage of given targets. More precisely, the deps field ~"deps"~
-has to evaluate to a list of targets. For each target, runfiles
-and artifacts are overlayed in an artifacts-win fashion and
-the union of the resulting stages is taken; it is an error if conflicts
-arise in this way. The resulting stage is transformed into a tree.
-Both, artifacts and runfiles of the ~"tree"~ target are a singleton map
-with the key the result of evaluating ~"name"~ (which has to evaluate to
-a single string) and value that tree.
-
-
-** ~"configure"~
-
-The ~"configure"~ rule allows to configure a target with a given
-configuration. The field ~"target"~ is evaluated and the result
-of the evaluation must name a single target (not a list). The
-~"config"~ field is evaluated and must result in a map, which is
-used as configuration for the given target.
-
-This rule uses the given configuration to overlay the current environment for
-evaluating the given target, and thereby performs a configuration transition. It
-forwards all results (artifacts/runfiles/provides map) of the configured target
-to the upper context. The result of a target that uses this rule is the result
-of the target given in the ~"target"~ field (the configured target).
-
-As a full configuration transition is performed, the same care has
-to be taken when using this rule as when writing a configuration
-transition in a rule. Typically, this rule is used only at a
-top-level target of a project and configures only variables internally
-to the project. In any case, when using non-internal targets as
-dependencies (i.e., targets that a caller of the ~"configure"~
-potentially might use as well), care should be taken that those
-are only used in the initial configuration. Such preservation of
-the configuration is necessary to avoid conflicts, if the targets
-depended upon are visible in the ~"configure"~ target itself, e.g.,
-as link dependency (which almost always happens when depending on a
-library). Even if a non-internal target depended upon is not visible
-in the ~"configure"~ target itself, requesting it in a modified
-configuration causes additional overhead by increasing the target
-graph and potentially the action graph.
diff --git a/doc/concepts/cache-pragma.md b/doc/concepts/cache-pragma.md
new file mode 100644
index 00000000..858f2b4f
--- /dev/null
+++ b/doc/concepts/cache-pragma.md
@@ -0,0 +1,134 @@
+Action caching pragma
+=====================
+
+Introduction: exit code, build failures, and caching
+----------------------------------------------------
+
+The exit code of a process is used to signal success or failure of that
+process. By convention, 0 indicates success and any other value
+indicates some form of failure.
+
+Our tool expects all build actions to follow this convention. A non-zero
+exit code of a regular build action has two consequences.
+
+ - As the action failed, the whole build is aborted and considered
+ failed.
+ - As such a failed action can never be part of a successful build, it
+ is (effectively) not cached.
+
+This non-caching is achieved by re-requesting an action without cache
+lookup if a failed action from cache is reported.
+
+In particular, for building, we have the property that everything that
+does not lead to aborting the build can (and will) be cached. This
+property is justified as we expect build actions to behave in a
+functional way.
+
+Test and run actions
+--------------------
+
+Tests have a lot of similarity to regular build actions: a process is
+run with given inputs, and the results are processed further (e.g., to
+create reports on test suites). However, they break the above-described
+connection between caching and continuation of the build: we expect that
+some tests might be flaky (even though they shouldn't be, of course)
+and hence only want to cache successful tests. Nevertheless, we do want
+to continue testing after the first test failure.
+
+The assumption that actions behave functionally is also broken by "run"
+actions, i.e., local actions that are executed either because of their
+side effect on the host system, or because of their non-deterministic
+results (e.g., monitoring some resource). Those actions should never be
+cached, but if they fail, the build should be aborted.
+
+Tainting
+--------
+
+Targets that, directly or indirectly, depend on non-functional actions
+are not regular targets. They are test targets, run targets, benchmark
+results, etc; in any case, they are tainted in some way. When adding
+high-level caching of targets, we will only support caching for
+untainted targets.
+
+To make everybody aware of their special nature, they are clearly marked
+as such: tainted targets not generated by a tainted rule (e.g., a test
+rule) have to explicitly state their taintedness in their attributes.
+This declaration also gives a natural way to mark targets that are
+technically pure, but still should be used only in tests, e.g., a mock
+version of a larger library.
+
+Besides being for tests only, there might be other reasons why a target
+might not be fit for general use, e.g., configuration files with
+accounts for developer access, or files under restrictive licences. To
+avoid having to extend the framework for each new use case, we allow
+arbitrary strings as markers for the kind of taintedness of a target. Of
+course, a target can be tainted in more than one way.
+
+More precisely, rules can have `"tainted"` as an additional property.
+Moreover, `"tainted"` is another reserved keyword for target arguments
+(like `"type"` and `"arguments_config"`). In both cases, the value has
+to be a list of strings, and the empty list is assumed, if not
+specified.
+
+A rule is tainted with the set of strings in its `"tainted"` property. A
+target is tainted with the union of the set of strings of its
+`"tainted"` argument and the set of strings its generating rule is
+tainted with.
+
+Every target has to be tainted with (at least) the union of what its
+dependencies are tainted with.
+
+For tainted targets, the `analyse`, `build`, and `install` commands
+report the set of strings the target is tainted with.
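+
+For example, a mock configuration that is technically pure, but should
+only ever be used in tests, can declare its taintedness explicitly (a
+sketch with made-up names):
+
+```jsonc
+{ "mock-config":
+  { "type": "generic"
+  , "tainted": ["test"]
+  , "outs": ["config.json"]
+  , "cmds": ["echo '{}' > config.json"]
+  }
+}
+```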
+
+### `"may_fail"` and `"no_cache"` properties of `"ACTION"`
+
+The `"ACTION"` function in the defining expression of a rule has two
+additional parameters (besides inputs, etc.), `"may_fail"` and
+`"no_cache"`. Those are not evaluated and have to be lists of strings
+(with empty assumed if the respective parameter is not present). Only
+strings the defining rule is tainted with may occur in that list. If the
+list is not empty, the corresponding may-fail or no-cache bit of the
+action is set.
+
+For actions with the `"may_fail"` bit set, the optional parameter
+`"fail_message"` with default value `"action failed"` is evaluated. That
+message will be reported if the action returns a non-zero exit value.
+
+Actions with the no-cache bit set are never cached. If an action with
+the may-fail bit set exits with non-zero exit value, the build is
+continued if the action nevertheless managed to produce all expected
+outputs. We continue to ignore actions with non-zero exit status from
+cache.
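+
+As a sketch, inside the defining expression of a rule tainted with
+`"test"`, a test action might be declared as follows (the variable
+`"inputs"`, the command, and the output name are placeholders):
+
+```jsonc
+{ "type": "ACTION"
+, "inputs": {"type": "var", "name": "inputs"}
+, "outs": ["result.txt"]
+, "cmd": ["sh", "-c", "./test.sh > result.txt 2>&1"]
+, "may_fail": ["test"]
+, "no_cache": ["test"]
+, "fail_message": "test suite failed"
+}
+```
+
+Note that, on failure, the build only continues if `result.txt` was
+nevertheless produced; here the output redirection creates it even if
+`test.sh` exits non-zero early.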
+
+### Marking of failed artifacts
+
+To simplify finding failures in accumulated reports, our tool keeps
+track of artifacts generated by failed actions. More precisely,
+artifacts are considered failed if one of the following conditions
+applies.
+
+ - Artifacts generated by failed actions are failed.
+ - Tree artifacts containing a failed artifact are failed.
+ - Artifacts generated by an action taking a failed artifact as input
+ are failed.
+
+The identifiers used for built artifacts (including trees) remain
+unchanged; in particular, they will only describe the contents and not
+if they were obtained in a failed way.
+
+When reporting artifacts, e.g., in the log file, an additional marker is
+added to indicate that the artifact is a failed one. After every `build`
+or `install` command, if the requested artifacts contain a failed one, a
+different exit code is returned.
+
+### The `install-cas` subcommand
+
+A typical workflow for testing is to first run the full test suite and
+then only look at the failed tests in more detail. As we don't take
+failed actions from cache, installing the output can't be done by
+rerunning the same target as `install` instead of `build`. Instead, the
+output has to be taken from CAS using the identifier shown in the build
+log. To simplify this workflow, there is the `install-cas` subcommand
+that installs a CAS entry, identified by the identifier as shown in the
+log, to a given location or (if no location is specified) to `stdout`.
diff --git a/doc/concepts/cache-pragma.org b/doc/concepts/cache-pragma.org
deleted file mode 100644
index 11953702..00000000
--- a/doc/concepts/cache-pragma.org
+++ /dev/null
@@ -1,130 +0,0 @@
-* Action caching pragma
-
-** Introduction: exit code, build failures, and caching
-
-The exit code of a process is used to signal success or failure
-of that process. By convention, 0 indicates success and any other
-value indicates some form of failure.
-
-Our tool expects all build actions to follow this convention. A
-non-zero exit code of a regular build action has two consequences.
-- As the action failed, the whole build is aborted and considered failed.
-- As such a failed action can never be part of a successful build,
- it is (effectively) not cached.
-This non-caching is achieved by rerequesting an action without
-cache look up, if a failed action from cache is reported.
-
-In particular, for building, we have the property that everything
-that does not lead to aborting the build can (and will) be cached.
-This property is justified as we expect build actions to behave in
-a functional way.
-
-** Test and run actions
-
-Tests have a lot of similarity to regular build actions: a process is
-run with given inputs, and the results are processed further (e.g.,
-to create reports on test suites). However, they break the above
-described connection between caching and continuation of the
-build: we expect that some tests might be flaky (even though they
-shouldn't be, of course) and hence only want to cache successful
-tests. Nevertheless, we do want to continue testing after the first
-test failure.
-
-Another breakage of the functionality assumption of actions are
-"run" actions, i.e., local actions that are executed either because
-of their side effect on the host system, or because of their
-non-deterministic results (e.g., monitoring some resource). Those
-actions should never be cached, but if they fail, the build should
-be aborted.
-
-** Tainting
-
-Targets that, directly or indirectly, depend on non-functional
-actions are not regular targets. They are test targets, run targets,
-benchmark results, etc; in any case, they are tainted in some way.
-When adding high-level caching of targets, we will only support
-caching for untainted targets.
-
-To make everybody aware of their special nature, they are clearly
-marked as such: tainted targets not generated by a tainted rule (e.g.,
-a test rule) have to explicitly state their taintedness in their
-attributes. This declaration also gives a natural way to mark targets
-that are technically pure, but still should be used only in test,
-e.g., a mock version of a larger library.
-
-Besides being for tests only, there might be other reasons why a
-target might not be fit for general use, e.g., configuration files
-with accounts for developer access, or files under restrictive
-licences. To avoid having to extend the framework for each new
-use case, we allow arbitrary strings as markers for the kind of
-taintedness of a target. Of course, a target can be tainted in more
-than one way.
-
-More precisely, rules can have ~"tainted"~ as an additional
-property. Moreover ~"tainted"~ is another reserved keyword for
-target arguments (like ~"type"~ and ~"arguments_config"~). In both
-cases, the value has to be a list of strings, and the empty list
-is assumed, if not specified.
-
-A rule is tainted with the set of strings in its ~"tainted"~
-property. A target is tainted with the union of the set of strings
-of its ~"tainted"~ argument and the set of strings its generating
-rule is tainted with.
-
-Every target has to be tainted with (at least) the union of what
-its dependencies are tainted with.
-
-For tainted targets, the ~analyse~, ~build~, and ~install~ commands
-report the set of strings the target is tainted with.
-
-*** ~"may_fail"~ and ~"no_cache"~ properties of ~"ACTION"~
-
-The ~"ACTION"~ function in the defining expression of a rule
-have two additional (besides inputs, etc) parameters ~"may_fail"~
-and ~"no_cache"~. Those are not evaluated and have to be lists
-of strings (with empty assumed if the respective parameter is not
-present). Only strings the defining rule is tainted with may occur
-in that list. If the list is not empty, the corresponding may-fail
-or no-cache bit of the action is set.
-
-For actions with the ~"may_fail"~ bit set, the optional parameter
-~"fail_message"~ with default value ~"action failed"~ is evaluated.
-That message will be reported if the action returns a non-zero
-exit value.
-
-Actions with the no-cache bit set are never cached. If an action
-with the may-fail bit set exits with non-zero exit value, the build
-is continued if the action nevertheless managed to produce all
-expected outputs. We continue to ignore actions with non-zero exit
-status from cache.
-
-*** Marking of failed artifacts
-
-To simplify finding failures in accumulated reports, our tool
-keeps track of artifacts generated by failed actions. More
-precisely, artifacts are considered failed if one of the following
-conditions applies.
-- Artifacts generated by failed actions are failed.
-- Tree artifacts containing a failed artifact are failed.
-- Artifacts generated by an action taking a failed artifact as
- input are failed.
-The identifiers used for built artifacts (including trees) remain
-unchanged; in particular, they will only describe the contents and
-not if they were obtained in a failed way.
-
-When reporting artifacts, e.g., in the log file, an additional marker
-is added to indicate that the artifact is a failed one. After every
-~build~ or ~install~ command, if the requested artifacts contain
-failed one, a different exit code is returned.
-
-*** The ~install-cas~ subcommand
-
-A typical workflow for testing is to first run the full test suite
-and then only look at the failed tests in more details. As we don't
-take failed actions from cache, installing the output can't be
-done by rerunning the same target as ~install~ instead of ~build~.
-Instead, the output has to be taken from CAS using the identifier
-shown in the build log. To simplify this workflow, there is the
-~install-cas~ subcommand that installs a CAS entry, identified by
-the identifier as shown in the log to a given location or (if no
-location is specified) to ~stdout~.
diff --git a/doc/concepts/configuration.md b/doc/concepts/configuration.md
new file mode 100644
index 00000000..743ed41e
--- /dev/null
+++ b/doc/concepts/configuration.md
@@ -0,0 +1,115 @@
+Configuration
+=============
+
+Targets describe abstract concepts like "library". Depending on
+requirements, a library might manifest itself in different ways. For
+example,
+
+ - it can be built for various target architectures,
+ - it can have the requirement to produce position-independent code,
+ - it can be a special build for debugging, profiling, etc.
+
+So, a target (like a library described by header files, source files,
+dependencies, etc) has some additional input. As those inputs are
+typically of a global nature (e.g., a profiling build usually wants all
+involved libraries to be built for profiling), this additional input,
+called "configuration" follows the same approach as the `UNIX`
+environment: it is a global collection of key-value pairs and every
+target picks what it needs.
+
+Top-level configuration
+-----------------------
+
+The configuration is a `JSON` object. The configuration for the target
+requested can be specified on the command line using the `-c` option;
+its argument is a file name and that file is supposed to contain the
+`JSON` object.
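+
+For example, a configuration file passed via `-c` could contain (with
+purely illustrative variable names)
+
+``` jsonc
+{ "ARCH": "x86_64"
+, "DEBUG": true
+}
+```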
+
+Propagation
+-----------
+
+Rules and target definitions have to declare which parts of the
+configuration they want to have access to. The (essentially) full
+configuration, however, is passed on to the dependencies; in this way, a
+target not using a part of the configuration can still depend on it, if
+one of its dependencies does.
+
+### Rules configuration and configuration transitions
+
+As part of the definition of a rule, it specifies a set `"config_vars"`
+of variables. During the evaluation of the rule, the configuration
+restricted to those variables (variables unset in the original
+configuration are set to `null`) is used as environment.
+
+Additionally, the rule can request that certain targets be evaluated in
+a modified configuration by specifying `"config_transitions"`
+accordingly. Typically, this is done when a tool is required during the
+build; then this tool has to be built for the architecture on which the
+build is carried out and not the target architecture. Those tools often
+are `"implicit"` dependencies, i.e., dependencies that every target
+defined by that rule has, without the need to specify it in the target
+definition.
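+
+As an illustrative sketch (with the field name "protoc" and the
+variable "HOST_ARCH" purely hypothetical), a rule fragment requesting
+that the targets of one of its fields be built for the host
+architecture could contain a transition like the following, where the
+expression evaluates to a map amending the configuration.
+
+``` jsonc
+{ "config_transitions":
+  { "protoc":
+    [ { "type": "singleton_map"
+      , "key": "ARCH"
+      , "value": {"type": "var", "name": "HOST_ARCH"}
+      }
+    ]
+  }
+}
+```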
+
+### Target configuration
+
+Additionally (and independently of the configuration-dependency of the
+rule), the target definition itself can depend on the configuration.
+This can happen, if a debug version of a library has additional
+dependencies (e.g., for structured debug logs).
+
+If such a configuration-dependency is needed, the reserved keyword
+`"arguments_config"` is used to specify a set of variables (if unset,
+the empty set is assumed; this should be the usual case). The
+environment in which all arguments of the target definition are
+evaluated is the configuration restricted to those variables (again,
+with values unset in the original configuration set to `null`).
+
+For example, a library where the debug version has an additional
+dependency could look as follows.
+
+``` jsonc
+{ "libfoo":
+ { "type": ["@", "rules", "CC", "library"]
+ , "arguments_config": ["DEBUG"]
+ , "name": ["foo"]
+ , "hdrs": ["foo.hpp"]
+ , "srcs": ["foo.cpp"]
+ , "local defines":
+ { "type": "if"
+ , "cond": {"type": "var", "name": "DEBUG"}
+ , "then": ["DEBUG"]
+ }
+ , "deps":
+ { "type": "++"
+ , "$1":
+ [ ["libbar", "libbaz"]
+ , { "type": "if"
+ , "cond": {"type": "var", "name": "DEBUG"}
+ , "then": ["libdebuglog"]
+ }
+ ]
+ }
+ }
+}
+```
+
+Effective configuration
+-----------------------
+
+A target is influenced by the configuration through
+
+ - the configuration dependency of target definition, as specified in
+ `"arguments_config"`,
+ - the configuration dependency of the underlying rule, as specified in
+ the rule's `"config_vars"` field, and
+ - the configuration dependency of target dependencies, not taking into
+ account values explicitly set by a configuration transition.
+
+Restricting the configuration to this collection of variables yields the
+effective configuration for that target-configuration pair. The
+`--dump-targets` option of the `analyse` subcommand allows to inspect
+the effective configurations of all involved targets. Due to
+configuration transitions, a target can be analyzed in more than one
+configuration, e.g., if a library is used both for a tool needed
+during the build and for the final binary cross-compiled for a
+different target architecture.
diff --git a/doc/concepts/configuration.org b/doc/concepts/configuration.org
deleted file mode 100644
index 4217d22d..00000000
--- a/doc/concepts/configuration.org
+++ /dev/null
@@ -1,107 +0,0 @@
-* Configuration
-
-Targets describe abstract concepts like "library". Depending on
-requirements, a library might manifest itself in different ways.
-For example,
-- it can be built for various target architectures,
-- it can have the requirement to produce position-independent code,
-- it can be a special build for debugging, profiling, etc.
-
-So, a target (like a library described by header files, source files,
-dependencies, etc) has some additional input. As those inputs are
-typically of a global nature (e.g., a profiling build usually wants
-all involved libraries to be built for profiling), this additional
-input, called "configuration" follows the same approach as the
-~UNIX~ environment: it is a global collection of key-value pairs
-and every target picks, what it needs.
-
-** Top-level configuration
-
-The configuration is a ~JSON~ object. The configuration for the
-target requested can be specified on the command line using the
-~-c~ option; its argument is a file name and that file is supposed
-to contain the ~JSON~ object.
-
-** Propagation
-
-Rules and target definitions have to declare which parts of the
-configuration they want to have access to. The (essentially) full
-configuration, however, is passed on to the dependencies; in this way,
-a target not using a part of the configuration can still depend on
-it, if one of its dependencies does.
-
-*** Rules configuration and configuration transitions
-
-As part of the definition of a rule, it specifies a set ~"config_vars"~
-of variables. During the evaluation of the rule, the configuration
-restricted to those variables (variables unset in the original
-configuration are set to ~null~) is used as environment.
-
-Additionally, the rule can request that certain targets be evaluated
-in a modified configuration by specifying ~"config_transitions"~
-accordingly. Typically, this is done when a tool is required during
-the build; then this tool has to be built for the architecture on
-which the build is carried out and not the target architecture. Those
-tools often are ~"implicit"~ dependencies, i.e., dependencies that
-every target defined by that rule has, without the need to specify
-it in the target definition.
-
-*** Target configuration
-
-Additionally (and independently of the configuration-dependency
-of the rule), the target definition itself can depend on the
-configuration. This can happen, if a debug version of a library
-has additional dependencies (e.g., for structured debug logs).
-
-If such a configuration-dependency is needed, the reserved key
-word ~"arguments_config"~ is used to specify a set of variables (if
-unset, the empty set is assumed; this should be the usual case).
-The environment in which all arguments of the target definition are
-evaluated is the configuration restricted to those variables (again,
-with values unset in the original configuration set to ~null~).
-
-For example, a library where the debug version has an additional
-dependency could look as follows.
-#+BEGIN_SRC
-{ "libfoo":
- { "type": ["@", "rules", "CC", "library"]
- , "arguments_config": ["DEBUG"]
- , "name": ["foo"]
- , "hdrs": ["foo.hpp"]
- , "srcs": ["foo.cpp"]
- , "local defines":
- { "type": "if"
- , "cond": {"type": "var", "name": "DEBUG"}
- , "then": ["DEBUG"]
- }
- , "deps":
- { "type": "++"
- , "$1":
- [ ["libbar", "libbaz"]
- , { "type": "if"
- , "cond": {"type": "var", "name": "DEBUG"}
- , "then": ["libdebuglog"]
- }
- ]
- }
- }
-}
-#+END_SRC
-
-** Effective configuration
-
-A target is influenced by the configuration through
-- the configuration dependency of target definition, as specified
- in ~"arguments_config"~,
-- the configuration dependency of the underlying rule, as specified
- in the rule's ~"config_vars"~ field, and
-- the configuration dependency of target dependencies, not taking
- into account values explicitly set by a configuration transition.
-Restricting the configuration to this collection of variables yields
-the effective configuration for that target-configuration pair.
-The ~--dump-targets~ option of the ~analyse~ subcommand allows to
-inspect the effective configurations of all involved targets. Due to
-configuration transitions, a target can be analyzed in more than one
-configuration, e.g., if a library is used both, for a tool needed
-during the build, as well as for the final binary cross-compiled
-for a different target architecture.
diff --git a/doc/concepts/doc-strings.md b/doc/concepts/doc-strings.md
new file mode 100644
index 00000000..a1a156ac
--- /dev/null
+++ b/doc/concepts/doc-strings.md
@@ -0,0 +1,152 @@
+Documentation of build rules, expressions, etc
+==============================================
+
+Build rules can attain a non-trivial complexity. This is especially true
+if several rules have to exist for slightly different use cases, or if
+the rule supports many different fields. Therefore, documentation of the
+rules (and also expressions for the benefit of rule authors) is
+desirable.
+
+Experience shows that documentation that is not versioned together with
+the code it refers to quickly gets out of date, or lost. Therefore, we
+add documentation directly into the respective definitions.
+
+Multi-line strings in JSON
+--------------------------
+
+In JSON, the newline character is encoded specially and not taken
+literally; also, there is no implicit joining of string literals. So,
+in order to also have documentation readable in the JSON representation
+itself, instead of single strings, we take arrays of strings, with the
+understanding that they describe the strings obtained by joining the
+entries with newline characters.
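+
+For example, the array
+
+``` jsonc
+["The first line of the documentation,", "and the second one."]
+```
+
+describes a two-line string: the first entry, a newline character, and
+the second entry.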
+
+Documentation is optional
+-------------------------
+
+While documentation is highly recommended, it still remains optional.
+Therefore, when in the following we state that a key is for a list or a
+map, it is always implied that it may be absent; in this case, the empty
+array or the empty map is taken as default, respectively.
+
+Rules
+-----
+
+Each rule is described as a JSON object with a fixed set of keys. So
+having fixed keys for documentation does not cause conflicts. More
+precisely, the keys `doc`, `field doc`, `config_doc`, `artifacts_doc`,
+`runfiles_doc`, and `provides_doc` are reserved for documentation. Here,
+`doc` has to be a list of strings describing the rule in general.
+`field doc` has to be a map from (some of) the field names to an array
+of strings, containing additional information on that particular field.
+`config_doc` has to be a map from (some of) the config variables to an
+array of strings describing the respective variable. `artifacts_doc` is
+an array of strings describing the artifacts produced by the rule.
+`runfiles_doc` is an array of strings describing the runfiles produced
+by this rule. Finally, `provides_doc` is a map describing (some of) the
+providers by that rule; as opposed to fields or config variables there
+is no authoritative list of providers given elsewhere in the rule, so it
+is up to the rule author to give an accurate documentation on the
+provided data.
+
+### Example
+
+``` jsonc
+{ "library":
+ { "doc":
+ [ "A C library"
+ , ""
+ , "Define a library that can be used to be statically linked to a"
+ , "binary. To do so, the target can simply be specified in the deps"
+ , "field of a binary; it can also be a dependency of another library"
+ , "and the information is then propagated to the corresponding binary."
+ ]
+ , "string_fields": ["name"]
+ , "target_fields": ["srcs", "hdrs", "private-hdrs", "deps"]
+ , "field_doc":
+ { "name":
+ ["The base name of the library (i.e., the name without the leading lib)."]
+ , "srcs": ["The source files (i.e., *.c files) of the library."]
+ , "hdrs":
+ [ "The public header files of this library. Targets depending on"
+ , "this library will have access to those header files"
+ ]
+ , "private-hdrs":
+ [ "Additional internal header files that are used when compiling"
+ , "the source files. Targets depending on this library have no access"
+ , "to those header files."
+ ]
+ , "deps":
+ [ "Any other libraries that this library uses. The dependency is"
+ , "also propagated (via the link-deps provider) to any consumers of"
+ , "this target. So only direct dependencies should be declared."
+ ]
+ }
+ , "config_vars": ["CC"]
+ , "config_doc":
+ { "CC":
+ [ "single string. defaulting to \"cc\", specifying the compiler"
+ , "to be used. The compiler is also used to launch the preprocessor."
+ ]
+ }
+ , "artifacts_doc":
+ ["The actual library (libname.a) staged in the specified directory"]
+ , "runfiles_doc": ["The public headers of this library"]
+ , "provides_doc":
+ { "compile-deps":
+ [ "Map of artifacts specifying any additional files that, besides the runfiles,"
+ , "have to be present in compile actions of targets depending on this library"
+ ]
+ , "link-deps":
+ [ "Map of artifacts specifying any additional files that, besides the artifacts,"
+ , "have to be present in a link actions of targets depending on this library"
+ ]
+ , "link-args":
+ [ "List of strings that have to be added to the command line for linking actions"
+ , "in targets depending on this library"
+ ]
+ }
+ , "expression": { ... }
+ }
+}
+```
+
+Expressions
+-----------
+
+Expressions are also described by a JSON object with a fixed set of
+keys. Here we use the keys `doc` and `vars_doc` for documentation, where
+`doc` is an array of strings describing the expression as a whole and
+`vars_doc` is a map from (some of) the `vars` to an array of strings
+describing this variable.
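+
+For example, a documented expression (with hypothetical names) could
+look as follows.
+
+``` jsonc
+{ "default flags":
+  { "doc": ["Compute the effective compile flags."]
+  , "vars": ["CFLAGS", "defaults"]
+  , "vars_doc":
+    { "CFLAGS": ["The flags to use, if set; they override the defaults."]
+    , "defaults": ["The flags to fall back to if CFLAGS is unset."]
+    }
+  , "expression":
+    { "type": "var"
+    , "name": "CFLAGS"
+    , "default": {"type": "var", "name": "defaults"}
+    }
+  }
+}
+```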
+
+Export targets
+--------------
+
+As export targets play the role of interfaces between repositories, it
+is important that they be documented as well. Again, export targets are
+described as a JSON object with a fixed set of keys and we use the keys
+`doc` and `config_doc` for documentation. Here `doc` is an array of
+strings describing the target in general and `config_doc` is a map
+from (some of) the variables of the `flexible_config` to an array of
+strings describing this parameter.
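+
+A documented export target (again with hypothetical names) could look
+as follows.
+
+``` jsonc
+{ "libfoo":
+  { "type": "export"
+  , "target": "libfoo-impl"
+  , "flexible_config": ["DEBUG"]
+  , "doc": ["The foo library, the public interface of this repository."]
+  , "config_doc":
+    {"DEBUG": ["If set, build the library with debug logging enabled."]}
+  }
+}
+```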
+
+Presentation of the documentation
+---------------------------------
+
+As all documentation entries are just values (that need not be
+evaluated) in JSON objects, it is easy to write tools rendering
+documentation pages for rules, etc, and we expect those tools to be
+written independently. Nevertheless, for the benefit of developers
+using rules from git-tree roots that might not be checked out, there is
+a subcommand `describe` which takes a target specification like the
+`analyse` command, looks up the corresponding rule and describes it
+fully, i.e., prints in human-readable form
+
+ - the documentation for the rule
+ - all the fields available for that rule together with
+ - their type (`string_field`, `target_field`, etc), and
+ - their documentation,
+ - all the configuration variables of the rule with their documentation
+ (if given), and
+ - the documented providers.
diff --git a/doc/concepts/doc-strings.org b/doc/concepts/doc-strings.org
deleted file mode 100644
index d9a94dc5..00000000
--- a/doc/concepts/doc-strings.org
+++ /dev/null
@@ -1,145 +0,0 @@
-* Documentation of build rules, expressions, etc
-
-Build rules can obtain a non-trivial complexity. This is especially
-true if several rules have to exist for slightly different use
-cases, or if the rule supports many different fields. Therefore,
-documentation of the rules (and also expressions for the benefit
-of rule authors) is desirable.
-
-Experience shows that documentation that is not versioned together with
-the code it refers to quickly gets out of date, or lost. Therefore,
-we add documentation directly into the respective definitions.
-
-** Multi-line strings in JSON
-
-In JSON, the newline character is encoded specially and not taken
-literally; also, there is not implicit joining of string literals.
-So, in order to also have documentation readable in the JSON
-representation itself, instead of single strings, we take arrays
-of strings, with the understanding that they describe the strings
-obtained by joining the entries with newline characters.
-
-** Documentation is optional
-
-While documentation is highly recommended, it still remains optional.
-Therefore, when in the following we state that a key is for a list
-or a map, it is always implied that it may be absent; in this case,
-the empty array or the empty map is taken as default, respectively.
-
-** Rules
-
-Each rule is described as a JSON object with a fixed set of keys.
-So having fixed keys for documentation does not cause conflicts.
-More precisely, the keys ~doc~, ~field doc~, ~config_doc~,
-~artifacts_doc~, ~runfiles_doc~, and ~provides_doc~
-are reserved for documentation. Here, ~doc~ has to be a list of
-strings describing the rule in general. ~field doc~ has to be a map
-from (some of) the field names to an array of strings, containing
-additional information on that particular field. ~config_doc~ has
-to be a map from (some of) the config variables to an array of
-strings describing the respective variable. ~artifacts_doc~ is
-an array of strings describing the artifacts produced by the rule.
-~runfiles_doc~ is an array of strings describing the runfiles produced
-by this rule. Finally, ~provides_doc~ is a map describing (some
-of) the providers by that rule; as opposed to fields or config
-variables there is no authoritative list of providers given elsewhere
-in the rule, so it is up to the rule author to give an accurate
-documentation on the provided data.
-
-*** Example
-
-#+BEGIN_SRC
-{ "library":
- { "doc":
- [ "A C library"
- , ""
- , "Define a library that can be used to be statically linked to a"
- , "binary. To do so, the target can simply be specified in the deps"
- , "field of a binary; it can also be a dependency of another library"
- , "and the information is then propagated to the corresponding binary."
- ]
- , "string_fields": ["name"]
- , "target_fields": ["srcs", "hdrs", "private-hdrs", "deps"]
- , "field_doc":
- { "name":
- ["The base name of the library (i.e., the name without the leading lib)."]
- , "srcs": ["The source files (i.e., *.c files) of the library."]
- , "hdrs":
- [ "The public header files of this library. Targets depending on"
- , "this library will have access to those header files"
- ]
- , "private-hdrs":
- [ "Additional internal header files that are used when compiling"
- , "the source files. Targets depending on this library have no access"
- , "to those header files."
- ]
- , "deps":
- [ "Any other libraries that this library uses. The dependency is"
- , "also propagated (via the link-deps provider) to any consumers of"
- , "this target. So only direct dependencies should be declared."
- ]
- }
- , "config_vars": ["CC"]
- , "config_doc":
- { "CC":
- [ "single string. defaulting to \"cc\", specifying the compiler"
- , "to be used. The compiler is also used to launch the preprocessor."
- ]
- }
- , "artifacts_doc":
- ["The actual library (libname.a) staged in the specified directory"]
- , "runfiles_doc": ["The public headers of this library"]
- , "provides_doc":
- { "compile-deps":
- [ "Map of artifacts specifying any additional files that, besides the runfiles,"
- , "have to be present in compile actions of targets depending on this library"
- ]
- , "link-deps":
- [ "Map of artifacts specifying any additional files that, besides the artifacts,"
- , "have to be present in a link actions of targets depending on this library"
- ]
- , "link-args":
- [ "List of strings that have to be added to the command line for linking actions"
- , "in targets depending on this library"
- ]
- }
- , "expression": { ... }
- }
-}
-#+END_SRC
-
-** Expressions
-
-Expressions are also described by a JSON object with a fixed set of
-keys. Here we use the keys ~doc~ and ~vars_doc~ for documentation,
-where ~doc~ is an array of strings describing the expression as a
-whole and ~vars_doc~ is a map from (some of) the ~vars~ to an array
-of strings describing this variable.
-
-** Export targets
-
-As export targets play the role of interfaces between repositories,
-it is important that they be documented as well. Again, export targets
-are described as a JSON object with fixed set of keys amd we use
-the keys ~doc~ and ~config_doc~ for documentation. Here ~doc~ is an
-array of strings describing the targeted in general and ~config_doc~
-is a map from (some of) the variables of the ~flexible_config~ to
-an array of strings describing this parameter.
-
-** Presentation of the documentation
-
-As all documentation are just values (that need not be evaluated)
-in JSON objects, it is easy to write tool rendering documentation
-pages for rules, etc, and we expect those tools to be written
-independently. Nevertheless, for the benefit of developers using
-rules from a git-tree roots that might not be checked out, there is
-a subcommand ~describe~ which takes a target specification like the
-~analyze~ command, looks up the corresponding rule and describes
-it fully, i.e., prints in human-readable form
-- the documentation for the rule
-- all the fields available for that rule together with
- - their type (~string_field~, ~target_field~, etc), and
- - their documentation,
-- all the configuration variables of the rule with their
- documentation (if given), and
-- the documented providers.
diff --git a/doc/concepts/expressions.md b/doc/concepts/expressions.md
new file mode 100644
index 00000000..9e8a8f36
--- /dev/null
+++ b/doc/concepts/expressions.md
@@ -0,0 +1,368 @@
+Expression language
+===================
+
+At various places, in particular in order to define a rule, we need a
+restricted form of functional computation. This is achieved by our
+expression language.
+
+Syntax
+------
+
+All expressions are given by JSON values. One can think of expressions
+as abstract syntax trees serialized to JSON; nevertheless, the precise
+semantics is given by the evaluation mechanism described later.
+
+Semantic Values
+---------------
+
+Expressions evaluate to semantic values. Semantic values are JSON values
+extended by additional atomic values for build-internal values like
+artifacts, names, etc.
+
+### Truth
+
+Every value can be treated as a boolean condition. We follow a
+convention similar to `LISP` considering everything true that is not
+empty. More precisely, the values
+
+ - `null`,
+ - `false`,
+ - `0`,
+ - `""`,
+ - the empty map, and
+ - the empty list
+
+are considered logically false. All other values are logically true.
+
+Evaluation
+----------
+
+The evaluation follows a strict, functional, call-by-value evaluation
+mechanism; the precise evaluation is as follows.
+
+ - Atomic values (`null`, booleans, strings, numbers) evaluate to
+ themselves.
+ - For lists, each entry is evaluated in the order they occur in the
+ list; the result of the evaluation is the list of the results.
+ - For JSON objects (which can be understood as maps, or dicts), the key
+ `"type"` has to be present and has to be a literal string. That
+ string determines the syntactical construct (sloppily also referred
+ to as "function") the object represents, and the remaining
+ evaluation depends on the syntactical construct. The syntactical
+ construct has to be either one of the built-in ones or a special
+ function available in the given context (e.g., `"ACTION"` within the
+ expression defining a rule).
+
+All evaluation happens in an "environment" which is a map from strings
+to semantic values.
+
+### Built-in syntactical constructs
+
+#### Special forms
+
+##### Variables: `"var"`
+
+There has to be a key `"name"` that (i.e., the expression in the
+object at that key) has to be a literal string, taken as
+variable name. If the variable name is in the domain of the
+environment and the value of the environment at the variable
+name is non-`null`, then the result of the evaluation is the
+value of the variable in the environment.
+
+Otherwise, the key `"default"` is taken (if present, otherwise
+the value `null` is taken as default for `"default"`) and
+evaluated. The value obtained this way is the result of the
+evaluation.
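+
+For example, the following expression evaluates to the value of the
+variable `"DEBUG"` if it is set and non-`null` in the environment, and
+to `false` otherwise.
+
+``` jsonc
+{"type": "var", "name": "DEBUG", "default": false}
+```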
+
+##### Sequential binding: `"let*"`
+
+The key `"bindings"` (default `[]`) has to be (syntactically) a
+list of pairs (i.e., lists of length two) with the first
+component a literal string.
+
+For each pair in `"bindings"` the second component is evaluated,
+in the order the pairs occur. After each evaluation, a new
+environment is taken for the subsequent evaluations; the new
+environment is like the old one but amended at the position
+given by the first component of the pair to now map to the value
+just obtained.
+
+Finally, the `"body"` is evaluated in the final environment
+(after evaluating all binding entries) and the result of
+evaluating the `"body"` is the value for the whole `"let*"`
+expression.
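+
+For example, the following expression evaluates to `"hello world"`
+(using the regular function `"join"`, described below).
+
+``` jsonc
+{ "type": "let*"
+, "bindings":
+  [ ["x", "hello"]
+  , ["y", {"type": "join", "$1": [{"type": "var", "name": "x"}, " world"]}]
+  ]
+, "body": {"type": "var", "name": "y"}
+}
+```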
+
+##### Environment Map: `"env"`
+
+Creates a map from selected environment variables.
+
+The key `"vars"` (default `[]`) has to be a list of literal
+strings referring to the variable names that should be included
+in the produced map. This field is not evaluated. This
+expression is only for convenience and does not give new
+expressive power. It is equivalent to, but a lot shorter than,
+multiple `singleton_map` expressions combined with `map_union`.
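+
+For example,
+
+``` jsonc
+{"type": "env", "vars": ["CC", "CFLAGS"]}
+```
+
+evaluates to the same map as
+
+``` jsonc
+{ "type": "map_union"
+, "$1":
+  [ {"type": "singleton_map", "key": "CC", "value": {"type": "var", "name": "CC"}}
+  , {"type": "singleton_map", "key": "CFLAGS", "value": {"type": "var", "name": "CFLAGS"}}
+  ]
+}
+```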
+
+##### Conditionals
+
+###### Binary conditional: `"if"`
+
+First the key `"cond"` is evaluated. If it evaluates to a
+value that is logically true, then the key `"then"` is
+evaluated and its value is the result of the evaluation.
+Otherwise, the key `"else"` (if present, otherwise `[]` is
+taken as default) is evaluated and the obtained value is the
+result of the evaluation.
+
+###### Sequential conditional: `"cond"`
+
+The key `"cond"` has to be a list of pairs. In the order of
+the list, the first components of the pairs are evaluated,
+until one evaluates to a value that is logically true. For
+that pair, the second component is evaluated and the result
+of this evaluation is the result of the `"cond"` expression.
+
+If all first components evaluate to a value that is
+logically false, the result of the expression is the result
+of evaluating the key `"default"` (defaulting to `[]`).
+
+###### String case distinction: `"case"`
+
+If the key `"case"` is present, it has to be a map (an
+"object", in JSON's terminology). In this case, the key
+`"expr"` is evaluated; it has to evaluate to a string. If
+the value is a key in the `"case"` map, the expression at
+this key is evaluated and the result of that evaluation is
+the value for the `"case"` expression.
+
+Otherwise (i.e., if `"case"` is absent or `"expr"` evaluates
+to a string that is not a key in `"case"`), the key
+`"default"` (with default `[]`) is evaluated and this gives
+the result of the `"case"` expression.
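+
+For example, the following expression (assuming the variable `"OS"` is
+set to a string) selects linker flags by operating system, using the
+`"fail"` construct described below for the unhandled case.
+
+``` jsonc
+{ "type": "case"
+, "expr": {"type": "var", "name": "OS"}
+, "case":
+  { "linux": ["-ldl"]
+  , "darwin": []
+  }
+, "default": {"type": "fail", "msg": "unsupported OS"}
+}
+```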
+
+###### Sequential case distinction on arbitrary values: `"case*"`
+
+If the key `"case"` is present, it has to be a list of
+pairs. In this case, the key `"expr"` is evaluated. It is an
+error if that evaluates to a name-containing value. The
+result of that evaluation is sequentially compared to the
+evaluation of the first components of the `"case"` list
+until an equal value is found. In this case, the evaluation
+of the second component of the pair is the value of the
+`"case*"` expression.
+
+If the `"case"` key is absent, or no equality is found, the
+result of the `"case*"` expression is the result of
+evaluating the `"default"` key (with default `[]`).
+
+##### Conjunction and disjunction: `"and"` and `"or"`
+
+For conjunction, if the key `"$1"` (with default `[]`) is
+syntactically a list, its entries are sequentially evaluated
+until a logically false value is found; in that case, the result
+is `false`, otherwise `true`. If the key `"$1"` has a different
+shape, it is evaluated and has to evaluate to a list. The result
+is the conjunction of the logical values of the entries. In
+particular, `{"type": "and"}` evaluates to `true`.
+
+For disjunction, the evaluation mechanism is the same, but the
+truth values and connective are taken dually. So, `"and"` and
+`"or"` are logical conjunction and disjunction, respectively,
+using short-cut evaluation if syntactically admissible (i.e., if
+the argument is syntactically a list).
+
+##### Mapping
+
+###### Mapping over lists: `"foreach"`
+
+First the key `"range"` is evaluated and has to evaluate to
+a list. For each entry of this list, the expression `"body"`
+is evaluated in an environment that is obtained from the
+original one by setting the value for the variable specified
+at the key `"var"` (which has to be a literal string,
+default `"_"`) to that value. The result is the list of
+those evaluation results.
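+
+For example, the following expression evaluates to `["foo.o",
+"bar.o"]` (using the regular function `"change_ending"`, described
+below).
+
+``` jsonc
+{ "type": "foreach"
+, "var": "src"
+, "range": ["foo.c", "bar.c"]
+, "body":
+  {"type": "change_ending", "$1": {"type": "var", "name": "src"}, "ending": ".o"}
+}
+```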
+
+###### Mapping over maps: `"foreach_map"`
+
+Here, `"range"` has to evaluate to a map. For each entry (in
+lexicographic order (according to native byte order) by
+keys), the expression `"body"` is evaluated in an
+environment obtained from the original one by setting the
+variables specified at `"var_key"` and `"var_val"` (literal
+strings, default values `"_"` and `"$_"`, respectively). The
+result of the evaluation is the list of those values.
+
+##### Folding: `"foldl"`
+
+The key `"range"` is evaluated and has to evaluate to a list.
+Starting from the result of evaluating `"start"` (default `[]`)
+a new value is obtained for each entry of the range list by
+evaluating `"body"` in an environment obtained from the original
+by binding the variable specified by `"var"` (literal string,
+default `"_"`) to the list entry and the variable specified by
+`"accum_var"` (literal string, default value `"$1"`) to the old
+value. The result is the last value obtained.
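+
+For example, the following expression reverses a list, evaluating to
+`["c", "b", "a"]`.
+
+``` jsonc
+{ "type": "foldl"
+, "range": ["a", "b", "c"]
+, "start": []
+, "var": "x"
+, "accum_var": "acc"
+, "body":
+  {"type": "++", "$1": [[{"type": "var", "name": "x"}], {"type": "var", "name": "acc"}]}
+}
+```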
+
+#### Regular functions
+
+First `"$1"` is evaluated; for binary functions `"$2"` is evaluated
+next. For functions that accept keyword arguments, those are
+evaluated as well. Finally the function is applied to this (or
+those) argument(s) to obtain the final result.
+
+##### Unary functions
+
+ - `"nub_right"` The argument has to be a list. It is an error
+   if that list contains (directly or indirectly) a name. The
+   result is the input list, except that for all duplicate
+   values, all but the rightmost occurrence are removed.
+
+ - `"basename"` The argument has to be a string. This string is
+ interpreted as a path, and the file name thereof is
+ returned.
+
+ - `"keys"` The argument has to be a map. The result is the
+ list of keys of this map, in lexicographical order
+ (according to native byte order).
+
+ - `"values"` The argument has to be a map. The result is the
+   list of values of that map, ordered by the corresponding
+   keys (lexicographically according to native byte order).
+
+ - `"range"` The argument is interpreted as a non-negative
+ integer as follows. Non-negative numbers are rounded to the
+ nearest integer; strings have to be the decimal
+ representation of an integer; everything else is considered
+ zero. The result is a list of the given length, consisting
+ of the decimal representations of the first non-negative
+ integers. For example, `{"type": "range",
+ "$1": "3"}` evaluates to `["0", "1", "2"]`.
+
+ - `"enumerate"` The argument has to be a list. The result is a
+ map containing one entry for each element of the list. The
+ key is the decimal representation of the position in the
+ list (starting from `0`), padded with leading zeros to
+ length at least 10. The value is the element. The padding is
+ chosen in such a way that iterating over the resulting map
+ (which happens in lexicographic order of the keys) has the
+ same iteration order as the list for all lists indexable by
+ 32-bit integers.
+
+ - `"++"` The argument has to be a list of lists. The result is
+ the concatenation of those lists.
+
+ - `"map_union"` The argument has to be a list of maps. The
+ result is a map containing as keys the union of the keys of
+ the maps in that list. For each key, the value is the value
+ of that key in the last map in the list that contains that
+ key.
+
+ - `"join_cmd"` The argument has to be a list of strings. A
+ single string is returned that quotes the original vector in
+ a way understandable by a POSIX shell. As the command for an
+ action is directly given by an argument vector, `"join_cmd"`
+ is typically only used for generated scripts.
+
+ - `"json_encode"` The result is a single string that is the
+ canonical JSON encoding of the argument (with minimal white
+ space); all atomic values that are not part of JSON (i.e.,
+ the added atomic values to represent build-internal values)
+ are serialized as `null`.
+
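+As a small sketch of how these functions behave, the following
+`"map_union"` expression evaluates to a map in which `"CC"` is mapped
+to `"clang"`, as for a duplicate key the value of the last map
+containing it wins; the maps are built with `"singleton_map"`,
+described under "Other functions" below.
+
+``` jsonc
+{ "type": "map_union"
+, "$1":
+  [ {"type": "singleton_map", "key": "CC", "value": "gcc"}
+  , {"type": "singleton_map", "key": "CC", "value": "clang"}
+  ]
+}
+```
+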
+##### Unary functions with keyword arguments
+
+ - `"change_ending"` The argument has to be a string,
+ interpreted as path. The ending is replaced by the value of
+ the keyword argument `"ending"` (a string, default `""`).
+ For example, `{"type":
+ "change_ending", "$1": "foo/bar.c", "ending": ".o"}`
+ evaluates to `"foo/bar.o"`.
+
+ - `"join"` The argument has to be a list of strings. The
+   return value is the concatenation of those strings,
+   separated by the specified `"separator"` (a string, default
+   `""`).
+
+ - `"escape_chars"` Prefix, in the argument, every character
+   occurring in `"chars"` (a string, default `""`) by
+   `"escape_prefix"` (a string, default `"\"`).
+
+ - `"to_subdir"` The argument has to be a map (not necessarily
+ of artifacts). The keys as well as the `"subdir"` (string,
+ default `"."`) argument are interpreted as paths and keys
+ are replaced by the path concatenation of those two paths.
+ If the optional argument `"flat"` (default `false`)
+ evaluates to a true value, the keys are instead replaced by
+ the path concatenation of the `"subdir"` argument and the
+ base name of the old key. It is an error if conflicts occur
+ in this way; in case of such a user error, the argument
+ `"msg"` is also evaluated and the result of that evaluation
+ reported in the error message. Note that conflicts can also
+ occur in non-flat staging if two keys are different as
+ strings, but name the same path (like `"foo.txt"` and
+ `"./foo.txt"`), and are assigned different values. It also
+ is an error if the values for keys in conflicting positions
+ are name-containing.
+
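+For example, the following `"to_subdir"` expression (a sketch,
+assuming the variable `"hdr"` is bound, e.g., to an artifact)
+evaluates to a map with the single key `"include/foo/foo.hpp"`.
+
+``` jsonc
+{ "type": "to_subdir"
+, "subdir": "include/foo"
+, "$1":
+  {"type": "singleton_map", "key": "foo.hpp", "value": {"type": "var", "name": "hdr"}}
+}
+```
+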
+##### Binary functions
+
+ - `"=="` The result is `true` if the arguments are equal,
+   `false` otherwise. It is an error if one of the arguments
+   is a name-containing value.
+
+ - `"concat_target_name"` This function is only present to
+   simplify transitions from some other build systems and is
+   normally not used outside code generated by transition
+   tools. The second argument has to be a string or a list of
+   strings (in the latter case, it is treated as the string
+   obtained by concatenating the entries). If the first
+   argument is a string, the result is the concatenation of
+   those two strings. If the first argument is a list of
+   strings, the result is that list with the second argument
+   concatenated to the last entry of that list (if any).
+
+##### Other functions
+
+ - `"empty_map"` This function takes no arguments and always
+ returns an empty map.
+
+ - `"singleton_map"` This function takes two keyword arguments,
+ `"key"` and `"value"` and returns a map with one entry,
+ mapping the given key to the given value.
+
+ - `"lookup"` This function takes two keyword arguments,
+ `"key"` and `"map"`. The `"key"` argument has to evaluate to
+ a string and the `"map"` argument has to evaluate to a map.
+ If that map contains the given key and the corresponding
+ value is non-`null`, the value is returned. Otherwise the
+ `"default"` argument (with default `null`) is evaluated and
+ returned.
+
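+For example, the following expression (assuming the variable
+`"PLATFORM_DEFINES"` is bound to a map) evaluates to the entry for
+`"linux"` of that map, or to `[]` if there is no such entry.
+
+``` jsonc
+{ "type": "lookup"
+, "map": {"type": "var", "name": "PLATFORM_DEFINES"}
+, "key": "linux"
+, "default": []
+}
+```
+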
+#### Constructs related to reporting of user errors
+
+Normally, if an error occurs during the evaluation, the error is
+reported together with a stack trace. This, however, might not be
+the most informative way to present a problem to the user,
+especially if the underlying problem is a proper user error, e.g.,
+in rule usage (leaving out mandatory arguments, violating semantic
+prerequisites, etc). To allow proper error reporting, the following
+functions are available. All of them have an optional argument
+`"msg"` that is evaluated (only) in case of error and the result of
+that evaluation included in the error message presented to the user.
+
+ - `"fail"` Evaluation of this function unconditionally fails.
+
+ - `"context"` This function is only there to provide additional
+   information in case of error. Otherwise it is the identity
+   function (a unary function, i.e., the result of the evaluation
+   is the result of evaluating the argument `"$1"`).
+
+ - `"assert_non_empty"` Evaluate the argument (given by the
+ parameter `"$1"`). If it evaluates to a non-empty string, map,
+ or list, return the result of the evaluation. Otherwise fail.
+
+ - `"disjoint_map_union"` Like `"map_union"` but it is an error, if
+ two (or more) maps contain the same key, but map it to different
+ values. It is also an error if the argument is a name-containing
+ value.
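+
+For example, a rule expression could guard a value as follows (a
+sketch, assuming the variable `"srcs"` is bound to the value to be
+checked).
+
+``` jsonc
+{ "type": "assert_non_empty"
+, "msg": "The value of srcs must not be empty"
+, "$1": {"type": "var", "name": "srcs"}
+}
+```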
diff --git a/doc/concepts/expressions.org b/doc/concepts/expressions.org
deleted file mode 100644
index ac66e878..00000000
--- a/doc/concepts/expressions.org
+++ /dev/null
@@ -1,344 +0,0 @@
-* Expression language
-
-At various places, in particular in order to define a rule, we need
-a restricted form of functional computation. This is achieved by
-our expression language.
-
-** Syntax
-
-All expressions are given by JSON values. One can think of expressions
-as abstract syntax trees serialized to JSON; nevertheless, the precise
-semantics is given by the evaluation mechanism described later.
-
-** Semantic Values
-
-Expressions evaluate to semantic values. Semantic values are JSON
-values extended by additional atomic values for build-internal
-values like artifacts, names, etc.
-
-*** Truth
-
-Every value can be treated as a boolean condition. We follow a
-convention similar to ~LISP~ considering everything true that is
-not empty. More precisely, the values
-- ~null~,
-- ~false~,
-- ~0~,
-- ~""~,
-- the empty map, and
-- the empty list
-are considered logically false. All other values are logically true.
-
-** Evaluation
-
-The evaluation follows a strict, functional, call-by-value evaluation
-mechanism; the precise evaluation is as follows.
-
-- Atomic values (~null~, booleans, strings, numbers) evaluate to
- themselves.
-- For lists, each entry is evaluated in the order they occur in the
- list; the result of the evaluation is the list of the results.
-- For JSON objects (wich can be understood as maps, or dicts), the
- key ~"type"~ has to be present and has to be a literal string.
- That string determines the syntactical construct (sloppily also
- referred to as "function") the object represents, and the remaining
- evaluation depends on the syntactical construct. The syntactical
- construct has to be either one of the built-in ones or a special
- function available in the given context (e.g., ~"ACTION"~ within
- the expression defining a rule).
-
-All evaluation happens in an "environment" which is a map from
-strings to semantic values.
-
-*** Built-in syntactical constructs
-
-**** Special forms
-
-***** Variables: ~"var"~
-
-There has to be a key ~"name"~ that (i.e., the expression in the
-object at that key) has to be a literal string, taken as variable
-name. If the variable name is in the domain of the environment and
-the value of the environment at the variable name is non-~null~,
-then the result of the evaluation is the value of the variable in
-the environment.
-
-Otherwise, the key ~"default"~ is taken (if present, otherwise the
-value ~null~ is taken as default for ~"default"~) and evaluated.
-The value obtained this way is the result of the evaluation.
-
-***** Sequential binding: ~"let*"~
-
-The key ~"bindings"~ (default ~[]~) has to be (syntactically) a
-list of pairs (i.e., lists of length two) with the first component
-a literal string.
-
-For each pair in ~"bindings"~ the second component is evaluated, in
-the order the pairs occur. After each evaluation, a new environment
-is taken for the subsequent evaluations; the new environment is
-like the old one but amended at the position given by the first
-component of the pair to now map to the value just obtained.
-
-Finally, the ~"body"~ is evaluated in the final environment (after
-evaluating all binding entries) and the result of evaluating the
-~"body"~ is the value for the whole ~"let*"~ expression.
-
-***** Environment Map: ~"env"~
-
-Creates a map from selected environment variables.
-
-The key ~"vars"~ (default ~[]~) has to be a list of literal strings referring to
-the variable names that should be included in the produced map. This field is
-not evaluated. This expression is only for convenience and does not give new
-expression power. It is equivalent but lot shorter to multiple ~singleton_map~
-expressions combined with ~map_union~.
-
-***** Conditionals
-
-****** Binary conditional: ~"if"~
-
-First the key ~"cond"~ is evaluated. If it evaluates to a value that
-is logically true, then the key ~"then"~ is evaluated and its value
-is the result of the evaluation. Otherwise, the key ~"else"~ (if
-present, otherwise ~[]~ is taken as default) is evaluated and the
-obtained value is the result of the evaluation.
-
-****** Sequential conditional: ~"cond"~
-
-The key ~"cond"~ has to be a list of pairs. In the order of the
-list, the first components of the pairs are evaluated, until one
-evaluates to a value that is logically true. For that pair, the
-second component is evaluated and the result of this evaluation is
-the result of the ~"cond"~ expression.
-
-If all first components evaluate to a value that is logically false,
-the result of the expression is the result of evaluating the key
-~"default"~ (defaulting to ~[]~).
-
-****** String case distinction: ~"case"~
-
-If the key ~"case"~ is present, it has to be a map (an "object", in
-JSON's terminology). In this case, the key ~"expr"~ is evaluated; it
-has to evaluate to a string. If the value is a key in the ~"case"~
-map, the expression at this key is evaluated and the result of that
-evaluation is the value for the ~"case"~ expression.
-
-Otherwise (i.e., if ~"case"~ is absent or ~"expr"~ evaluates to a
-string that is not a key in ~"case"~), the key ~"default"~ (with
-default ~[]~) is evaluated and this gives the result of the ~"case"~
-expression.
-
-****** Sequential case distinction on arbitrary values: ~"case*"~
-
-If the key ~"case"~ is present, it has to be a list of pairs. In this
-case, the key ~"expr"~ is evaluated. It is an error if that evaluates
-to a name-containing value. The result of that evaluation
-is sequentially compared to the evaluation of the first components
-of the ~"case"~ list until an equal value is found. In this case,
-the evaluation of the second component of the pair is the value of
-the ~"case*"~ expression.
-
-If the ~"case"~ key is absent, or no equality is found, the result of
-the ~"case*"~ expression is the result of evaluating the ~"default"~
-key (with default ~[]~).
-
-***** Conjunction and disjunction: ~"and"~ and ~"or"~
-
-For conjunction, if the key ~"$1"~ (with default ~[]~) is syntactically
-a list, its entries are sequentially evaluated until a logically
-false value is found; in that case, the result is ~false~, otherwise
-true. If the key ~"$1"~ has a different shape, it is evaluated and
-has to evaluate to a list. The result is the conjunction of the
-logical values of the entries. In particular, ~{"type": "and"}~
-evaluates to ~true~.
-
-For disjunction, the evaluation mechanism is the same, but the truth
-values and connective are taken dually. So, ~"and"~ and ~"or"~ are
-logical conjunction and disjunction, respectively, using short-cut
-evaluation if syntactically admissible (i.e., if the argument is
-syntactically a list).
-
-***** Mapping
-
-****** Mapping over lists: ~"foreach"~
-
-First the key ~"range"~ is evaluated and has to evaluate to a list.
-For each entry of this list, the expression ~"body"~ is evaluated
-in an environment that is obtained from the original one by setting
-the value for the variable specified at the key ~"var"~ (which has
-to be a literal string, default ~"_"~) to that value. The result
-is the list of those evaluation results.
-
-****** Mapping over maps: ~"foreach_map"~
-
-Here, ~"range"~ has to evaluate to a map. For each entry (in
-lexicographic order (according to native byte order) by keys), the
-expression ~"body"~ is evaluated in an environment obtained from
-the original one by setting the variables specified at ~"var_key"~
-and ~"var_val"~ (literal strings, default values ~"_"~ and
-~"$_"~, respectively). The result of the evaluation is the list of
-those values.
-
-***** Folding: ~"foldl"~
-
-The key ~"range"~ is evaluated and has to evaluate to a list.
-Starting from the result of evaluating ~"start"~ (default ~[]~) a
-new value is obtained for each entry of the range list by evaluating
-~"body"~ in an environment obtained from the original by binding
-the variable specified by ~"var"~ (literal string, default ~"_"~) to
-the list entry and the variable specified by ~"accum_var"~ (literal
-string, default value ~"$1"~) to the old value. The result is the
-last value obtained.
-
-**** Regular functions
-
-First ~"$1"~ is evaluated; for binary functions ~"$2"~ is evaluated
-next. For functions that accept keyword arguments, those are
-evaluated as well. Finally the function is applied to this (or
-those) argument(s) to obtain the final result.
-
-***** Unary functions
-
-- ~"nub_right"~ The argument has to be a list. It is an error if that list
- contains (directly or indirectly) a name. The result is the
- input list, except that for all duplicate values, all but the
- rightmost occurrence is removed.
-
-- ~"basename"~ The argument has to be a string. This string is
- interpreted as a path, and the file name thereof is returned.
-
-- ~"keys"~ The argument has to be a map. The result is the list of
- keys of this map, in lexicographical order (according to native
- byte order).
-
-- ~"values"~ The argument has to be a map. The result are the values
- of that map, ordered by the corresponding keys (lexicographically
- according to native byte order).
-
-- ~"range"~ The argument is interpreted as a non-negative integer as
- follows. Non-negative numbers are rounded to the nearest integer;
- strings have to be the decimal representation of an integer;
- everything else is considered zero. The result is a list of the
- given length, consisting of the decimal representations of the
- first non-negative integers. For example, ~{"type": "range",
- "$1": "3"}~ evaluates to ~["0", "1", "2"]~.
-
-- ~"enumerate"~ The argument has to be a list. The result is a map
- containing one entry for each element of the list. The key is
- the decimal representation of the position in the list (starting
- from ~0~), padded with leading zeros to length at least 10. The
- value is the element. The padding is chosen in such a way that
- iterating over the resulting map (which happens in lexicographic
- order of the keys) has the same iteration order as the list for
- all lists indexable by 32-bit integers.
-
-- ~"++"~ The argument has to be a list of lists. The result is the
- concatenation of those lists.
-
-- ~"map_union"~ The argument has to be a list of maps. The result
- is a map containing as keys the union of the keys of the maps in
- that list. For each key, the value is the value of that key in
- the last map in the list that contains that key.
-
-- ~"join_cmd"~ The argument has to be a list of strings. A single
- string is returned that quotes the original vector in a way
- understandable by a POSIX shell. As the command for an action is
- directly given by an argument vector, ~"join_cmd"~ is typically
- only used for generated scripts.
-
-- ~"json_encode"~ The result is a single string that is the canonical
- JSON encoding of the argument (with minimal white space); all atomic
- values that are not part of JSON (i.e., the added atomic values
- to represent build-internal values) are serialized as ~null~.
-
-***** Unary functions with keyword arguments
-
-- ~"change_ending"~ The argument has to be a string, interpreted as
- path. The ending is replaced by the value of the keyword argument
- ~"ending"~ (a string, default ~""~). For example, ~{"type":
- "change_ending", "$1": "foo/bar.c", "ending": ".o"}~ evaluates
- to ~"foo/bar.o"~.
-
-- ~"join"~ The argument has to be a list of strings. The return
- value is the concatenation of those strings, separated by the
- the specified ~"separator"~ (strings, default ~""~).
-
-- ~"escape_chars"~ Prefix every in the argument every character
- occuring in ~"chars"~ (a string, default ~""~) by ~"escape_prefix"~ (a
- strings, default ~"\\"~).
-
-- ~"to_subdir"~ The argument has to be a map (not necessarily of
- artifacts). The keys as well as the ~"subdir"~ (string, default
- ~"."~) argument are interpreted as paths and keys are replaced
- by the path concatenation of those two paths. If the optional
- argument ~"flat"~ (default ~false~) evaluates to a true value,
- the keys are instead replaced by the path concatenation of the
- ~"subdir"~ argument and the base name of the old key. It is an
- error if conflicts occur in this way; in case of such a user
- error, the argument ~"msg"~ is also evaluated and the result
- of that evaluation reported in the error message. Note that
- conflicts can also occur in non-flat staging if two keys are
- different as strings, but name the same path (like ~"foo.txt"~
- and ~"./foo.txt"~), and are assigned different values.
- It also is an error if the values for keys in conflicting positions
- are name-containing.
-
-***** Binary functions
-
-- ~"=="~ The result is ~true~ is the arguments are equal, ~false~
- otherwise. It is an error if one of the arguments are name-containing
- values.
-
-- ~"concat_target_name"~ This function is only present to simplify
- transitions from some other build systems and normally not used
- outside code generated by transition tools. The second argument
- has to be a string or a list of strings (in the latter case,
- it is treated as strings by concatenating the entries). If the
- first argument is a string, the result is the concatenation of
- those two strings. If the first argument is a list of strings,
- the result is that list with the second argument concatenated to
- the last entry of that list (if any).
-
-***** Other functions
-
-- ~"empty_map"~ This function takes no arguments and always returns
- an empty map.
-
-- ~"singleton_map"~ This function takes two keyword arguments,
- ~"key"~ and ~"value"~ and returns a map with one entry, mapping
- the given key to the given value.
-
-- ~"lookup"~ This function takes two keyword arguments, ~"key"~
- and ~"map"~. The ~"key"~ argument has to evaluate to a string
- and the ~"map"~ argument has to evaluate to a map. If that map
- contains the given key and the corresponding value is non-~null~,
- the value is returned. Otherwise the ~"default"~ argument (with
- default ~null~) is evaluated and returned.
-
-**** Constructs related to reporting of user errors
-
-Normally, if an error occurs during the evaluation the error is
-reported together with a stack trace. This, however, might not
-be the most informative way to present a problem to the user,
-especially if the underlying problem is a proper user error, e.g.,
-in rule usage (leaving out mandatory arguments, violating semantic
-prerequisites, etc). To allow proper error reporting, the following
-functions are available. All of them have an optional argument
-~"msg"~ that is evaluated (only) in case of error and the result of
-that evaluation included in the error message presented to the user.
-
-- ~"fail"~ Evaluation of this function unconditionally fails.
-
-- ~"context"~ This function is only there to provide additional
-  information in case of error. Otherwise it is the identity
- function (a unary function, i.e., the result of the evaluation
- is the result of evaluating the argument ~"$1"~).
-
-- ~"assert_non_empty"~ Evaluate the argument (given by the parameter
- ~"$1"~). If it evaluates to a non-empty string, map, or list,
- return the result of the evaluation. Otherwise fail.
-
-- ~"disjoint_map_union"~ Like ~"map_union"~ but it is an error,
- if two (or more) maps contain the same key, but map it to
- different values. It is also an error if the argument is a
- name-containing value.
diff --git a/doc/concepts/garbage.md b/doc/concepts/garbage.md
new file mode 100644
index 00000000..69594b1c
--- /dev/null
+++ b/doc/concepts/garbage.md
@@ -0,0 +1,86 @@
+Garbage Collection
+==================
+
+For every build, for all non-failed actions an entry is created in the
+action cache and the corresponding artifacts are stored in the CAS. So,
+over time, a lot of files accumulate in the local build root. Hence we
+have a way to reclaim disk space while keeping the benefits of having a
+cache. This operation is referred to as garbage collection and usually
+uses the heuristic of keeping what is most recently used. Our approach
+follows this paradigm as well.
+
+Invariants assumed by our build system
+--------------------------------------
+
+Our tool assumes several invariants on the local build root that we
+need to maintain during garbage collection. Those are the following.
+
+ - If an artifact is referenced in any cache entry (action cache,
+ target-level cache), then the corresponding artifact is in CAS.
+ - If a tree is in CAS, then so are its immediate parts (and hence also
+ all transitive parts).
+
+Generations of cache and CAS
+----------------------------
+
+In order to allow garbage collection while keeping the desired
+invariants, we keep several (currently two) generations of cache and
+CAS. Each generation in itself has to fulfill the invariants. The
+effective cache or CAS is the union of the caches or CASes of all
+generations, respectively. Obviously, then the effective cache and CAS
+fulfill the invariants as well.
+
+The actual `gc` command rotates the generations: the oldest generation
+is removed and the remaining generations are moved one number up
+(i.e., currently the young generation will simply become the old
+generation), implicitly creating a new, empty, youngest generation. As
+an empty generation fulfills the required invariants, this operation
+preserves the requirement that each generation individually fulfill the
+invariants.
+
+All additions are made to the youngest generation; in order to keep the
+invariant, relevant entries only present in an older generation are also
+added to the youngest generation first. Moreover, whenever an entry is
+referenced in any way (cache hit, request for an entry to be in CAS) and
+is only present in an older generation, it is also added to the youngest
+generation, again adding referenced parts first. As a consequence, the
+youngest generation contains everything directly or indirectly
+referenced since the last garbage collection; in particular, everything
+referenced since the last garbage collection will remain in the
+effective cache or CAS upon the next garbage collection.
+
+These generations are stored as separate directories inside the local
+build root. As the local build root is, starting from an empty
+directory, entirely managed by `just` and compatible tools,
+generations are on the same file system. Therefore the adding of old
+entries to the youngest generation can be implemented in an efficient
+way by using hard links.
+
+The moving up of generations can happen atomically by renaming the
+respective directory. Also, the oldest generation can be removed
+logically by renaming a directory to a name that is not searched for
+when looking for existing generations. The actual recursive removal from
+the file system can then happen in a separate step without any
+requirements on order.
+
+Parallel operations in the presence of garbage collection
+---------------------------------------------------------
+
+The addition to cache and CAS can continue to happen in parallel; the
+fact that certain values are taken from an older generation instead of
+being freshly computed makes no difference for the youngest generation
+(which is the only generation modified). However, build processes assume
+they don't violate the invariants if they first add files to CAS and
+only later add a tree or cache entry referencing them. This only holds
+true if no generation rotation happens in between. To avoid such races,
+we make processes coordinate over a single lock for each build root.
+
+ - Any build process keeps a shared lock for the entirety of the build.
+ - The garbage collection process takes an exclusive lock for the
+ period it does the directory renames.
+
+We consider it acceptable that, in theory, local build processes could
+starve local garbage collection. Moreover, it should be noted that the
+actual removal of no-longer-needed files from the file system happens
+without any lock being held. Hence the disturbance of builds caused by
+garbage collection is small.
diff --git a/doc/concepts/garbage.org b/doc/concepts/garbage.org
deleted file mode 100644
index 26f6cc51..00000000
--- a/doc/concepts/garbage.org
+++ /dev/null
@@ -1,82 +0,0 @@
-* Garbage Collection
-
-For every build, for all non-failed actions an entry is created in
-the action cache and the corresponding artifacts are stored in the
-CAS. So, over time, a lot of files accumulate in the local build
-root. Hence we have a way to reclaim disk space while keeping the
-benefits of having a cache. This operation is referred to as garbage
-collection and usually uses the heuristics to keeping what is most
-recently used. Our approach follows this paradigm as well.
-
-** Invariants assumed by our build system
-
-Our tool assumes several invariants on the local build root, that we
-need to maintain during garbage collection. Those are the following.
-- If an artifact is referenced in any cache entry (action cache,
- target-level cache), then the corresponding artifact is in CAS.
-- If a tree is in CAS, then so are its immediate parts (and hence
- also all transitive parts).
-
-
-** Generations of cache and CAS
-
-In order to allow garbage collection while keeping the desired
-invariants, we keep several (currently two) generations of cache
-and CAS. Each generation in itself has to fulfill the invariants.
-The effective cache or CAS is the union of the caches or CASes of
-all generations, respectively. Obviously, then the effective cache
-and CAS fulfill the invariants as well.
-
-The actual ~gc~ command rotates the generations: the oldest
-generation is be removed and the remaining generations are moved
-one number up (i.e., currently the young generation will simply
-become the old generation), implicitly creating a new, empty,
-youngest generation. As an empty generation fulfills the required
-invariants, this operation preservers the requirement that each
-generation individually fulfill the invariants.
-
-All additions are made to the youngest generation; in order to keep
-the invariant, relevant entries only present in an older generation
-are also added to the youngest generation first. Moreover, whenever
-an entry is referenced in any way (cache hit, request for an entry
-to be in CAS) and is only present in an older generation, it is
-also added to the younger generation, again adding referenced
-parts first. As a consequence, the youngest generation contains
-everything directly or indirectly referenced since the last garbage
-collection; in particular, everything referenced since the last
-garbage collection will remain in the effective cache or CAS upon
-the next garbage collection.
-
-These generations are stored as separate directories inside the
-local build root. As the local build root is, starting from an
-empty directory, entirely managed by `just` and compatible tools,
-generations are on the same file system. Therefore the adding of
-old entries to the youngest generation can be implemented in an
-efficient way by using hard links.
-
-The moving up of generations can happen atomically by renaming the
-respective directory. Also, the oldest generation can be removed
-logically by renaming a directory to a name that is not searched
-for when looking for existing generations. The actual recursive
-removal from the file system can then happen in a separate step
-without any requirements on order.
-
-** Parallel operations in the presence of garbage collection
-
-The addition to cache and CAS can continue to happen in parallel;
-that certain values are taken from an older generation instead
-of freshly computed does not make a difference for the youngest
-generation (which is the only generation modified). But build
-processes assume they don't violate the invariant if they first
-add files to CAS and later a tree or cache entry referencing them.
-This, however, only holds true if no generation rotation happens in
-between. To avoid those kind of races, we make processes coordinate
-over a single lock for each build root.
-- Any build process keeps a shared lock for the entirety of the build.
-- The garbage collection process takes an exclusive lock for the
- period it does the directory renames.
-We consider it acceptable that, in theory, local build processes
-could starve local garbage collection. Moreover, it should be noted
-that the actual removal of no-longer-needed files from the file
-system happens without any lock being held. Hence the disturbance
-of builds caused by garbage collection is small.
diff --git a/doc/concepts/multi-repo.md b/doc/concepts/multi-repo.md
new file mode 100644
index 00000000..c465360e
--- /dev/null
+++ b/doc/concepts/multi-repo.md
@@ -0,0 +1,170 @@
+Multi-repository build
+======================
+
+Repository configuration
+------------------------
+
+### Open repository names
+
+A repository can have external dependencies. This is realized by using
+unbound ("open") repository names as references. The actual
+definition of those external repositories is not part of the repository;
+we think of them as inputs, i.e., we think of this repository as a
+function of the referenced external targets.
+
+### Binding in a separate repository configuration
+
+The actual binding of the free repository names is specified in a
+separate repository-configuration file, which is specified on the
+command line (via the `-C` option); this command-line argument is
+optional and the default is that the repository worked on has no
+external dependencies. Typically (but not necessarily), this
+repository-configuration file is located outside the referenced
+repositories and versioned separately or generated from such a file via
+`bin/just-mr.py`. It serves as meta-data for a group of repositories
+belonging together.
+
+This file contains one JSON object. For the key `"repositories"` the
+value is an object; its keys are the global names of the specified
+repositories. For each repository, there is an object describing it. The
+key `"workspace_root"` describes where to find the repository and should
+be present for all (direct or indirect) external dependencies of the
+repository worked upon. Additional roots file names (for target, rule,
+and expression) can be specified. For keys not given, the same rules for
+default values apply as for the corresponding command-line arguments.
+Additionally, for each repository, the key `"bindings"` specifies the
+map of the open repository names to the global names that provide these
+dependencies. Repositories may depend on each other (or even
+themselves), but the resulting global target graph has to be cycle free.
+
+Whenever a location has to be specified, the value has to be a list,
+with the first entry specifying the naming scheme; the semantics
+of the remaining entries depends on the scheme (see "Root Naming
+Schemes" below).
+
+Additionally, the key `"main"` (with default `""`) specifies the main
+repository. The target to be built (as specified on the command line) is
+taken from this repository. Also, the command-line arguments `-w`,
+`--target_root`, etc, apply to this repository. If no option `-w` is
+given and `"workspace_root"` is not specified in the
+repository-configuration file either, the root is determined from the
+working directory as usual.
+
+The value of `main` can be overwritten on the command line (with the
+`--main` option). In this way, a consistent configuration of
+interdependent repositories can be versioned and referred to regardless
+of the repository worked on.
+
+#### Root naming scheme
+
+##### `"file"`
+
+The `"file"` scheme tells that the repository (or respective
+root) can be found in a directory in the local file system; the
+only argument is the absolute path to that directory.
+
+##### `"git tree"`
+
+The `"git tree"` scheme tells that the root is defined to be a
+tree given by a git tree identifier. It takes two arguments
+
+ - the tree identifier, as hex-encoded string, and
+ - the absolute path to some repository containing that tree
+
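+For instance, a root given in the `"git tree"` scheme could look as
+follows (the tree identifier and the repository path are hypothetical
+placeholders):
+
+``` jsonc
+["git tree", "6a1869ff412472b132f392c9a4a9a8a9b0c8e4a2", "/var/repos/rules.git"]
+```
+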
+#### Example
+
+Consider, for example, the following repository-configuration file.
+In the following, we assume it is located at `/etc/just/repos.json`.
+
+``` jsonc
+{ "main": "env"
+, "repositories":
+ { "foobar":
+ { "workspace_root": ["file", "/opt/foobar/repo"]
+ , "rule_root": ["file", "/etc/just/rules"]
+ , "bindings": {"base": "barimpl"}
+ }
+ , "barimpl":
+ { "workspace_root": ["file", "/opt/barimpl"]
+ , "target_file_name": "TARGETS.bar"
+ }
+ , "env": {"bindings": {"foo": "foobar", "bar": "barimpl"}}
+ }
+}
+```
+
+It specifies 3 repositories, with global names `foobar`, `barimpl`,
+and `env`. Within `foobar`, the repository name `base` refers to
+`barimpl`, the repository that can be found at `/opt/barimpl`.
+
+The repository `env` is the main repository and there is no
+workspace root defined for it, so it only provides bindings for
+external repositories `foo` and `bar`, but the actual repository is
+taken from the working directory (unless `-w` is specified). In this
+way, it provides an environment for developing applications based on
+`foo` and `bar`.
+
+For example, the invocation `just build -C /etc/just/repos.json
+baz` tells our tool to build the target `baz` from the module the
+working directory is located in. `foo` will refer to the repository
+found at `/opt/foobar/repo` (using rules from `/etc/just/rules` and
+taking `base` to refer to the repository at `/opt/barimpl`) and `bar`
+will refer to the repository at `/opt/barimpl`.
+
+Naming of targets
+-----------------
+
+### Reference in target files
+
+In addition to the normal target references (a string for a target in
+the same module, a module-target pair for a target in the same
+repository, `["./", relpath, target]` for relative addressing, and
+`["FILE", null, name]` for an explicit file reference in the same
+module), references of the form `["@", repo, module, target]` can be
+specified, where `repo` is a string referring to an open name. That
+open repository name is resolved to the global name by the
+`"bindings"` parameter of the repository the target reference is made
+in. Within the repository the resolved name refers to, the target
+`[module, target]` is taken.
+
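+For illustration, a target in the `foobar` repository of the earlier
+example could depend on a target of its open repository `base` as
+follows (the rule, module, and target names are hypothetical):
+
+``` jsonc
+{ "uses base":
+  { "type": ["rules", "library"]        // rule from foobar's rule root
+  , "deps": [["@", "base", "utils", "logging"]]  // resolved via "bindings"
+  }
+}
+```
+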
+### Expression language: names as abstract values
+
+Targets are a global concept as they distinguish targets from different
+repositories. Their names, however, depend on the repository they occur
+in (as the local names might differ in various repositories). Moreover,
+some targets cannot be named in certain repositories as not every
+repository has a local name in every other repository.
+
+To handle this naming problem, we note the following. During the
+evaluation of a target, names occur in two places: as the result of
+evaluating the parameters (for target fields) and in the evaluation of
+the defining expression when requesting properties of a target depended
+upon (via `DEP_ARTIFACTS` and related functions). In the latter case,
+however, the only legitimate way to obtain a target name is by the
+`FIELD` function. To enforce this behavior, and to avoid problems with
+serializing target names, our expression language considers target names
+as opaque values. More precisely,
+
+ - in a target description, the target fields are evaluated and the
+ result of the evaluation is parsed, in the context of the module the
+   `TARGETS` file belongs to, as a target name, and
+ - during evaluation of the defining expression of the target's
+ rule, when accessing `FIELD` the values of target fields will be
+ reported as abstract name values and when querying values of
+ dependencies (via `DEP_ARTIFACTS` etc) the correct abstract target
+ name has to be provided.
+
+While the defining expression has access to target names (via target
+fields), it is not useful to provide them in the provided data; a
+consuming target cannot use such names unless it has those targets as
+dependencies anyway. Our tool will not enforce this policy; however,
+only targets not having names in their provided data are eligible to be
+used in `export` rules.
+
+File layout in actions
+----------------------
+
+As `just` does full staging for actions, no special considerations are
+needed when combining targets of different repositories. Each target
+brings its staging of artifacts as usual. In particular, no repository
+names (neither local nor global ones) will ever be visible in any
+action. So for the consuming target it makes no difference if its
+dependency comes from the same or a different repository.
diff --git a/doc/concepts/multi-repo.org b/doc/concepts/multi-repo.org
deleted file mode 100644
index f1ad736f..00000000
--- a/doc/concepts/multi-repo.org
+++ /dev/null
@@ -1,167 +0,0 @@
-* Multi-repository build
-
-** Repository configuration
-
-*** Open repository names
-
-A repository can have external dependencies. This is realized by
-having unbound ("open") repository names being used as references.
-The actual definition of those external repositories is not part
-of the repository; we think of them as inputs, i.e., we think of
-this repository as a function of the referenced external targets.
-
-*** Binding in a separate repository configuration
-
-The actual binding of the free repository names is specified in a
-separate repository-configuration file, which is specified on the
-command line (via the ~-C~ option); this command-line argument
-is optional and the default is that the repository worked on has
-no external dependencies. Typically (but not necessarily), this
-repository-configuration file is located outside the referenced
-repositories and versioned separately or generated from such a
-file via ~bin/just-mr.py~. It serves as meta-data for a group of
-repositories belonging together.
-
-This file contains one JSON object. For the key ~"repositories"~ the
-value is an object; its keys are the global names of the specified
-repositories. For each repository, there is an object describing it.
-The key ~"workspace_root"~ describes where to find the repository and
-should be present for all (direct or indirect) external dependencies
-of the repository worked upon. Additional roots file names (for
-target, rule, and expression) can be specified. For keys not given,
-the same rules for default values apply as for the corresponding
-command-line arguments. Additionally, for each repository, the
-key "bindings" specifies the map of the open repository names to
-the global names that provide these dependencies. Repositories may
-depend on each other (or even themselves), but the resulting global
-target graph has to be cycle free.
-
-Whenever a location has to be specified, the value has to be a
-list, with the first entry being specifying the naming scheme; the
-semantics of the remaining entries depends on the scheme (see "Root
-Naming Schemes" below).
-
-Additionally, the key ~"main"~ (with default ~""~) specifies
-the main repository. The target to be built (as specified on the
-command line) is taken from this repository. Also, the command-line
-arguments ~-w~, ~--target_root~, etc, apply to this repository. If
-no option ~-w~ is given and ~"workspace_root"~ is not specified in
-the repository-configuration file either, the root is determined
-from the working directory as usual.
-
-The value of ~main~ can be overwritten on the command line (with
-the ~--main~ option) In this way, a consistent configuration
-of interdependent repositories can be versioned and referred to
-regardless of the repository worked on.
-
-**** Root naming scheme
-
-***** ~"file"~
-
-The ~"file"~ scheme tells that the repository (or respective root)
-can be found in a directory in the local file system; the only
-argument is the absolute path to that directory.
-
-
-***** ~"git tree"~
-
-The ~"git tree"~ scheme tells that the root is defined to be a tree
-given by a git tree identifier. It takes two arguments
-- the tree identifier, as hex-encoded string, and
-- the absolute path to some repository containing that tree
-
-**** Example
-
-Consider, for example, the following repository-configuration file.
-In the following, we assume it is located at ~/etc/just/repos.json~.
-
-#+BEGIN_SRC
-{ "main": "env"
-, "repositories":
- { "foobar":
- { "workspace_root": ["file", "/opt/foobar/repo"]
- , "rule_root": ["file", "/etc/just/rules"]
- , "bindings": {"base": "barimpl"}
- }
- , "barimpl":
- { "workspace_root": ["file", "/opt/barimpl"]
- , "target_file_name": "TARGETS.bar"
- }
- , "env": {"bindings": {"foo": "foobar", "bar": "barimpl"}}
- }
-}
-#+END_SRC
-
-It specifies 3 repositories, with global names ~foobar~, ~barimpl~,
-and ~env~. Within ~foobar~, the repository name ~base~ refers to
-~barimpl~, the repository that can be found at ~/opt/barimpl~.
-
-The repository ~env~ is the main repository and there is no workspace
-root defined for it, so it only provides bindings for external
-repositories ~foo~ and ~bar~, but the actual repository is taken
-from the working directory (unless ~-w~ is specified). In this way,
-it provides an environment for developing applications based on
-~foo~ and ~bar~.
-
-For example, the invocation ~just build -C /etc/just/repos.conf
-baz~ tells our tool to build the target ~baz~ from the module the
-working directory is located in. ~foo~ will refer to the repository
-found at ~/opt/foobar/repo~ (using rules from ~/etc/just/rules~,
-taking ~base~ refer to the repository at ~/opt/barimpl~) and
-~bar~ will refer to the repository at ~/opts/barimpl~.
-
-** Naming of targets
-
-*** Reference in target files
-
-In addition to the normal target references (string for a target in
-the name module, module-target pair for a target in same repository,
-~["./", relpath, target]~ relative addressing, ~["FILE", null,
-name]~ explicit file reference in the same module), references of the
-form ~["@", repo, module, target]~ can be specified, where ~repo~
-is string referring to an open name. That open repository name is
-resolved to the global name by the ~"bindings"~ parameter of the
-repository the target reference is made in. Within the repository
-the resolved name refers to, the target ~[module, target]~ is taken.
-
-*** Expression language: names as abstract values
-
-Targets are a global concept as they distinguish targets from different
-repositories. Their names, however, depend on the repository they
-occur in (as the local names might differ in various repositories).
-Moreover, some targets cannot be named in certain repositories as
-not every repository has a local name in every other repository.
-
-To handle this naming problem, we note the following. During the
-evaluation of a target names occur at two places: as the result of
-evaluating the parameters (for target fields) and in the evaluation
-of the defining expression when requesting properties of a target
-dependent upon (via ~DEP_ARTIFACTS~ and related functions). In the
-later case, however, the only legitimate way to obtain a target
-name is by the ~FIELD~ function. To enforce this behavior, and
-to avoid problems with serializing target names, our expression
-language considers target names as opaque values. More precisely,
-- in a target description, the target fields are evaluated and the
- result of the evaluation is parsed, in the context of the module
- the ~TARGET~ file belongs to, as a target name, and
-- during evaluation of the defining expression of a the target's
- rule, when accessing ~FIELD~ the values of target fields will
- be reported as abstract name values and when querying values of
- dependencies (via ~DEP_ARTIFACTS~ etc) the correct abstract target
- name has to be provided.
-
-While the defining expression has access to target names (via
-target fields), it is not useful to provide them in provided data;
-a consuming data cannot use names unless it has those fields as
-dependency anyway. Our tool will not enforce this policy; however,
-only targets not having names in their provided data are eligible
-to be used in ~export~ rules.
-
-** File layout in actions
-
-As ~just~ does full staging for actions, no special considerations
-are needed when combining targets of different repositories. Each
-target brings its staging of artifacts as usual. In particular, no
-repository names (neither local nor global ones) will ever be visible
-in any action. So for the consuming target it makes no difference
-if its dependency comes from the same or a different repository.
diff --git a/doc/concepts/overview.md b/doc/concepts/overview.md
new file mode 100644
index 00000000..a9bcc847
--- /dev/null
+++ b/doc/concepts/overview.md
@@ -0,0 +1,210 @@
+Tool Overview
+=============
+
+Structuring
+-----------
+
+### Structuring the Build: Targets, Rules, and Actions
+
+The primary units this build system deals with are targets: the user
+requests the system to build (or install) a target, targets depend on
+other targets, etc. Targets typically reflect the units a software
+developer thinks in: libraries, binaries, etc. The definition of a
+target only describes the information directly belonging to the target,
+e.g., its source, private and public header files, and its direct
+dependencies. Any other information needed to build a target (like the
+public header files of an indirect dependency) is inferred by the build
+tool. In this way, the build description can be kept maintainable.
+
+A built target consists of files logically belonging together (like the
+actual library file and its public headers) as well as information on
+how to use the target (linking arguments, transitive header files, etc).
+For a consumer of a target, the definition of this collection of files
+as well as the additionally provided information is what defines the
+target as a dependency, irrespective of where the target is coming from
+(i.e., targets coinciding here are indistinguishable for other targets).
+
+Of course, to actually build a single target from its dependencies, many
+invocations of the compiler or other tools are necessary (so-called
+"actions"); the build tool translates this high-level description
+into the individual actions necessary and only re-executes those whose
+inputs have changed.
+
+This translation of high-level concepts into individual actions is not
+hard coded into the tool. It is provided by the user as "rules" and
+forms additional input to the build. To avoid duplicate work, rules are
+typically maintained centrally for a project or an organization.
+
+### Structuring the Code: Modules and Repositories
+
+The code base is usually split into many directories, each containing
+source files belonging together. To allow the definition of targets
+where their code is, the targets are structured in a similar way. For
+each directory, there can be a targets file. Directories for which such
+a targets file exists are called "modules". Each file belongs to the
+module that is closest when searching upwards in the directory tree. The
+targets file of a module defines the targets formed from the source
+files belonging to this module.
+
+Larger projects are often split into "repositories". For this build
+tool, a repository is a logical unit. Often those coincide with the
+repositories in the sense of version control. This, however, does not
+have to be the case. Also, from one directory in the file system many
+repositories can be formed that might differ in the rules used, targets
+defined, or binding of their dependencies.
+
+Staging
+-------
+
+A peculiarity of this build system is the complete separation between
+physical and logical paths. Targets have their own view of the world,
+i.e., they can place their artifacts at any logical path they like, and
+this is how they look to other targets. It is up to the consuming
+targets what they do with artifacts of the targets they depend on; in
+particular, they are not obliged to leave them at the logical location
+their dependency put them.
+
+When such a collection of artifacts at logical locations (often referred
+to as the "stage") is realized on the file system (when installing a
+target, or as inputs to actions), the paths are interpreted as paths
+relative to the respective root (installation or action directory).
+
+This separation is what allows the flexible combination of targets from
+various sources without leaking repository names, and without a target's
+file arrangement differing depending on whether it is in the "main"
+repository.
+
+Repository data
+---------------
+
+A repository uses a (logical) directory for several purposes: to obtain
+source files, to read definitions of targets, to read rules, and to read
+expressions that can be used by rules. While all those directories can
+be (and often are) the same, this does not have to be the case. For each
+of those purposes, a different logical directory (also called "root")
+can be used. In this way, one can, e.g., add target definitions to a
+source tree originally written for a different build tool without
+modifying the original source tree.
+
+Those roots are usually defined in a repository configuration. For the
+"main" repository, i.e., the repository from which the target to be
+built is requested, the roots can also be overwritten at the command
+line. Roots can be defined as paths in the file system, but also as
+`git` tree identifiers (together with the location of some repository
+containing that tree). The latter definition is preferable for rules and
+dependencies, as it allows high-level caching of targets. It also
+motivates the need to add target definitions without changing the
+root itself.
+
+The same flexibility as for the roots is also present for the names of
+the files defining targets, rules, and expressions. While the default
+names `TARGETS`, `RULES`, and `EXPRESSIONS` are often used, other file
+names can be specified for those as well, either in the repository
+configuration or (for the main repository) on the command line.
+
+The final piece of data needed to describe a repository is the binding
+of the open repository names that are used to refer to other
+repositories. More details can be found in the documentation on
+multi-repository builds.
+
+Targets
+-------
+
+### Target naming
+
+In description files, targets, rules, and expressions are referred to by
+name. As the context always fixes if a name for a target, rule, or
+expression is expected, they use the same naming scheme.
+
+ - A single string refers to the target with this name in the same
+ module.
+ - A pair `[module, name]` refers to the target `name` in the module
+ `module` of the same repository. There are no module names with a
+ distinguished meaning. The naming scheme is unambiguous, as all
+ other names given by lists have length at least 3.
+ - A list `["./", relative-module-path, name]` refers to a target with
+ the given name in the module that has the specified path relative to
+ the current module (in the current repository).
+ - A list `["@", repository, module, name]` refers to the target with
+ the specified name in the specified module of the specified
+ repository.
+
+Additionally, there are special targets that can also be referred to in
+target files.
+
+ - An explicit reference of a source-file target in the same module,
+ specified as `["FILE", null, name]`. The explicit `null` at the
+ second position (where normally the module would be) is necessary to
+ ensure the name has length more than 2 to distinguish it from a
+ reference to the module `"FILE"`.
+ - A reference to a collection, given by a shell pattern, of explicit
+ source files in the top-level directory of the same module,
+ specified as `["GLOB", null, pattern]`. The explicit `null` at
+ second position is required for the same reason as in the explicit
+ file reference.
+ - A reference to a tree target in the same module, specified as
+ `["TREE", null, name]`. The explicit `null` at second position is
+ required for the same reason as in the explicit file reference.
+
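+As a sketch, a hypothetical `"deps"` field in a target file could
+combine those naming schemes as follows:
+
+``` jsonc
+[ "helper"                        // target in the same module
+, ["utils", "log"]                // module-target pair
+, ["./", "detail", "impl"]        // module relative to the current one
+, ["@", "base", "io", "file"]     // target in another repository
+, ["FILE", null, "main.cpp"]      // explicit source-file reference
+, ["GLOB", null, "*.hpp"]         // files matching a shell pattern
+, ["TREE", null, "data"]          // tree target
+]
+```
+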
+### Data of an analyzed target
+
+Analyzing a target results in 3 pieces of data.
+
+ - The "artifacts" are a staged collection of artifacts. Typically,
+ these are what is normally considered the main reason to build a
+ target, e.g., the actual library file in case of a library.
+
+ - The "runfiles" are another staged collection of artifacts.
+ Typically, these are files that directly belong to the target and
+ are somehow needed to use the target. For example, in case of a
+ library that would be the public header files of the library itself.
+
+ - A "provides" map with additional information the target wants to
+ provide to its consumers. The data contained in that map can also
+   contain additional artifacts. Typically, this is the remaining
+ information needed to use the target in a build.
+
+ In case of a library, that typically would include any other
+ libraries this library transitively depends upon (a stage), the
+ correct linking order (a list of strings), and the public headers of
+ the transitive dependencies (another stage).
+
+A target is completely determined by these 3 pieces of data. A consumer
+of the target will have no other information available. Hence it is
+crucial that everything (apart from artifacts and runfiles) needed to
+build against that target is contained in the provides map.
+
+When the installation of a target is requested on the command line,
+artifacts and runfiles are installed; in case of staging conflicts,
+artifacts take precedence.
+
+### Source targets
+
+#### Files
+
+If a target is not found in the targets file, it is implicitly
+treated as a source file. Both explicit and implicit source files
+look the same. The artifacts stage has a single entry: the path is
+the relative path of the file to the module root and the value is the
+file artifact located at the specified location. The runfiles are
+the same as the artifacts and the provides map is empty.
+
+#### Collection of files given by a shell pattern
+
+A collection of files given by a shell pattern has, both as
+artifacts and runfiles, the (necessarily disjoint) union of the
+artifact maps of the (zero or more) source targets that match the
+pattern. Only *files* in the *top-level* directory of the given
+modules are considered for matches. The provides map is empty.
+
+#### Trees
+
+A tree describes a directory. Internally, however, it is a single
+opaque artifact. Consuming targets cannot look into the internal
+structure of that tree. Only when realized in the file system (when
+installation is requested or as part of the input to an action), the
+directory structure is visible again.
+
+An explicit tree target is similar to an explicit file target,
+except that at the specified location there has to be a directory
+rather than a file and the tree artifact corresponding to that
+directory is taken instead of a file artifact.
diff --git a/doc/concepts/overview.org b/doc/concepts/overview.org
deleted file mode 100644
index 5dc7ad20..00000000
--- a/doc/concepts/overview.org
+++ /dev/null
@@ -1,206 +0,0 @@
-* Tool Overview
-
-** Structuring
-
-*** Structuring the Build: Targets, Rules, and Actions
-
-The primary units this build system deals with are targets: the
-user requests the system to build (or install) a target, targets
-depend on other targets, etc. Targets typically reflect the units a
-software developer thinks in: libraries, binaries, etc. The definition
-of a target only describes the information directly belonging to
-the target, e.g., its source, private and public header files, and
-its direct dependencies. Any other information needed to build a
-target (like the public header files of an indirect dependency)
-are inferred by the build tool. In this way, the build description
-can be kept maintainable
-
-A built target consists of files logically belonging together (like
-the actual library file and its public headers) as well as information
-on how to use the target (linking arguments, transitive header files,
-etc). For a consumer of a target, the definition of this collection
-of files as well as the additionally provided information is what
-defines the target as a dependency, respectively of where the target
-is coming from (i.e., targets coinciding here are indistinguishable
-for other targets).
-
-Of course, to actually build a single target from its dependencies,
-many invocations of the compiler or other tools are necessary (so
-called "actions"); the build tool translates these high level
-description into the individual actions necessary and only re-executes
-those where inputs have changed.
-
-This translation of high-level concepts into individual actions
-is not hard coded into the tool. It is provided by the user as
-"rules" and forms additional input to the build. To avoid duplicate
-work, rules are typically maintained centrally for a project or an
-organization.
-
-*** Structuring the Code: Modules and Repositories
-
-The code base is usually split into many directories, each containing
-source files belonging together. To allow the definition of targets
-where their code is, the targets are structured in a similar way.
-For each directory, there can be a targets files. Directories for
-which such a targets file exists are called "modules". Each file
-belongs to the module that is closest when searching upwards in the
-directory tree. The targets file of a module defines the targets
-formed from the source files belonging to this module.
-
-Larger projects are often split into "repositories". For this build
-tool, a repository is a logical unit. Often those coincide with
-the repositories in the sense of version control. This, however,
-does not have to be the case. Also, from one directory in the file
-system many repositories can be formed that might differ in the
-rules used, targets defined, or binding of their dependencies.
-
-** Staging
-
-A peculiarity of this build system is the complete separation
-between physical and logical paths. Targets have their own view of
-the world, i.e., they can place their artifacts at any logical path
-they like, and this is how they look to other targets. It is up to
-the consuming targets what they do with artifacts of the targets
-they depend on; in particular, they are not obliged to leave them
-at the logical location their dependency put them.
-
-When such a collection of artifacts at logical locations (often
-referred to as the "stage") is realized on the file system (when
-installing a target, or as inputs to actions), the paths are
-interpreted as paths relative to the respective root (installation
-or action directory).
-
-This separation is what allows flexible combination of targets from
-various sources without leaking repository names or different file
-arrangement if a target is in the "main" repository.
-
-** Repository data
-
-A repository uses a (logical) directory for several purposes: to
-obtain source files, to read definitions of targets, to read rules,
-and to read expressions that can be used by rules. While all those
-directories can (and often are) be the same, this does not have
-to be the case. For each of those purposes, a different logical
-directory (also called "root") can be used. In this way, one can,
-e.g., add target definitions to a source tree originally written for
-a different build tool without modifying the original source tree.
-
-Those roots are usually defined in a repository configuration. For
-the "main" repository, i.e., the repository from which the target
-to be built is requested, the roots can also be overwritten at the
-command line. Roots can be defined as paths in the file system,
-but also as ~git~ tree identifiers (together with the location
-of some repository containing that tree). The latter definition
-is preferable for rules and dependencies, as it allows high-level
-caching of targets. It also motivates the need of adding target
-definitions without changing the root itself.
-
-The same flexibility as for the roots is also present for the names
-of the files defining targets, rules, and expressions. While the
-default names ~TARGETS~, ~RULES~, and ~EXPRESSIONS~ are often used,
-other file names can be specified for those as well, either in
-the repository configuration or (for the main repository) on the
-command line.
-
-The final piece of data needed to describe a repository is the
-binding of the open repository names that are used to refer to
-other repositories. More details can be found in the documentation
-on multi-repository builds.
-
-** Targets
-
-*** Target naming
-
-In description files, targets, rules, and expressions are referred
-to by name. As the context always fixes if a name for a target,
-rule, or expression is expected, they use the same naming scheme.
-- A single string refers to the target with this name in the
- same module.
-- A pair ~[module, name]~ refers to the target ~name~ in the module
- ~module~ of the same repository. There are no module names with
- a distinguished meaning. The naming scheme is unambiguous, as
- all other names given by lists have length at least 3.
-- A list ~["./", relative-module-path, name]~ refers to a target
- with the given name in the module that has the specified path
- relative to the current module (in the current repository).
-- A list ~["@", repository, module, name]~ refers to the target
- with the specified name in the specified module of the specified
- repository.
-
-Additionally, there are special targets that can also be referred
-to in target files.
-- An explicit reference of a source-file target in the same module,
- specified as ~["FILE", null, name]~. The explicit ~null~ at the
- second position (where normally the module would be) is necessary
- to ensure the name has length more than 2 to distinguish it from
- a reference to the module ~"FILE"~.
-- A reference to an collection, given by a shell pattern, of explicit
- source files in the top-level directory of the same module,
- specified as ~["GLOB", null, pattern]~. The explicit ~null~ at
- second position is required for the same reason as in the explicit
- file reference.
-- A reference to a tree target in the same module, specified as
- ~["TREE", null, name]~. The explicit ~null~ at second position is
- required for the same reason as in the explicit file reference.
-
-*** Data of an analyzed target
-
-Analyzing a target results in 3 pieces of data.
-- The "artifacts" are a staged collection of artifacts. Typically,
- these are what is normally considered the main reason to build
- a target, e.g., the actual library file in case of a library.
-- The "runfiles" are another staged collection of artifacts. Typically,
- these are files that directly belong to the target and are somehow
- needed to use the target. For example, in case of a library that
- would be the public header files of the library itself.
-- A "provides" map with additional information the target wants
- to provide to its consumers. The data contained in that map can
- also contain additional artifacts. Typically, this the remaining
- information needed to use the target in a build.
-
- In case of a library, that typically would include any other
- libraries this library transitively depends upon (a stage),
- the correct linking order (a list of strings), and the public
- headers of the transitive dependencies (another stage).
-
-A target is completely determined by these 3 pieces of data. A
-consumer of the target will have no other information available.
-Hence it is crucial, that everything (apart from artifacts and
-runfiles) needed to build against that target is contained in the
-provides map.
-
-When the installation of a target is requested on the command line,
-artifacts and runfiles are installed; in case of staging conflicts,
-artifacts take precedence.
-
-*** Source targets
-
-**** Files
-
-If a target is not found in the targets file, it is implicitly
-treated as a source file. Both, explicit and implicit source files
-look the same. The artifacts stage has a single entry: the path is
-the relative path of the file to the module root and the value the
-file artifact located at the specified location. The runfiles are
-the same as the artifacts and the provides map is empty.
-
-**** Collection of files given by a shell pattern
-
-A collection of files given by a shell pattern has, both as artifacts
-and runfiles, the (necessarily disjoint) union of the artifact
-maps of the (zero or more) source targets that match the pattern.
-Only /files/ in the /top-level/ directory of the given modules are
-considered for matches. The provides map is empty.
-
-**** Trees
-
-A tree describes a directory. Internally, however, it is a single
-opaque artifact. Consuming targets cannot look into the internal
-structure of that tree. Only when realized in the file system (when
-installation is requested or as part of the input to an action),
-the directory structure is visible again.
-
-An explicit tree target is similar to an explicit file target, except
-that at the specified location there has to be a directory rather
-than a file and the tree artifact corresponding to that directory
-is taken instead of a file artifact.
diff --git a/doc/concepts/rules.md b/doc/concepts/rules.md
new file mode 100644
index 00000000..2ab4c334
--- /dev/null
+++ b/doc/concepts/rules.md
@@ -0,0 +1,567 @@
+User-defined Rules
+==================
+
+Targets are defined in terms of high-level concepts like "libraries",
+"binaries", etc. In order to translate these high-level definitions
+into actionable tasks, the user defines rules, explaining at a single
+point how all targets of a given type are built.
+
+Rules files
+-----------
+
+Rules are defined in rules files (by default named `RULES`). Those
+contain a JSON object mapping rule names to their rule definition. For
+rules, the same naming scheme as for targets applies. However, built-in
+rules (always named by a single string) take precedence in naming; to
+explicitly refer to a rule defined in the current module, the module has
+to be specified, possibly by a relative path, e.g.,
+`["./", ".", "install"]`.
+
+Basic components of a rule
+--------------------------
+
+A rule is defined through a JSON object with various keys. The only
+mandatory key is `"expression"` containing the defining expression of
+the rule.
+
+### `"config_fields"`, `"string_fields"` and `"target_fields"`
+
+These keys specify the fields that a target defined by that rule can
+have. In particular, those have to be disjoint lists of strings.
+
+For `"config_fields"` and `"string_fields"` the respective field has to
+evaluate to a list of strings, whereas `"target_fields"` have to
+evaluate to a list of target references. Those references are evaluated
+immediately, and in the name context of the target they occur in.
+
+The difference between `"config_fields"` and `"string_fields"` is that
+`"config_fields"` are evaluated before the target fields and hence can
+be used by the rule to specify config transitions for the target fields.
+`"string_fields"` on the other hand are evaluated *after*
+the target fields; hence the rule cannot use them to specify a
+configuration transition, however the target definition in those fields
+may use the `"outs"` and `"runfiles"` functions to have access to the
+names of the artifacts or runfiles of a target specified in one of the
+target fields.
+
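+As a sketch (all field names here are chosen for illustration), a rule
+declaring all three kinds of fields could start as follows:
+
+``` jsonc
+{ "my rule":
+  { "config_fields": ["mode"]
+  , "string_fields": ["name"]
+  , "target_fields": ["srcs", "deps"]
+  , "expression": {"type": "RESULT"}
+  }
+}
+```
+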
+### `"implicit"`
+
+This key specifies a map of implicit dependencies. The keys of the map
+are additional target fields, the values are the fixed list of targets
+for those fields. If a short-form name of a target is used (e.g., only a
+string instead of a module-target pair), it is interpreted relative to
+the repository and module the rule is defined in, not the one the rule
+is used in. Other than this, those fields are evaluated the same way as
+target fields settable on invocation of the rule.
+
+### `"config_vars"`
+
+This is a list of strings specifying which parts of the configuration
+the rule uses. The defining expression of the rule is evaluated in an
+environment that is the configuration restricted to those variables; if
+one of those variables is not specified in the configuration, the value
+in the restriction is `null`.
+
+### `"config_transitions"`
+
+This key specifies a map of (some of) the target fields (whether
+declared as `"target_fields"` or as `"implicit"`) to a configuration
+expression. Here, a configuration expression is any expression in our
+language. It has access to the `"config_vars"` and the `"config_fields"`
+and has to evaluate to a list of maps. Each map specifies a transition
+of the current configuration by amending it on the domain of that map to
+the given value.
+
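+For example, a rule could (hypothetically) request that all targets in
+its `"deps"` field be analyzed with the configuration variable `"mode"`
+set to `"release"`:
+
+``` jsonc
+{ "config_transitions":
+  {"deps": [{"type": "singleton_map", "key": "mode", "value": "release"}]}
+}
+```
+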
+### `"imports"`
+
+This specifies a map of expressions that can later be used by
+`CALL_EXPRESSION`. In this way, duplication of (rule) code can be
+avoided. For each key, we have to have a name of an expression;
+expressions are named following the same naming scheme as targets and
+rules. The names are resolved in the context of the rule. Expressions
+themselves are defined in expression files, the default name being
+`EXPRESSIONS`.
+
+Each expression is a JSON object. The only mandatory key is
+`"expression"` which has to be an expression in our language. It
+optionally can have a key `"vars"` where the value has to be a list of
+strings (and the default is the empty list). Additionally, it can have
+another optional key `"imports"` following the same scheme as the
+`"imports"` key of a rule; in the `"imports"` key of an expression,
+names are resolved in the context of that expression. It is a
+requirement that the `"imports"` graph be cycle free.
+
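+A minimal sketch of such an expression file (name and content are for
+illustration only) could be:
+
+``` jsonc
+{ "flags":
+  { "vars": ["CFLAGS"]
+  , "expression":
+    {"type": "var", "name": "CFLAGS", "default": ["-O2"]}
+  }
+}
+```
+
+A rule that has `CFLAGS` among its `"config_vars"` could then import
+this expression via `"imports": {"flags": "flags"}` and evaluate it
+with `CALL_EXPRESSION` as described below.
+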
+### `"expression"`
+
+This specifies the defining expression of the rule. The value has to be
+an expression of our expression language (basically, an abstract syntax
+tree serialized as JSON). It has access to the following extra functions
+and, when evaluated, has to return a result value.
+
+#### `FIELD`
+
+The field function takes one argument, `name`, which has to evaluate
+to the name of a field. For string fields, the given list of strings
+is returned; for target fields, the list of abstract names for the
+given targets is returned. These abstract names are opaque within the
+rule language (but meaningful when reported in error messages) and
+should only be passed on to other functions that expect
+names as inputs.
+
+#### `DEP_ARTIFACTS` and `DEP_RUNFILES`
+
+These functions give access to the artifacts or runfiles,
+respectively, of one of the targets depended upon. They take two
+(evaluated) arguments, the mandatory `"dep"` and the optional
+`"transition"`.
+
+The argument `"dep"` has to evaluate to an abstract name (as can be
+obtained from the `FIELD` function) of some target specified in one
+of the target fields. The `"transition"` argument has to evaluate to
+a configuration transition (i.e., a map) and the empty transition is
+taken as default. It is an error to request a target-transition pair
+for a target that was not requested in the given transition through
+one of the target fields.
+
+#### `DEP_PROVIDES`
+
+This function gives access to a particular entry of the provides map
+of one of the targets depended upon. The arguments `"dep"` and
+`"transition"` are as for `DEP_ARTIFACTS`; additionally, there is
+the mandatory argument `"provider"` which has to evaluate to a
+string. The function returns the value of the provides map of the
+target at the given provider. If the key is not in the provides map
+(or the value at that key is `null`), the optional argument
+`"default"` is evaluated and returned. The default for `"default"`
+is the empty list.
+
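+As a sketch, assuming a variable `d` is bound to one of the abstract
+names obtained via `FIELD`, a (hypothetical) provider entry could be
+queried as follows:
+
+``` jsonc
+{ "type": "DEP_PROVIDES"
+, "dep": {"type": "var", "name": "d"}  // an abstract target name
+, "provider": "link-args"              // hypothetical provider key
+, "default": []
+}
+```
+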
+#### `BLOB`
+
+The `BLOB` function takes a single (evaluated) argument `data` which
+is optional and defaults to the empty string. This argument has to
+evaluate to a string. The function returns an artifact that is a
+non-executable file with the given string as content.
+
+#### `TREE`
+
+The `TREE` function takes a single (evaluated) argument `$1` which
+has to be a map of artifacts. The result is a single tree artifact
+formed from the input map. It is an error if the map cannot be
+transformed into a tree (e.g., due to staging conflicts).
+
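+Combining the two, the following sketch creates a tree containing a
+single generated file (path and content chosen for illustration):
+
+``` jsonc
+{ "type": "TREE"
+, "$1":
+  { "type": "singleton_map"
+  , "key": "docs/README.md"
+  , "value": {"type": "BLOB", "data": "A generated file.\n"}
+  }
+}
+```
+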
+#### `ACTION`
+
+Actions are a way to define new artifacts from (zero or more)
+already defined artifacts by running a command, typically a
+compiler, linker, archiver, etc. The action function takes the
+following arguments.
+
+ - `"inputs"` A map of artifacts. These artifacts are present when
+ the command is executed; the keys of the map are the relative
+ path from the working directory of the command. The command must
+ not make any assumption about the location of the working
+ directory in the file system (and instead should refer to files
+ by path relative to the working directory). Moreover, the
+ command must not modify the input files in any way. (In-place
+ operations can be simulated by staging, as is shown in the
+ example later in this document.)
+
+ It is an additional requirement that no conflicts occur when
+ interpreting the keys as paths. For example, `"foo.txt"` and
+ `"./foo.txt"` are different as strings and hence legitimately
+ can be assigned different values in a map. When interpreted as a
+ path, however, they name the same path; so, if the `"inputs"`
+ map contains both those keys, the corresponding values have to
+ be equal.
+
+ - `"cmd"` The command to execute, given as `argv` vector, i.e., a
+ non-empty list of strings. The 0'th element of that list will
+ also be the program to be executed.
+
+ - `"env"` The environment in which the command should be executed,
+ given as a map of strings to strings.
+
+ - `"outs"` and `"out_dirs"` Two list of strings naming the files
+ and directories, respectively, the command is expected to
+ create. It is an error if the command fails to create the
+ promised output files. These two lists have to be disjoint, but
+ an entry of `"outs"` may well name a location inside one of the
+ `"out_dirs"`.
+
+This function returns a map whose keys are the strings mentioned in
+`"outs"` and `"out_dirs"` and whose values are the artifacts defined
+to be the ones created by running the given command (in the given
+environment with the given inputs).
+
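+As a sketch, assuming the variable `srcs` is bound to a map of
+artifacts staging a file at `in.txt`, an action copying that file could
+be defined as follows; the result is a map with the single key
+`"out.txt"`:
+
+``` jsonc
+{ "type": "ACTION"
+, "inputs": {"type": "var", "name": "srcs"}  // assumed map of artifacts
+, "cmd": ["cp", "in.txt", "out.txt"]
+, "outs": ["out.txt"]
+}
+```
+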
+#### `RESULT`
+
+The `RESULT` function is the only way to obtain a result value. It
+takes three (evaluated) arguments, `"artifacts"`, `"runfiles"`, and
+`"provides"`, all of which are optional and default to the empty
+map. It defines the result of a target that has the given artifacts,
+runfiles, and provided data, respectively. In particular,
+`"artifacts"` and `"runfiles"` have to be maps to artifacts, and
+`"provides"` has to be a map. Moreover, they keys in `"runfiles"`
+and `"artifacts"` are treated as paths; it is an error if this
+interpretation yields to conflicts. The keys in the artifacts or
+runfile maps as seen by other targets are the normalized paths of
+the keys given.
+
+Result values themselves are opaque in our expression language and
+cannot be deconstructed in any way. Their only purpose is to be the
+result of the evaluation of the defining expression of a target.
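+
+A sketch of a typical use, with all three arguments given (the variable
+names are assumptions for illustration):
+
+``` jsonc
+{ "type": "RESULT"
+, "artifacts": {"type": "var", "name": "libs"}
+, "runfiles": {"type": "var", "name": "headers"}
+, "provides":
+  { "type": "singleton_map"
+  , "key": "link-args"
+  , "value": {"type": "var", "name": "link args"}
+  }
+}
+```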
+
+#### `CALL_EXPRESSION`
+
+This function takes one mandatory argument `"name"` which is
+unevaluated; it has to be a string literal. The expression
+imported by that name through the imports field is evaluated in the
+current environment restricted to the variables of that expression.
+The result of that evaluation is the result of the `CALL_EXPRESSION`
+statement.
+
+During the evaluation of an expression, rule fields can still be
+accessed through the functions `FIELD`, `DEP_ARTIFACTS`, etc. In
+particular, even an expression with no variables (that, hence, is
+always evaluated in the empty environment) can carry out non-trivial
+computations and be non-constant. The special functions `BLOB`,
+`ACTION`, and `RESULT` are also available. If inside the evaluation
+of an expression the function `CALL_EXPRESSION` is used, the name
+argument refers to the `"imports"` map of that expression. So the
+call graph is deliberately recursion free.
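+
+As a sketch, a rule could import and call a named expression as follows
+(all names are illustrative assumptions; the expression itself would be
+defined in the `EXPRESSIONS` file of the module `"helpers"`).
+
+``` jsonc
+{ "my rule":
+  { "string_fields": ["name"]
+  , "imports": {"artifact map": ["helpers", "artifact map"]}
+  , "expression":
+    { "type": "RESULT"
+    , "artifacts": {"type": "CALL_EXPRESSION", "name": "artifact map"}
+    }
+  }
+}
+```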
+
+Evaluation of a target
+----------------------
+
+A target defined by a user-defined rule is evaluated in the following
+way.
+
+ - First, the config fields are evaluated.
+
+ - Then, the target-fields are evaluated. This happens for each field
+ as follows.
+
+ - The configuration transition for this field is evaluated and the
+ transitioned configurations determined.
+ - The argument expression for this field is evaluated. The result
+ is interpreted as a list of target names. Each of those targets
+ is analyzed in all the specified configurations.
+
+ - The string fields are evaluated. If the expression for a string
+ field queries a target (via `outs` or `runfiles`), the value for
+   that target is returned in the first configuration. The rationale
+ here is that such generator expressions are intended to refer to the
+ corresponding target in its "main" configuration; they are hardly
+ used anyway for fields branching their targets over many
+ configurations.
+
+ - The effective configuration for the target is determined. The target
+   effectively uses those parts of the configuration given by the
+   variables used by the `arguments_config` in the rule invocation, the
+   `config_vars` the rule specified, and the parts of the configuration
+   used by the targets depended upon. For a target depended upon, all
+   parts it used of its configuration are relevant except for those
+   fixed by the configuration transition.
+
+ - The rule expression is evaluated and the result of that evaluation
+ is the result of the rule.
+
+Example of developing a rule
+----------------------------
+
+Let's consider step by step an example of writing a rule. Say we want
+to write a rule that programmatically patches some files.
+
+### Framework: The minimal rule
+
+Every rule has to have a defining expression evaluating to a `RESULT`.
+So the minimally correct rule is the `"null"` rule in the following
+example rule file.
+
+ { "null": {"expression": {"type": "RESULT"}}}
+
+This rule accepts no parameters, and has the empty map as artifacts,
+runfiles, and provided data. So it is not very useful.
+
+### String inputs
+
+Let's allow the target definition to have some fields. The simplest
+fields are `string_fields`; they are given by a list of strings. In the
+defining expression we can access them directly via the `FIELD`
+function. Strings can be used when defining maps, but we can also create
+artifacts from them, using the `BLOB` function. To create a map, we can
+use the `singleton_map` function. We define values step by step, using
+the `let*` construct.
+
+``` jsonc
+{ "script only":
+ { "string_fields": ["script"]
+ , "expression":
+ { "type": "let*"
+ , "bindings":
+ [ [ "script content"
+ , { "type": "join"
+ , "separator": "\n"
+ , "$1":
+ { "type": "++"
+ , "$1":
+ [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
+ }
+ }
+ ]
+ , [ "script"
+ , { "type": "singleton_map"
+ , "key": "script.ed"
+ , "value":
+ {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
+ }
+ ]
+ ]
+ , "body":
+ {"type": "RESULT", "artifacts": {"type": "var", "name": "script"}}
+ }
+ }
+}
+```
+
+### Target inputs and derived artifacts
+
+Now it is time to add the input files. Source files are targets like any
+other target (and happen to contain precisely one artifact). So we add a
+target field `"srcs"` for the file to be patched. Here we have to keep
+in mind that, on the one hand, target fields accept a list of targets
+and, on the other hand, the artifacts of a target are a whole map. We
+chose to patch all the artifacts of all given `"srcs"` targets. We can
+iterate over lists with `foreach` and maps with `foreach_map`.
+
+Next, we have to keep in mind that targets may place their artifacts at
+arbitrary logical locations. For us that means that first we have to
+make a decision at which logical locations we want to place the output
+artifacts. As one thinks of patching as an in-place operation, we chose
+to logically place the outputs where the inputs have been. Of course, we
+do not modify the input files in any way; after all, we have to define a
+mathematical function computing the output artifacts, not a collection
+of side effects. With that choice of logical artifact placement, we have
+to decide what to do if two (or more) input targets place their
+artifacts at logically the same location. We could simply take a
+"latest wins" semantics (keep in mind that target fields give a list
+of targets, not a set) as provided by the `map_union` function. We chose
+to consider it a user error if targets with conflicting artifacts are
+specified. This is provided by the `disjoint_map_union` that also
+allows specifying an error message to be shown to the user. Here, conflict
+means that values for the same map position are defined in a different
+way.
+
+The actual patching is done by an `ACTION`. We have the script already;
+to make things easy, we stage the input to a fixed place and also expect
+a fixed output location. Then the actual command is a simple shell
+script. The only thing we have to keep in mind is that we want useful
+output precisely if the action fails. Also note that, while we define
+our actions sequentially, they will be executed in parallel, as none of
+them depends on the output of another one of them.
+
+``` jsonc
+{ "ed patch":
+ { "string_fields": ["script"]
+ , "target_fields": ["srcs"]
+ , "expression":
+ { "type": "let*"
+ , "bindings":
+ [ [ "script content"
+ , { "type": "join"
+ , "separator": "\n"
+ , "$1":
+ { "type": "++"
+ , "$1":
+ [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
+ }
+ }
+ ]
+ , [ "script"
+ , { "type": "singleton_map"
+ , "key": "script.ed"
+ , "value":
+ {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
+ }
+ ]
+ , [ "patched files per target"
+ , { "type": "foreach"
+ , "var": "src"
+ , "range": {"type": "FIELD", "name": "srcs"}
+ , "body":
+ { "type": "foreach_map"
+ , "var_key": "file_name"
+ , "var_val": "file"
+ , "range":
+ {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "src"}}
+ , "body":
+ { "type": "let*"
+ , "bindings":
+ [ [ "action output"
+ , { "type": "ACTION"
+ , "inputs":
+ { "type": "map_union"
+ , "$1":
+ [ {"type": "var", "name": "script"}
+ , { "type": "singleton_map"
+ , "key": "in"
+ , "value": {"type": "var", "name": "file"}
+ }
+ ]
+ }
+ , "cmd":
+ [ "/bin/sh"
+ , "-c"
+ , "cp in out && chmod 644 out && /bin/ed out < script.ed > log 2>&1 || (cat log && exit 1)"
+ ]
+ , "outs": ["out"]
+ }
+ ]
+ ]
+ , "body":
+ { "type": "singleton_map"
+ , "key": {"type": "var", "name": "file_name"}
+ , "value":
+ { "type": "lookup"
+ , "map": {"type": "var", "name": "action output"}
+ , "key": "out"
+ }
+ }
+ }
+ }
+ }
+ ]
+ , [ "artifacts"
+ , { "type": "disjoint_map_union"
+ , "msg": "srcs artifacts must not overlap"
+ , "$1":
+ { "type": "++"
+ , "$1": {"type": "var", "name": "patched files per target"}
+ }
+ }
+ ]
+ ]
+ , "body":
+ {"type": "RESULT", "artifacts": {"type": "var", "name": "artifacts"}}
+ }
+ }
+}
+```
+
+A typical invocation of that rule would be a target file like the
+following.
+
+``` jsonc
+{ "input.txt":
+ { "type": "ed patch"
+ , "script": ["%g/world/s//user/g", "%g/World/s//USER/g"]
+ , "srcs": [["FILE", null, "input.txt"]]
+ }
+}
+```
+
+As the input file has the same name as a target (in the same module), we
+use the explicit file reference in the specification of the sources.
+
+### Implicit dependencies and config transitions
+
+Say, instead of patching a file, we want to generate source files from
+some high-level description using our actively developed code generator.
+Then some additional considerations are necessary.
+
+ - First of all, every target defined by this rule not only depends on
+ the targets the user specifies. Additionally, our code generator is
+ also an implicit dependency. And as it is under active development,
+ we certainly do not want it to be taken from the ambient build
+ environment (as we did in the previous example with `ed` which,
+ however, is a pretty stable tool). So we use an `implicit` target
+ for this.
+ - Next, we notice that our code generator is used during the build. In
+ particular, we want that tool (written in some compiled language) to
+ be built for the platform we run our actions on, not the target
+ platform we build our final binaries for. Therefore, we have to use
+ a configuration transition.
+ - As our defining expression also needs the configuration transition
+ to access the artifacts of that implicit target, we better define it
+ as a reusable expression. Other rules in our rule collection might
+ also have the same task; so `["transitions", "for host"]` might be a
+ good place to define it. In fact, it can look like the expression
+ with that name in our own code base.
+
+So, the overall organization of our rule might be as follows.
+
+``` jsonc
+{ "generated code":
+ { "target_fields": ["srcs"]
+ , "implicit": {"generator": [["generators", "foogen"]]}
+ , "config_vars": ["HOST_ARCH"]
+ , "imports": {"for host": ["transitions", "for host"]}
+ , "config_transitions":
+ {"generator": [{"type": "CALL_EXPRESSION", "name": "for host"}]}
+ , "expression": ...
+ }
+}
+```
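+
+The imported expression itself is not shown above. A plausible sketch
+(assuming the configuration variable `"ARCH"` names the target
+architecture, as in our own rules) is a transition that sets the target
+architecture to the host architecture:
+
+``` jsonc
+{ "for host":
+  { "vars": ["HOST_ARCH"]
+  , "expression":
+    { "type": "singleton_map"
+    , "key": "ARCH"
+    , "value": {"type": "var", "name": "HOST_ARCH"}
+    }
+  }
+}
+```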
+
+### Providing information to consuming targets
+
+In the simple case of patching, the resulting file is indeed the only
+information the consumer of that target needs; in fact, the main point
+was that the resulting target could be a drop-in replacement of a source
+file. A typical rule, however, defines something like a library and a
+library is much more than just the actual library file and the public
+headers: a library may depend on other libraries; therefore, in order to
+use it, we need
+
+ - to have the header files of dependencies available that might be
+ included by the public header files of that library,
+ - to have the libraries transitively depended upon available during
+ linking, and
+ - to know the order in which to link the dependencies (as they might
+ have dependencies among each other).
+
+In order to keep a maintainable build description, all this should be
+taken care of by simply depending on that library. We do
+*not* want the consumer of a target to have to be aware of
+such transitive dependencies (e.g., when constructing the link command
+line), as it used to be the case in early build tools like `make`.
+
+It is a deliberate design choice that a target is given only by the
+result of its analysis, regardless of where it is coming from.
+Therefore, all this information needs to be part of the result of a
+target. Such information is precisely what the mentioned `"provides"`
+map is for. As a map, it can contain an arbitrary amount of
+information, and the interface function `"DEP_PROVIDES"` is designed in
+such a way that adding more providers does not affect targets not aware
+of them (there is no function asking for all providers of a target).
+The keys and their meaning have to be agreed upon by a target and its
+consumers. As the latter, however, typically are targets of the same
+family (authored by the same group), this usually is not a problem.
+
+A typical example of computing a provided value is the `"link-args"` in
+the rules used by `just` itself. They are defined by the following
+expression.
+
+``` jsonc
+{ "type": "nub_right"
+, "$1":
+ { "type": "++"
+ , "$1":
+ [ {"type": "keys", "$1": {"type": "var", "name": "lib"}}
+ , {"type": "CALL_EXPRESSION", "name": "link-args-deps"}
+ , {"type": "var", "name": "link external", "default": []}
+ ]
+ }
+}
+```
+
+This expression
+
+ - collects the respective provider of its dependencies,
+ - adds itself in front, and
+ - deduplicates the resulting list, keeping only the right-most
+ occurrence of each entry.
+
+In this way, the invariant is kept that the `"link-args"` form a
+topological ordering of the dependencies (in the sense that each entry
+is mentioned before its dependencies).
diff --git a/doc/concepts/rules.org b/doc/concepts/rules.org
deleted file mode 100644
index d4c61b5e..00000000
--- a/doc/concepts/rules.org
+++ /dev/null
@@ -1,551 +0,0 @@
-* User-defined Rules
-
-Targets are defined in terms of high-level concepts like "libraries",
-"binaries", etc. In order to translate these high-level definitions
-into actionable tasks, the user defines rules, explaining at a
-single point how all targets of a given type are built.
-
-** Rules files
-
-Rules are defined in rules files (by default named ~RULES~). Those
-contain a JSON object mapping rule names to their rule definition.
-For rules, the same naming scheme as for targets applies. However,
-built-in rules (always named by a single string) take precedence
-in naming; to explicitly refer to a rule defined in the current
-module, the module has to be specified, possibly by a relative
-path, e.g., ~["./", ".", "install"]~.
-
-** Basic components of a rule
-
-A rule is defined through a JSON object with various keys. The only
-mandatory key is ~"expression"~ containing the defining expression
-of the rule.
-
-*** ~"config_fields"~, ~"string_fields"~ and ~"target_fields"~
-
-These keys specify the fields that a target defined by that rule can
-have. In particular, those have to be disjoint lists of strings.
-
-For ~"config_fields"~ and ~"string_fields"~ the respective field
-has to evaluate to a list of strings, whereas ~"target_fields"~
-have to evaluate to a list of target references. Those references
-are evaluated immediately, and in the name context of the target
-they occur in.
-
-The difference between ~"config_fields"~ and ~"string_fields"~ is
-that ~"config_fields"~ are evaluated before the target fields and
-hence can be used by the rule to specify config transitions for the
-target fields. ~"string_fields"~ on the other hand are evaluated
-_after_ the target fields; hence the rule cannot use them to
-specify a configuration transition, however the target definition
-in those fields may use the ~"outs"~ and ~"runfiles"~ functions to
-have access to the names of the artifacts or runfiles of a target
-specified in one of the target fields.
-
-*** ~"implicit"~
-
-This key specifies a map of implicit dependencies. The keys of the
-map are additional target fields, the values are the fixed list
-of targets for those fields. If a short-form name of a target is
-used (e.g., only a string instead of a module-target pair), it is
-interpreted relative to the repository and module the rule is defined
-in, not the one the rule is used in. Other than this, those fields
-are evaluated the same way as target fields settable on invocation
-of the rule.
-
-*** ~"config_vars"~
-
-This is a list of strings specifying which parts of the configuration
-the rule uses. The defining expression of the rule is evaluated in an
-environment that is the configuration restricted to those variables;
-if one of those variables is not specified in the configuration
-the value in the restriction is ~null~.
-
-*** ~"config_transitions"~
-
-This key specifies a map of (some of) the target fields (whether
-declared as ~"target_fields"~ or as ~"implicit"~) to a configuration
-expression. Here, a configuration expression is any expression
-in our language. It has access to the ~"config_vars"~ and the
-~"config_fields"~ and has to evaluate to a list of maps. Each map
-specifies a transition to the current configuration by amending
-it on the domain of that map to the given value.
-
-*** ~"imports"~
-
-This specifies a map of expressions that can later be used by
-~CALL_EXPRESSION~. In this way, duplication of (rule) code can be
-avoided. For each key, we have to have a name of an expression;
-expressions are named following the same naming scheme as targets
-and rules. The names are resolved in the context of the rule.
-Expressions themselves are defined in expression files, the default
-name being ~EXPRESSIONS~.
-
-Each expression is a JSON object. The only mandatory key is
-~"expression"~ which has to be an expression in our language. It
-optionally can have a key ~"vars"~ where the value has to be a list
-of strings (and the default is the empty list). Additionally, it
-can have another optional key ~"imports"~ following the same scheme
-as the ~"imports"~ key of a rule; in the ~"imports"~ key of an
-expression, names are resolved in the context of that expression.
-It is a requirement that the ~"imports"~ graph be cycle free.
-
-*** ~"expression"~
-
-This specifies the defining expression of the rule. The value has to
-be an expression of our expression language (basically, an abstract
-syntax tree serialized as JSON). It has access to the following
-extra functions and, when evaluated, has to return a result value.
-
-**** ~FIELD~
-
-The field function takes one argument, ~name~ which has to evaluate
-to the name of a field. For string fields, the given list of strings
-is returned; for target fields, the list of abstract names for the
-given target is returned. These abstract names are opaque within
-the rule language (but meaningful when reported in error messages)
-and should only be used to be passed on to other functions that
-expect names as inputs.
-
-**** ~DEP_ARTIFACTS~ and ~DEP_RUNFILES~
-
-These functions give access to the artifacts, or runfiles, respectively,
-of one of the targets depended upon. It takes two (evaluated)
-arguments, the mandatory ~"dep"~ and the optional ~"transition"~.
-
-The argument ~"dep"~ has to evaluate to an abstract name (as can be
-obtained from the ~FIELD~ function) of some target specified in one
-of the target fields. The ~"transition"~ argument has to evaluate
-to a configuration transition (i.e., a map) and the empty transition
-is taken as default. It is an error to request a target-transition
-pair for a target that was not requested in the given transition
-through one of the target fields.
-
-**** ~DEP_PROVIDES~
-
-This function gives access to a particular entry of the provides
-map of one of the targets depended upon. The arguments ~"dep"~
-and ~"transition"~ are as for ~DEP_ARTIFACTS~; additionally, there
-is the mandatory argument ~"provider"~ which has to evaluate to a
-string. The function returns the value of the provides map of the
-target at the given provider. If the key is not in the provides
-map (or the value at that key is ~null~), the optional argument
-~"default"~ is evaluated and returned. The default for ~"default"~
-is the empty list.
-
-**** ~BLOB~
-
-The ~BLOB~ function takes a single (evaluated) argument ~data~
-which is optional and defaults to the empty string. This argument
-has to evaluate to a string. The function returns an artifact that
-is a non-executable file with the given string as content.
-
-**** ~TREE~
-
-The ~TREE~ function takes a single (evaluated) argument ~$1~ which
-has to be a map of artifacts. The result is a single tree artifact
-formed from the input map. It is an error if the map cannot be
-transformed into a tree (e.g., due to staging conflicts).
-
-**** ~ACTION~
-
-Actions are a way to define new artifacts from (zero or more) already
-defined artifacts by running a command, typically a compiler, linker,
-archiver, etc. The action function takes the following arguments.
-- ~"inputs"~ A map of artifacts. These artifacts are present when
- the command is executed; the keys of the map are the relative path
- from the working directory of the command. The command must not
- make any assumption about the location of the working directory
- in the file system (and instead should refer to files by path
- relative to the working directory). Moreover, the command must
- not modify the input files in any way. (In-place operations can
- be simulated by staging, as is shown in the example later in
- this document.)
-
- It is an additional requirement that no conflicts occur when
- interpreting the keys as paths. For example, ~"foo.txt"~ and
- ~"./foo.txt"~ are different as strings and hence legitimately
- can be assigned different values in a map. When interpreted as
- a path, however, they name the same path; so, if the ~"inputs"~
- map contains both those keys, the corresponding values have
- to be equal.
-- ~"cmd"~ The command to execute, given as ~argv~ vector, i.e.,
- a non-empty list of strings. The 0'th element of that list will
- also be the program to be executed.
-- ~"env"~ The environment in which the command should be executed,
- given as a map of strings to strings.
-- ~"outs"~ and ~"out_dirs"~ Two list of strings naming the files
- and directories, respectively, the command is expected to create.
- It is an error if the command fails to create the promised output
- files. These two lists have to be disjoint, but an entry of
- ~"outs"~ may well name a location inside one of the ~"out_dirs"~.
-
-This function returns a map with keys the strings mentioned in
-~"outs"~ and ~"out_dirs"~. As values this map has artifacts defined
-to be the ones created by running the given command (in the given
-environment with the given inputs).
-
-**** ~RESULT~
-
-The ~RESULT~ function is the only way to obtain a result value.
-It takes three (evaluated) arguments, ~"artifacts"~, ~"runfiles"~, and
-~"provides"~, all of which are optional and default to the empty map.
-It defines the result of a target that has the given artifacts,
-runfiles, and provided data, respectively. In particular, ~"artifacts"~
-and ~"runfiles"~ have to be maps to artifacts, and ~"provides"~ has
-to be a map. Moreover, they keys in ~"runfiles"~ and ~"artifacts"~
-are treated as paths; it is an error if this interpretation yields
-to conflicts. The keys in the artifacts or runfile maps as seen by
-other targets are the normalized paths of the keys given.
-
-
-Result values themselves are opaque in our expression language
-and cannot be deconstructed in any way. Their only purpose is to
-be the result of the evaluation of the defining expression of a target.
-
-**** ~CALL_EXPRESSION~
-
-This function takes one mandatory argument ~"name"~ which is
-unevaluated; it has to a be a string literal. The expression imported
-by that name through the imports field is evaluated in the current
-environment restricted to the variables of that expression. The result
-of that evaluation is the result of the ~CALL_EXPRESSION~ statement.
-
-During the evaluation of an expression, rule fields can still be
-accessed through the functions ~FIELD~, ~DEP_ARTIFACTS~, etc. In
-particular, even an expression with no variables (that, hence, is
-always evaluated in the empty environment) can carry out non-trivial
-computations and be non-constant. The special functions ~BLOB~,
-~ACTION~, and ~RESULT~ are also available. If inside the evaluation
-of an expression the function ~CALL_EXPRESSION~ is used, the name
-argument refers to the ~"imports"~ map of that expression. So the
-call graph is deliberately recursion free.
-
-** Evaluation of a target
-
-A target defined by a user-defined rule is evaluated in the
-following way.
-
-- First, the config fields are evaluated.
-
-- Then, the target-fields are evaluated. This happens for each
- field as follows.
- - The configuration transition for this field is evaluated and
- the transitioned configurations determined.
- - The argument expression for this field is evaluated. The result
- is interpreted as a list of target names. Each of those targets
- is analyzed in all the specified configurations.
-
-- The string fields are evaluated. If the expression for a string
- field queries a target (via ~outs~ or ~runfiles~), the value for
- that target is returned in the first configuration. The rational
- here is that such generator expressions are intended to refer to
- the corresponding target in its "main" configuration; they are
- hardly used anyway for fields branching their targets over many
- configurations.
-
-- The effective configuration for the target is determined. The target
- effectively has used of the configuration the variables used by
- the ~arguments_config~ in the rule invocation, the ~config_vars~
- the rule specified, and the parts of the configuration used by
- a target dependent upon. For a target dependent upon, all parts
- it used of its configuration are relevant expect for those fixed
- by the configuration transition.
-
-- The rule expression is evaluated and the result of that evaluation
- is the result of the rule.
-
-** Example of developing a rule
-
-Let's consider step by step an example of writing a rule. Say we want
-to write a rule that programmatically patches some files.
-
-*** Framework: The minimal rule
-
-Every rule has to have a defining expression evaluating
-to a ~RESULT~. So the minimally correct rule is the ~"null"~
-rule in the following example rule file.
-
-#+BEGIN_SRC
-{ "null": {"expression": {"type": "RESULT"}}}
-#+END_SRC
-
-This rule accepts no parameters, and has the empty map as artifacts,
-runfiles, and provided data. So it is not very useful.
-
-*** String inputs
-
-Let's allow the target definition to have some fields. The most
-simple fields are ~string_fields~; they are given by a list of
-strings. In the defining expression we can access them directly via
-the ~FIELD~ function. Strings can be used when defining maps, but
-we can also create artifacts from them, using the ~BLOB~ function.
-To create a map, we can use the ~singleton_map~ function. We define
-values step by step, using the ~let*~ construct.
-
-#+BEGIN_SRC
-{ "script only":
- { "string_fields": ["script"]
- , "expression":
- { "type": "let*"
- , "bindings":
- [ [ "script content"
- , { "type": "join"
- , "separator": "\n"
- , "$1":
- { "type": "++"
- , "$1":
- [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
- }
- }
- ]
- , [ "script"
- , { "type": "singleton_map"
- , "key": "script.ed"
- , "value":
- {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
- }
- ]
- ]
- , "body":
- {"type": "RESULT", "artifacts": {"type": "var", "name": "script"}}
- }
- }
-}
-#+END_SRC
-
-*** Target inputs and derived artifacts
-
-Now it is time to add the input files. Source files are targets like
-any other target (and happen to contain precisely one artifact). So
-we add a target field ~"srcs"~ for the file to be patched. Here we
-have to keep in mind that, on the one hand, target fields accept a
-list of targets and, on the other hand, the artifacts of a target
-are a whole map. We chose to patch all the artifacts of all given
-~"srcs"~ targets. We can iterate over lists with ~foreach~ and maps
-with ~foreach_map~.
-
-Next, we have to keep in mind that targets may place their artifacts
-at arbitrary logical locations. For us that means that first
-we have to make a decision at which logical locations we want
-to place the output artifacts. As one thinks of patching as an
-in-place operation, we chose to logically place the outputs where
-the inputs have been. Of course, we do not modify the input files
-in any way; after all, we have to define a mathematical function
-computing the output artifacts, not a collection of side effects.
-With that choice of logical artifact placement, we have to decide
-what to do if two (or more) input targets place their artifacts at
-logically the same location. We could simply take a "latest wins"
-semantics (keep in mind that target fields give a list of targets,
-not a set) as provided by the ~map_union~ function. We chose to
-consider it a user error if targets with conflicting artifacts are
-specified. This is provided by the ~disjoint_map_union~ that also
-allows to specify an error message to be provided the user. Here,
-conflict means that values for the same map position are defined
-in a different way.
-
-The actual patching is done by an ~ACTION~. We have the script
-already; to make things easy, we stage the input to a fixed place
-and also expect a fixed output location. Then the actual command
-is a simple shell script. The only thing we have to keep in mind
-is that we want useful output precisely if the action fails. Also
-note that, while we define our actions sequentially, they will
-be executed in parallel, as none of them depends on the output of
-another one of them.
-
-#+BEGIN_SRC
-{ "ed patch":
- { "string_fields": ["script"]
- , "target_fields": ["srcs"]
- , "expression":
- { "type": "let*"
- , "bindings":
- [ [ "script content"
- , { "type": "join"
- , "separator": "\n"
- , "$1":
- { "type": "++"
- , "$1":
- [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
- }
- }
- ]
- , [ "script"
- , { "type": "singleton_map"
- , "key": "script.ed"
- , "value":
- {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
- }
- ]
- , [ "patched files per target"
- , { "type": "foreach"
- , "var": "src"
- , "range": {"type": "FIELD", "name": "srcs"}
- , "body":
- { "type": "foreach_map"
- , "var_key": "file_name"
- , "var_val": "file"
- , "range":
- {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "src"}}
- , "body":
- { "type": "let*"
- , "bindings":
- [ [ "action output"
- , { "type": "ACTION"
- , "inputs":
- { "type": "map_union"
- , "$1":
- [ {"type": "var", "name": "script"}
- , { "type": "singleton_map"
- , "key": "in"
- , "value": {"type": "var", "name": "file"}
- }
- ]
- }
- , "cmd":
- [ "/bin/sh"
- , "-c"
- , "cp in out && chmod 644 out && /bin/ed out < script.ed > log 2>&1 || (cat log && exit 1)"
- ]
- , "outs": ["out"]
- }
- ]
- ]
- , "body":
- { "type": "singleton_map"
- , "key": {"type": "var", "name": "file_name"}
- , "value":
- { "type": "lookup"
- , "map": {"type": "var", "name": "action output"}
- , "key": "out"
- }
- }
- }
- }
- }
- ]
- , [ "artifacts"
- , { "type": "disjoint_map_union"
- , "msg": "srcs artifacts must not overlap"
- , "$1":
- { "type": "++"
- , "$1": {"type": "var", "name": "patched files per target"}
- }
- }
- ]
- ]
- , "body":
- {"type": "RESULT", "artifacts": {"type": "var", "name": "artifacts"}}
- }
- }
-}
-#+END_SRC
-
-A typical invocation of that rule would be a target file like the following.
-#+BEGIN_SRC
-{ "input.txt":
- { "type": "ed patch"
- , "script": ["%g/world/s//user/g", "%g/World/s//USER/g"]
- , "srcs": [["FILE", null, "input.txt"]]
- }
-}
-#+END_SRC
-As the input file has the same name as a target (in the same module),
-we use the explicit file reference in the specification of the sources.
-
-*** Implicit dependencies and config transitions
-
-Say, instead of patching a file, we want to generate source files
-from some high-level description using our actively developed code
-generator. Then we have to do some additional considerations.
-- First of all, every target defined by this rule not only depends
- on the targets the user specifies. Additionally, our code
- generator is also an implicit dependency. And as it is under
- active development, we certainly do not want it to be taken from
- the ambient build environment (as we did in the previous example
- with ~ed~ which, however, is a pretty stable tool). So we use an
- ~implicit~ target for this.
-- Next, we notice that our code generator is used during the
- build. In particular, we want that tool (written in some compiled
- language) to be built for the platform we run our actions on, not
- the target platform we build our final binaries for. Therefore,
- we have to use a configuration transition.
-- As our defining expression also needs the configuration transition
- to access the artifacts of that implicit target, we better define
- it as a reusable expression. Other rules in our rule collection
- might also have the same task; so ~["transitions", "for host"]~
- might be a good place to define it. In fact, it can look like
- the expression with that name in our own code base.
-
-So, the overall organization of our rule might be as follows.
-
-#+BEGIN_SRC
-{ "generated code":
- { "target_fields": ["srcs"]
- , "implicit": {"generator": [["generators", "foogen"]]}
- , "config_vars": ["HOST_ARCH"]
- , "imports": {"for host": ["transitions", "for host"]}
- , "config_transitions":
- {"generator": [{"type": "CALL_EXPRESSION", "name": "for host"}]}
- , "expression": ...
- }
-}
-#+END_SRC
-
-*** Providing information to consuming targets
-
-In the simple case of patching, the resulting file is indeed the
-only information the consumer of that target needs; in fact, the main
-point was that the resulting target could be a drop-in replacement
-of a source file. A typical rule, however, defines something like
-a library and a library is much more, than just the actual library
-file and the public headers: a library may depend on other libraries;
-therefore, in order to use it, we need
-- to have the header files of dependencies available that might be
- included by the public header files of that library,
-- to have the libraries transitively depended upon available during
- linking, and
-- to know the order in which to link the dependencies (as they
- might have dependencies among each other).
-In order to keep a maintainable build description, all this should
-be taken care of by simply depending on that library. We do _not_
-want the consumer of a target having to be aware of such transitive
-dependencies (e.g., when constructing the link command line), as
-it used to be the case in early build tools like ~make~.
-
-It is a deliberate design choice that a target is given only by
-the result of its analysis, regardless of where it is coming from.
-Therefore, all this information needs to be part of the result of
-a target. Such kind of information is precisely, what the mentioned
-~"provides"~ map is for. As a map, it can contain an arbitrary
-amount of information and the interface function ~"DEP_PROVIDES"~
-is in such a way that adding more providers does not affect targets
-not aware of them (there is no function asking for all providers
-of a target). The keys and their meaning have to be agreed upon
-by a target and its consumers. As the latter, however, typically
-are a target of the same family (authored by the same group), this
-usually is not a problem.
-
-A typical example of computing a provided value is the ~"link-args"~
-in the rules used by ~just~ itself. They are defined by the following
-expression.
-#+BEGIN_SRC
-{ "type": "nub_right"
-, "$1":
- { "type": "++"
- , "$1":
- [ {"type": "keys", "$1": {"type": "var", "name": "lib"}}
- , {"type": "CALL_EXPRESSION", "name": "link-args-deps"}
- , {"type": "var", "name": "link external", "default": []}
- ]
- }
-}
-#+END_SRC
-This expression
-- collects the respective provider of its dependencies,
-- adds itself in front, and
-- deduplicates the resulting list, keeping only the right-most
- occurrence of each entry.
-In this way, the invariant is kept, that the ~"link-args"~ from a
-topological ordering of the dependencies (in the order that a each
-entry is mentioned before its dependencies).
diff --git a/doc/concepts/target-cache.md b/doc/concepts/target-cache.md
new file mode 100644
index 00000000..0db627e1
--- /dev/null
+++ b/doc/concepts/target-cache.md
@@ -0,0 +1,231 @@
+Target-level caching
+====================
+
+`git` trees as content-fixed roots
+----------------------------------
+
+### The `"git tree"` root scheme
+
+The multi-repository configuration supports a scheme `"git tree"`. This
+scheme is given by two parameters,
+
+ - the id of the tree (as a string with the hex encoding), and
+ - an arbitrary `git` repository containing the specified tree object,
+ as well as all needed tree and blob objects reachable from that
+ tree.
+
+For example, a root could be specified as follows.
+
+``` jsonc
+["git tree", "6a1820e78f61aee6b8f3677f150f4559b6ba77a4", "/usr/local/src/justbuild.git"]
+```
+
+It should be noted that the `git` tree identifier alone already
+specifies the content of the full tree. However, `just` needs access to
+some repository containing the tree in order to know what the tree looks
+like.
+
+Nevertheless, it is an important observation that the tree identifier
+alone already specifies the content of the whole (logical) directory.
+The equality of two such directories can be established by comparing the
+two identifiers *without* the need to read any file from
+disk. Those "fixed-content" descriptions, i.e., descriptions of a
+repository root that already fully determine the content, are the key to
+caching whole targets.
+
+### `KNOWN` artifacts
+
+The in-memory representation of known artifacts has an optional
+reference to a repository containing that artifact. Artifacts "known"
+from local repositories might not be known to the CAS used for the
+action execution; this additional reference allows filling such misses
+in the CAS.
+
+Content-fixed repositories
+--------------------------
+
+### The parts of a content-fixed repository
+
+In order to meaningfully cache a target, we need to be able to
+efficiently compute the cache key. We restrict this to the case where we
+can compute the information about the repository without file-system
+access. This requires that all roots (workspace, target root, etc) be
+content fixed, as well as the bindings of the free repository names (and
+hence also all transitively reachable repositories). We call such
+repositories "content-fixed" repositories.
+
+### Canonical description of a content-fixed repository
+
+The local data of a repository consists of the following.
+
+ - The roots (for workspace, targets, rules, expressions). As the tree
+ identifier already defines the content, we leave out the path to the
+ repository containing the tree.
+ - The names of the targets, rules, and expression files.
+ - The names of the outgoing "bindings".
+
+Additionally, repositories can reach additional repositories via
+bindings. Moreover, this repository-level dependency relation is not
+necessarily cycle free. In particular, we cannot use the tree unfolding
+as canonical representation of that graph up to bisimulation, as we do
+with most other data structures. To still get a canonical
+representation, we factor out the largest bisimulation, i.e., minimize
+the respective automaton (with repositories as states, local data as
+locally observable properties, and the binding relation as edges).
+
+Finally, for each repository individually, the reachable repositories
+are renamed `"0"`, `"1"`, `"2"`, etc, following a depth-first traversal
+starting from the repository in question where outgoing edges are
+traversed in lexicographical order. The entry point is hence
+recognisable as repository `"0"`.
+
+The repository key is the content identifier of the canonical
+serialisation of the JSON encoding of the multi-repository
+configuration obtained in this way (with repository-free git-root
+descriptions). The serialisation itself is stored in CAS.
+
+This identification and replacement of global names does not change
+the semantics, as our name data types are completely opaque to our
+expression language. In the `"json_encode"` expression, they are
+serialized as `null`, and a string representation is only generated in
+user messages not available to the language itself. Moreover, names
+cannot be compared for equality either, so their only observable
+properties, i.e., the way `"DEP_ARTIFACTS"`, `"DEP_RUNFILES"`, and
+`"DEP_PROVIDES"` react to them, are invariant under repository
+bisimulation.
+
+Configuration and the `"export"` rule
+-------------------------------------
+
+Targets not only depend on the content of their repository, but also on
+their configurations. Normally, the effective part of a configuration is
+only determined after analysing the target. However, for caching, we
+need to compute the cache key directly. This property is provided by the
+built-in `"export"` rule; only `"export"` targets residing in
+content-fixed repositories will be cached. This also serves as an
+indication of which targets of a repository are intended for consumption
+by other repositories.
+
+An `"export"` rule takes precisely the following arguments.
+
+ - `"target"` specifying a single target, the target to be cached. It
+ must not be tainted.
+ - `"flexible_config"` a list of strings; those specify the variables
+ of the configuration that are considered. All other parts of the
+ configuration are ignored. So the effective configuration for the
+ `"export"` target is the configuration restricted to those variables
+ (filled up with `null` if the variable was not present in the
+ original configuration).
+ - `"fixed_config"` a dict with of arbitrary JSON values (taken
+ unevaluated) with keys disjoint from the `"flexible_config"`.
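+
+A hypothetical target file entry using this rule (all names and
+variables are illustrative) could look as follows.
+
+``` jsonc
+{ "mylib":
+  { "type": "export"
+  , "target": ["src", "mylib"]
+  , "flexible_config": ["ARCH", "DEBUG"]
+  , "fixed_config": {"USE_PIC": true}
+  }
+}
+```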
+
+An `"export"` target is analyzed as follows. The configuration is
+restricted to the variables specified in the `"flexible_config"`; this
+will result in the effective configuration for the exported target. It
+is a requirement that the effective configuration contain only pure JSON
+values. The (necessarily conflict-free) union with the `"fixed_config"`
+is computed and the `"target"` is evaluated in this configuration. The
+result (artifacts, runfiles, provided information) is the result of that
+evaluation. It is a requirement that the provided information contains
+only pure JSON values and artifacts (including tree artifacts); in
+particular, they may not contain names.
+
+Cache key
+---------
+
+We only consider `"export"` targets in content-fixed repositories for
+caching. An export target is then fully described by
+
+ - the repository key of the repository the export target resides in,
+ - the target name of the export target within that repository,
+ described as module-name pair, and
+ - the effective configuration.
+
+More precisely, the canonical description is the JSON object with those
+values for the keys `"repo_key"`, `"target_name"`, and
+`"effective_config"`, respectively. The repository key is the blob
+identifier of the canonical serialisation (including sorted keys, etc)
+of the just described piece of JSON. To allow debugging and cooperation
+with other tools, whenever a cache key is computed, it is ensured, that
+the serialisation ends up in the applicable CAS.
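+
+Schematically, such a canonical description could look as follows (the
+repository key shown is a hypothetical blob identifier).
+
+``` jsonc
+{ "effective_config": {"ARCH": "x86_64", "DEBUG": null}
+, "repo_key": "ab93c09e6a1820e78f61aee6b8f3677f150f4559"
+, "target_name": ["mymodule", "mylib"]
+}
+```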
+
+It should be noted that the cache key can be computed
+*without* analyzing the target referred to. This is
+possible, as the configuration is pruned a priori instead of the usual
+procedure to analyse and afterwards determine the parts of the
+configuration that were relevant.
+
+Cached value
+------------
+
+The value to be cached is the result of evaluating the target, that is,
+its artifacts, runfiles, and provided data. All artifacts inside those
+data structures will be described as known artifacts.
+
+As serialisation, we will essentially use our usual JSON encoding; while
+this can be used as is for artifacts and runfiles where we know that
+they have to be a map from strings to artifacts, additional information
+will be added for the provided data. The provided data can contain
+artifacts, but also legitimately pure JSON values that coincide with our
+JSON encoding of artifacts; the same holds true for nodes and result
+values. Moreover, the tree unfolding implicit in the JSON serialisation
+can be exponentially larger than the value.
+
+Therefore, in our serialisation, we add an entry for every subexpression
+and separately add a list of which subexpressions are artifacts, nodes,
+or results. During deserialisation, we use this subexpression structure
+to deserialize every subexpression only once.
+
+Sharding of target cache
+------------------------
+
+In our target description, the execution environment is not included.
+For local execution, it is implicit anyway. As we also want to cache
+high-level targets when using remote execution, we shard the target
+cache (e.g., by using appropriate subdirectories) by the blob identifier
+of the serialisation of the description of the execution backend. Here,
+`null` stands for local execution, and for remote execution we use an
+object with keys `"remote_execution_address"` and
+`"remote_execution_properties"` filled in the obvious way. As usual, we
+add the serialisation to the CAS.
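+
+For instance, a remote-execution backend description could be
+serialised as follows (address and properties are hypothetical).
+
+``` jsonc
+{ "remote_execution_address": "build.example.com:8980"
+, "remote_execution_properties": {"OS": "linux"}
+}
+```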
+
+`"export"` targets, strictness and the extensional projection
+-------------------------------------------------------------
+
+As opposed to the target that is exported, the corresponding export
+target, if part of a content-fixed repository, will be strict: a build
+depending on such a target can only succeed if all artifacts in the
+result of the target (regardless of whether direct artifacts, runfiles, or as
+part of the provided data) can be built, even if not all (or even none)
+are actually used in the build.
+
+Upon cache hit, the artifacts of an export target are the known
+artifacts corresponding to the artifacts of the exported target. While
+extensionally equal, known artifacts are defined differently, so an
+export target and the exported target are intensionally different (and
+that difference might only be visible on the second build). As
+intensional equality is used when testing for absence of conflicts in
+staging, a target and its exported version almost always conflict and
+hence should not be used together. One way to achieve this is to always
+use the export target for any target that is exported. This fits well
+together with the recommendation of only depending on export targets of
+other repositories.
+
+If a target forwards artifacts of an exported target (indirect header
+files, indirect link dependencies, etc), and is exported again, no
+additional conflicts occur; replacing by the corresponding known
+artifact is a projection: the known artifact corresponding to a known
+artifact is the artifact itself. Moreover, by the strictness property
+described earlier, if an export target has a cache hit, then so have all
+export targets it depends upon. Keep in mind that a repository can only
+be content-fixed if all its dependencies are.
+
+For this strictness-based approach to work, it is, however, a
+requirement that any artifact that is exported (typically indirectly,
+e.g., as part of a common dependency) by several targets is only used
+through the same export target. For a well-structured repository, this
+should be a natural property anyway.
+
+The forwarding of artifacts is the reason we chose that, in the
+non-cached analysis of an export target, the artifacts are passed on as
+received and are not wrapped in an "add to cache" action. The latter
+choice would violate the projection property we rely upon.
diff --git a/doc/concepts/target-cache.org b/doc/concepts/target-cache.org
deleted file mode 100644
index 591a66af..00000000
--- a/doc/concepts/target-cache.org
+++ /dev/null
@@ -1,219 +0,0 @@
-* Target-level caching
-
-** ~git~ trees as content-fixed roots
-
-*** The ~"git tree"~ root scheme
-
-The multi-repository configuration supports a scheme ~"git tree"~.
-This scheme is given by two parameters,
-- the id of the tree (as a string with the hex encoding), and
-- an arbitrary ~git~ repository containing the specified tree
- object, as well as all needed tree and blob objects reachable
- from that tree.
-For example, a root could be specified as follows.
-#+BEGIN_SRC
-["git tree", "6a1820e78f61aee6b8f3677f150f4559b6ba77a4", "/usr/local/src/justbuild.git"]
-#+END_SRC
-
-It should be noted that the ~git~ tree identifier alone already
-specifies the content of the full tree. However, ~just~ needs access
-to some repository containing the tree in order to know what the
-tree looks like.
-
-Nevertheless, it is an important observation that the tree identifier
-alone already specifies the content of the whole (logical) directory.
-The equality of two such directories can be established by comparing
-the two identifiers _without_ the need to read any file from
-disk. Those "fixed-content" descriptions, i.e., descriptions of a
-repository root that already fully determines the content are the
-key to caching whole targets.
-
-*** ~KNOWN~ artifacts
-
-The in-memory representation of known artifacts has an optional
-reference to a repository containing that artifact. Artifacts
-"known" from local repositories might not be known to the CAS used
-for the action execution; this additional reference allows to fill
-such misses in the CAS.
-
-** Content-fixed repositories
-
-*** The parts of a content-fixed repository
-
-In order to meaningfully cache a target, we need to be able to
-efficiently compute the cache key. We restrict this to the case where
-we can compute the information about the repository without file-system
-access. This requires that all roots (workspace, target root, etc)
-be content fixed, as well as the bindings of the free repository
-names (and hence also all transitively reachable repositories).
-The call such repositories "content-fixed" repositories.
-
-*** Canonical description of a content-fixed repository
-
-The local data of a repository consists of the following.
-- The roots (for workspace, targets, rules, expressions). As the
- tree identifier already defines the content, we leave out the
- path to the repository containing the tree.
-- The names of the targets, rules, and expression files.
-- The names of the outgoing "bindings".
-
-Additionally, repositories can reach additional repositories via
-bindings. Moreover, this repository-level dependency relation
-is not necessarily cycle free. In particular, we cannot use the
-tree unfolding as canonical representation of that graph up to
-bisimulation, as we do with most other data structures. To still get
-a canonical representation, we factor out the largest bisimulation,
-i.e., minimize the respective automaton (with repositories as
-states, local data as locally observable properties, and the binding
-relation as edges).
-
-Finally, for each repository individually, the reachable repositories
-are renamed ~"0"~, ~"1"~, ~"2"~, etc, following a depth-first
-traversal starting from the repository in question where outgoing
-edges are traversed in lexicographical order. The entry point is
-hence recognisable as repository ~"0"~.
-
-The repository key content-identifier of the canonically formatted
-canonical serialisation of the JSON encoding of the obtain
-multi-repository configuration (with repository-free git-root
-descriptions). The serialisation itself is stored in CAS.
-
-These identifications and replacement of global names does not change
-the semantics, as our name data types are completely opaque to our
-expression language. In the ~"json_encode"~ expression, they're
-serialized as ~null~ and string representation is only generated in
-user messages not available to the language itself. Moreover, names
-cannot be compared for equality either, so their only observable
-properties, i.e., the way ~"DEP_ARTIFACTS"~, ~"DEP_RUNFILES~, and
-~"DEP_PROVIDES"~ reacts to them are invariant under repository
-bisimulation.
-
-** Configuration and the ~"export"~ rule
-
-Targets not only depend on the content of their repository, but also
-on their configurations. Normally,
-the effective part of a configuration is only determined after
-analysing the target. However, for caching, we need to compute
-the cache key directly. This property is provided by the built-in ~"export"~ rule; only ~"export"~ targets
-residing in content-fixed repositories will be cached. This also
-serves as indication, which targets of a repository are intended
-for consumption by other repositories.
-
-An ~"export"~ rule takes precisely the following arguments.
-- ~"target"~ specifying a single target, the target to be cached.
- It must not be tainted.
-- ~"flexible_config"~ a list of strings; those specify the variables
- of the configuration that are considered. All other parts of
- the configuration are ignored. So the effective configuration for
- the ~"export"~ target is the configuration restricted to those
- variables (filled up with ~null~ if the variable was not present
- in the original configuration).
-- ~"fixed_config"~ a dict with of arbitrary JSON values (taken
- unevaluated) with keys disjoint from the ~"flexible_config"~.
-
-An ~"export"~ target is analyzed as follows. The configuration is
-restricted to the variables specified in the ~"flexible_config"~;
-this will result in the effective configuration for the exported
-target. It is a requirement that the effective configuration contain
-only pure JSON values. The (necessarily conflict-free) union with
-the ~"fixed_config"~ is computed and the ~"target"~ is evaluated
-in this configuration. The result (artifacts, runfiles, provided
-information) is the result of that evaluation. It is a requirement
-that the provided information does only contain pure JSON values
-and artifacts (including tree artifacts); in particular, they may
-not contain names.
-
-** Cache key
-
-We only consider ~"export"~ targets in content-fixed repositories
-for caching. An export target is then fully described by
-- the repository key of the repository the export target resides in,
-- the target name of the export target within that repository,
- described as module-name pair, and
-- the effective configuration.
-More precisely, the canonical description is the JSON object with
-those values for the keys ~"repo_key"~, ~"target_name"~, and ~"effective_config"~,
-respectively. The repository key is the blob identifier of the
-canonical serialisation (including sorted keys, etc) of the just
-described piece of JSON. To allow debugging and cooperation with
-other tools, whenever a cache key is computed, it is ensured,
-that the serialisation ends up in the applicable CAS.
-
-It should be noted that the cache key can be computed _without_
-analyzing the target referred to. This is possible, as the
-configuration is pruned a priori instead of the usual procedure
-to analyse and afterwards determine the parts of the configuration
-that were relevant.
-
-** Cached value
-
-The value to be cached is the result of evaluating the target,
-that is, its artifacts, runfiles, and provided data. All artifacts
-inside those data structures will be described as known artifacts.
-
-As serialisation, we will essentially use our usual JSON encoding;
-while this can be used as is for artifacts and runfiles where we
-know that they have to be a map from strings to artifacts, additional
-information will be added for the provided data. The provided data
-can contain artifacts, but also legitimately pure JSON values that
-coincide with our JSON encoding of artifacts; the same holds true
-for nodes and result values. Moreover, the tree unfolding implicit
-in the JSON serialisation can be exponentially larger than the value.
-
-Therefore, in our serialisation, we add an entry for every subexpression
-and separately add a list of which subexpressions are artifacts,
-nodes, or results. During deserialisation, we use this subexpression
-structure to deserialize every subexpression only one.
-
-** Sharding of target cache
-
-In our target description, the execution environment is not included.
-For local execution, it is implicit anyway. As we also want to
-cache high-level targets when using remote execution, we shard the
-target cache (e.g., by using appropriate subdirectories) by the blob
-identifier of the serialisation of the description of the execution
-backend. Here, ~null~ stands for local execution, and for remote
-execution we use an object with keys ~"remote_execution_address"~
-and ~"remote_execution_properties"~ filled in the obvious way. As
-usual, we add the serialisation to the CAS.
-
-** ~"export"~ targets, strictness and the extensional projection
-
-As opposed to the target that is exported, the corresponding export
-target, if part of a content-fixed repository, will be strict: a
-build depending on such a target can only succeed if all artifacts
-in the result of target (regardless whether direct artifacts,
-runfiles, or as part of the provided data) can be built, even if
-not all (or even none) are actually used in the build.
-
-Upon cache hit, the artifacts of an export target are the known
-artifacts corresponding to the artifacts of the exported target.
-While extensionally equal, known artifacts are defined differently,
-so an export target and the exported target are intensionally
-different (and that difference might only be visible on the second
-build). As intensional equality is used when testing for absence
-of conflicts in staging, a target and its exported version almost
-always conflict and hence should not be used together. One way to
-achieve this is to always use the export target for any target that
-is exported. This fits well together with the recommendation of
-only depending on export targets of other repositories.
-
-If a target forwards artifacts of an exported target (indirect header
-files, indirect link dependencies, etc), and is exported again, no
-additional conflicts occur; replacing by the corresponding known
-artifact is a projection: the known artifact corresponding to a
-known artifact is the artifact itself. Moreover, by the strictness
-property described earlier, if an export target has a cache hit,
-then so have all export targets it depends upon. Keep in mind that
-a repository can only be content-fixed if all its dependencies are.
-
-For this strictness-based approach to work, it is, however, a
-requirement that any artifact that is exported (typically indirectly,
-e.g., as part of a common dependency) by several targets is only
-used through the same export target. For a well-structured repository,
-this should not be a natural property anyway.
-
-The forwarding of artifacts are the reason we chose that in the
-non-cached analysis of an export target the artifacts are passed on
-as received and are not wrapped in an "add to cache" action. The
-latter choice would violate that projection property we rely upon.