Add design document on tree-overlay actions

author: Klaus Aehlig <klaus.aehlig@huawei.com> 2023-11-23 12:19:06 +0100
committer: Klaus Aehlig <klaus.aehlig@huawei.com> 2023-11-29 14:06:11 +0100
commit: 50b2871fdbc4ba43d5f7880eedc08d133e9cd3d7 (patch)
tree: 47417aa286e76cddbafe1b03e0d391ab9dccb710 /doc
parent: cf04253130030bc28866d10aa1f8fe1353643d42 (diff)
download: justbuild-50b2871fdbc4ba43d5f7880eedc08d133e9cd3d7.tar.gz
1 files changed, 114 insertions, 0 deletions
diff --git a/doc/future-designs/tree-overlay.md b/doc/future-designs/tree-overlay.md
new file mode 100644
index 00000000..a0a6506b
--- /dev/null
+++ b/doc/future-designs/tree-overlay.md
@@ -0,0 +1,114 @@
+# Tree Overlay Actions
+
+## Introduction
+
+Our build tool has tree objects as first-class citizens. Trees
+can be obtained as directory outputs of actions, as well as by an
+explicit tree constructor in a rule definition taking an arrangement
+of artifacts and constructing a tree from it. Trees are handled as
+opaque objects, which has two advantages.
+- From a technical point of view, this allows passing through
+  potentially large directories by simply passing on a single identifier.
+- From a user point of view, this improves maintainability, as a
+  certain target can already claim certain subtrees in its artifacts
+  or runfiles, so that staging conflicts that might arise from a
+  later addition of artifacts are detected right away.
+
+However, there are some use cases not covered by this way of handling
+trees. E.g., when creating disk images, it might be desirable to
+add project-specific artifacts to a tree obtained as directory
+output of an action calling a foreign build system. Of course,
+there needs to be some out-of-band understanding of where artifacts
+can be placed without messing up the original tree; often such an
+understanding exists, even though it is hard to formulate in a way
+that can be verified by a build system. A similar situation might occur
+when a third-party library is built using a foreign build system
+and, in order to keep the description maintainable over updates,
+the include files are collected as a whole directory.
+
+## Proposed Changes
+
+We propose to add a new type of (in-memory) action `TREE_OVERLAY`
+that rules can use to construct new trees out of existing ones
+by overlaying the contents. For ad-hoc constructions, we also add
+a built-in rule `tree_overlay` reflecting this additional action
+constructor. The following sections describe the needed changes
+in detail.
+
+### Action graph data structure: new action of overlaying trees
+
+Currently, the action graph is given by
+- `"actions"`, describing how new artifacts can be obtained by
+  running a command in a directory given by arranging existing
+  artifacts in a specified way,
+- `"blobs"`, strings that can later be referenced as "known" artifacts
+  through their content-addressable blob identifier, and
+- `"trees"`, directory objects given by an arrangement of already
+  existing artifacts.
+
+We propose to extend that data structure by introducing a new category
+`"tree_overlays"` mapping (intensional) names to their definition
+as a list of existing tree artifacts. The extensional value of such
+a tree overlay is obtained by starting with the empty tree and,
+sequentially in the given order, overlaying the extensional values of
+the defining artifacts. Here, the overlay of one tree by another is
+a tree whose maximal paths are those of the second tree together
+with those of the first tree that are not in conflict with any from
+the second (two paths being in conflict if one is a proper prefix of
+the other); the artifact at such a maximal path is the one at that
+place in the second tree if the second tree contains this maximal
+path, otherwise the artifact at this position in the first tree.
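This is a minimal Python sketch that makes the definition concrete. It assumes a toy model in which a tree is a nested dict, a string leaf stands for a blob identifier, and an empty dict is an empty tree. All names are hypothetical. The sketch follows the maximal-path definition literally (flatten, drop the conflicting paths of the first tree, rebuild) and makes no attempt at efficiency.

```python
from typing import Dict, Tuple, Union

# Toy model (assumption, not the tool's representation): a tree is a nested
# dict; a str leaf stands in for a blob identifier, a dict for a subtree.
Node = Union[str, dict]
Path = Tuple[str, ...]


def maximal_paths(tree: dict, prefix: Path = ()) -> Dict[Path, Node]:
    """Map every maximal path of `tree` to the artifact found there."""
    result: Dict[Path, Node] = {}
    for name, node in tree.items():
        path = prefix + (name,)
        if isinstance(node, dict) and node:
            result.update(maximal_paths(node, path))
        else:
            # A file, or an empty subtree, ends a maximal path.
            result[path] = node
    return result


def in_conflict(p: Path, q: Path) -> bool:
    """True if one path is a prefix of the other. Equal paths are reported
    too; this does not change the result, as the second tree wins anyway."""
    n = min(len(p), len(q))
    return p[:n] == q[:n]


def from_paths(paths: Dict[Path, Node]) -> dict:
    """Rebuild a nested tree from a conflict-free set of maximal paths."""
    tree: dict = {}
    for path, node in paths.items():
        current = tree
        for name in path[:-1]:
            current = current.setdefault(name, {})
        current[path[-1]] = {} if isinstance(node, dict) else node
    return tree


def overlay(first: dict, second: dict) -> dict:
    """Overlay `first` by `second` following the maximal-path definition."""
    upper = maximal_paths(second)
    kept = {p: a for p, a in maximal_paths(first).items()
            if not any(in_conflict(p, q) for q in upper)}
    kept.update(upper)  # artifacts of the second tree take precedence
    return from_paths(kept)


def tree_overlay(trees: list) -> dict:
    """Start with the empty tree and overlay the given trees in order."""
    result: dict = {}
    for tree in trees:
        result = overlay(result, tree)
    return result


vendored = {"include": {"foo.h": "blob-1"}, "lib": {"libfoo.a": "blob-2"}}
extra = {"include": {"bar.h": "blob-3"}, "lib": "blob-4"}
print(tree_overlay([vendored, extra]))
# {'include': {'foo.h': 'blob-1', 'bar.h': 'blob-3'}, 'lib': 'blob-4'}
```

Under this definition, an empty subtree in the second tree replaces whatever the first tree has at that position. This is because the path of the empty subtree is itself maximal.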
+
+We keep the design that the action graph is obtained in the analysis
+phase as the union of the graph parts of the analysis results of the
+individual targets. Therefore, the analysis result of an individual
+target will also contain (besides artifacts, runfiles, provides
+map, actions, blobs, and trees) a collection of tree overlays.
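The following sketch shows one way the per-target graph parts, now including tree overlays, might be combined into a single action graph. For illustration it assumes that identifiers are derived from definitions, so duplicates can be merged and disagreeing definitions rejected. The artifact references such as `{"tree": "t-inc"}` are made up.

```python
# Hypothetical model of a target's graph part: one map per category from
# (intensional) identifiers to definitions. The category names follow the
# text; the artifact references below are made up for illustration.
CATEGORIES = ("actions", "blobs", "trees", "tree_overlays")


def union_graph_parts(parts: list) -> dict:
    """Union of the graph parts of the analysed targets.

    Assumes identifiers are derived from the definitions, so one
    identifier must never be bound to two different definitions.
    """
    graph = {category: {} for category in CATEGORIES}
    for part in parts:
        for category in CATEGORIES:
            for ident, definition in part.get(category, {}).items():
                known = graph[category].setdefault(ident, definition)
                if known != definition:
                    raise ValueError(f"conflicting {category} entry {ident!r}")
    return graph


# Two targets sharing a tree; only the second one defines a tree overlay.
library = {"trees": {"t-inc": {"foo.h": {"blob": "b-1"}}}}
image = {
    "trees": {
        "t-inc": {"foo.h": {"blob": "b-1"}},
        "t-extra": {"etc/motd": {"blob": "b-2"}},
    },
    "tree_overlays": {"o-1": [{"tree": "t-inc"}, {"tree": "t-extra"}]},
}
graph = union_graph_parts([library, image])
print(sorted(graph["trees"]), sorted(graph["tree_overlays"]))
# ['t-extra', 't-inc'] ['o-1']
```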
+
+### Computation of `"tree_overlays"` in the presence of remote execution
+
+The evaluation of `"tree_overlays"` will happen in memory in the `just`
+process. To do so, the actual tree objects have to be inspected, namely
+downwards along all paths common to the trees being overlaid. In
+particular, unlike in the other operations, trees in this operation
+cannot be passed on as opaque objects by simply copying the identifier.
+In the case of remote execution, this means that the respective tree
+objects have to be fetched; to avoid unnecessary traffic, only the
+needed tree objects are fetched, not the blobs or the tree objects
+outside common paths, even if that means that the fetched tree objects
+cannot be put into the local CAS (as that would violate the tree
+invariant). In any case, when adding the new tree objects that are
+part of the overlaid tree, we have to ensure we add them to the
+applicable CAS in topological order, i.e., every subtree before the
+trees containing it, in order to keep the tree invariant.
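This toy model illustrates both points of this section under simplifying assumptions:

- Tree objects map names to `(kind, id)` entries.
- Identifiers are SHA-256 hashes of a JSON rendering.
- The tree invariant is reduced to "every referenced subtree is already stored": blobs are not checked, and the local CAS is not modelled.

The function `overlay_ids` fetches only trees on common paths and passes everything else on by identifier. It collects the new tree objects children first, so uploading them in that order respects the invariant. All names are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

# Toy content-addressable model (assumption): an entry is (kind, id) with
# kind "file" or "tree"; a tree object maps names to entries.
Entry = Tuple[str, str]
TreeObject = Dict[str, Entry]


def tree_id(entries: TreeObject) -> str:
    """Content-derived identifier of a tree object."""
    return hashlib.sha256(json.dumps(sorted(entries.items())).encode()).hexdigest()


class ToyCAS:
    """Holds tree objects and enforces a simplified tree invariant: a tree
    is accepted only if all subtrees it references are already stored."""

    def __init__(self) -> None:
        self.trees: Dict[str, TreeObject] = {}

    def add_tree(self, entries: TreeObject) -> str:
        for kind, ident in entries.values():
            if kind == "tree" and ident not in self.trees:
                raise ValueError(f"missing subtree {ident}")
        ident = tree_id(entries)
        self.trees[ident] = dict(entries)
        return ident


def overlay_ids(first: str, second: str, fetch: Callable[[str], TreeObject],
                new_trees: List[TreeObject]) -> str:
    """Overlay two trees given by identifier, fetching only tree objects on
    common paths. New tree objects are appended children first."""
    lower, upper = fetch(first), fetch(second)
    result = dict(lower)
    for name, (kind, ident) in upper.items():
        old = lower.get(name)
        # Recurse only where both sides have a subtree and the upper one is
        # non-empty; an empty upper subtree replaces the lower one, matching
        # the maximal-path definition.
        if kind == "tree" and old is not None and old[0] == "tree" and fetch(ident):
            result[name] = ("tree", overlay_ids(old[1], ident, fetch, new_trees))
        else:
            result[name] = (kind, ident)  # passed on by identifier only
    new_trees.append(result)
    return tree_id(result)


remote = ToyCAS()
inc_a = remote.add_tree({"foo.h": ("file", "blob-1")})
inc_b = remote.add_tree({"bar.h": ("file", "blob-3")})
lib = remote.add_tree({"libfoo.a": ("file", "blob-2")})
share = remote.add_tree({"README": ("file", "blob-5")})
root_a = remote.add_tree({"include": ("tree", inc_a), "lib": ("tree", lib)})
root_b = remote.add_tree({"include": ("tree", inc_b), "share": ("tree", share)})

fetched = set()


def fetch(ident: str) -> TreeObject:
    fetched.add(ident)  # record which tree objects had to be transferred
    return remote.trees[ident]


new_trees: List[TreeObject] = []
root = overlay_ids(root_a, root_b, fetch, new_trees)
for entries in new_trees:  # children before parents: topological order
    remote.add_tree(entries)
print(lib in fetched, share in fetched)  # False False
```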
+
+### Additional function in rule definition: `TREE_OVERLAY`
+
+In the defining expressions of rules, an additional constructor
+`TREE_OVERLAY` is added that (like `ACTION`, `BLOB`, and `TREE`)
+can be used to describe parts of the action graph. This constructor
+has one argument `"deps"` which has to evaluate to a list of
+mappings of strings to artifacts that are free of tree conflicts,
+also called "stages". The result of this function is a single artifact,
+the tree defined to be the overlay of the trees corresponding to
+the stages.
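Here is a sketch of how evaluating `TREE_OVERLAY` during analysis could work. Each stage is checked for tree conflicts and registered as an implicit tree. The list of these trees is then recorded as a tree-overlay entry. The artifact and identifier formats are assumptions for illustration only.

```python
import hashlib
import json

# Hypothetical sketch of evaluating TREE_OVERLAY during analysis. The
# artifact references ({"blob": ...}, {"tree": ...}) and hash-based
# identifiers are illustrative assumptions, not the tool's actual format.


def content_id(value) -> str:
    """Identifier derived from a JSON rendering of a definition."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()


def check_tree_conflict_free(stage: dict) -> None:
    """Reject a stage in which one path is a proper prefix of another.
    Paths are assumed to be normalized and '/'-separated."""
    paths = [tuple(path.split("/")) for path in stage]
    for p in paths:
        for q in paths:
            if len(p) < len(q) and q[:len(p)] == p:
                raise ValueError(f"tree conflict: {'/'.join(p)} and {'/'.join(q)}")


def eval_tree_overlay(deps, graph: dict) -> dict:
    """Turn every stage into an implicit tree, record the overlay of these
    trees in the graph part, and return the resulting single artifact."""
    if not isinstance(deps, list):
        raise TypeError("TREE_OVERLAY: deps must evaluate to a list")
    trees = []
    for stage in deps:
        if not isinstance(stage, dict):
            raise TypeError("TREE_OVERLAY: every entry of deps must be a stage")
        check_tree_conflict_free(stage)
        tree_ident = content_id(stage)
        graph.setdefault("trees", {})[tree_ident] = stage
        trees.append({"tree": tree_ident})
    overlay_ident = content_id(trees)
    graph.setdefault("tree_overlays", {})[overlay_ident] = trees
    return {"tree_overlay": overlay_ident}


graph = {}
result = eval_tree_overlay(
    [{"include/foo.h": {"blob": "b-1"}}, {"include/bar.h": {"blob": "b-3"}}],
    graph,
)
print(len(graph["trees"]), len(graph["tree_overlays"]))  # 2 1
```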
+
+The reason we require stages to be passed to the new constructor
+rather than artifacts that happen to be trees is twofold.
+- We want to find malformed expressions already at analysis time;
+  therefore, we need to ensure not only that the arguments to the
+  `"tree_overlays"` entry in the action graph are artifacts, but, in
+  fact, tree artifacts. By requiring stages, and hence the implicit
+  tree constructor, we avoid accidental use of file outputs, as a
+  location has to be explicitly specified.
+- On the other hand, we expect that often the inputs are the
+  artifacts of a dependency, which is naturally given as a stage
+  via `DEP_ARTIFACTS`. So this form of definition is actually more
+  convenient to use.
+
+### Additional built-in rule `tree_overlay`
+
+To stay consistent with the idea that any build primitive also
+has a corresponding built-in rule type, we also add a built-in
+rule `"tree_overlay"`. It has a single field `"deps"`, which expects
+a list of targets. Both the runfiles and the artifacts of the
+`"tree_overlay"` target are the tree overlay of the artifacts of the
+specified `"deps"` targets, taken in the specified order.