From a28b427eec70f45e5464b04f93f1d4b95662cd69 Mon Sep 17 00:00:00 2001
From: Klaus Aehlig <klaus.aehlig@huawei.com>
Date: Wed, 9 Apr 2025 10:25:02 +0200
Subject: Document tree overlays as implemented concept

---
 doc/concepts/tree-overlay.md       | 119 +++++++++++++++++++++++++++++++++++
 doc/future-designs/tree-overlay.md | 126 -------------------------------------
 2 files changed, 119 insertions(+), 126 deletions(-)
 create mode 100644 doc/concepts/tree-overlay.md
 delete mode 100644 doc/future-designs/tree-overlay.md

(limited to 'doc')

diff --git a/doc/concepts/tree-overlay.md b/doc/concepts/tree-overlay.md
new file mode 100644
index 00000000..068ea5fb
--- /dev/null
+++ b/doc/concepts/tree-overlay.md
@@ -0,0 +1,119 @@
+# Tree-Overlay Actions
+
+## Use cases for tree-overlay actions
+
+Our build tool has tree objects as first-class citizens. Trees
+can be obtained as directory outputs of actions, as well as by an
+explicit tree constructor in a rule definition taking an arrangement
+of artifacts and constructing a tree from it. Trees are handled as
+opaque objects, which has two advantages.
+
+- From a technical point, this allows passing through potentially
+  large directories by simply passing on a single identifier.
+- From a user point of view, this improves maintainability, as a
+  certain target can already claim certain subtrees in its artifacts
+  or runfiles, so that staging conflicts that might arise from a
+  latter addition of artifacts are already detected now.
+
+However, there are some use cases not covered by this way of handling
+trees. E.g., when creating disk images, it might be desirable to add
+project-specific artifacts to a tree obtained as directory output
+of an action calling a foreign build system. Of course, there need
+to be some out-of-band understanding where artifacts can be placed
+without messing up the original tree, but often this is the case,
+despite this being hard to formulate in a way that can be verified
+by a build system; however, it is easy for a build system to verify
+after the fact (i.e., once the trees are computed) that certain
+trees do not conflict on any path. Such an after-the-fact check is
+often maintainable enough and still guarantees that no files needed
+for the disk image to work get lost during the build process.
+
+A similar need for overlaying trees might occur when a third-party
+library is built using a foreign build system and, in order to keep
+the description maintainable over updates, the include files are
+collected as a whole directory.
+
+The mentioned use cases can be done in `justbuild` by means of
+the in-memory actions of type `TREE_OVERLAY` that rules can use
+to construct new trees out of existing ones by overlaying the
+contents; there is also a variant `DISJOINT_TREE_OVERLAY` that
+enforces that the overlaid trees do not conflict on any path. For
+ad-hoc constructions, there are built-in rules `tree_overlay` and
+`disjoint_tree_overlay` reflecting these action constructors.
+
+## Action graph data structure: actions of overlaying trees
+
+The main components of the action graph are
+
+- `"actions"`, describing how new artifacts can be obtained by
+  running a command in a directory given by arranging existing
+  artifacts in a specified way,
+- `"blobs"`, strings that can later be referenced as "known" artifacts
+  through their content-addressable blob identifier, and
+- `"trees"`, directory objects given by an arrangement of already
+  existing artifacts.
+
+Additionally, there is the category of `"tree overlays"`
+mapping (intensional) names to their definition as a list of
+existing tree artifacts, together with a bit indicating if the build
+should be aborted if the overlaid trees conflict on any path. The
+extensional value of such a tree overlay is obtained by starting
+with the empty tree and, sequentially in the given order, overlay
+the extensional value of the defining artifacts. Here, the overlay
+of one tree by another is a tree where the maximal paths are those
+of the second tree together with those of the first tree that are
+not in conflict with any from the second; the artifact at such a
+maximal path is the one at that place in the second tree if the
+second tree contains this maximal path, otherwise the artifact at
+this position in the first tree.
+
+## Computation of `"tree overlays"` in the presence of remote execution
+
+The evaluation of `"tree overlays"` happens in memory in the `just`
+process. To do so, the actual tree objects have to be inspected, in
+fact downwards for all common paths. In particular, as opposed to
+all other operations, trees in this operation cannot be passed
+on as opaque objects by simply copying the identifier. In the case
+of remote execution that means that the respective tree objects have
+to be fetched; to avoid unnecessary traffic, only the needed tree
+objects are fetched without the blobs or tree objects outside
+common paths, even if that means that those objects cannot be put
+into the local CAS (as that would violate the tree invariant). In
+any case, when adding the new tree objects that are part of the
+overlaid tree,  `just` adds them to the applicable
+CAS in topological order, in order to keep the tree invariant.
+
+## Functions in rule definition: `TREE_OVERLAY` and `DISJOINT_TREE_OVERLAY`
+
+In the defining expressions of rules, the constructors `TREE_OVERLAY`
+and `DISJOINT_TREE_OVERLAY` can be used to describe parts of the
+action graph. These constructors have one argument `"$1"` which
+has to evaluate to a list of tree-conflict&mdash;free mappings
+of strings to artifacts, also called "stages". The result of this
+function is a single artifact, the tree defined to be the overlay
+or conflict-free overlay, respectively, of the trees corresponding
+to the stages.
+
+The reason we require stages to be passed to these constructors
+rather than artifacts that happen to be trees is twofold.
+- We want to find malformed expressions already analysis time;
+  therefore, we need to ensure not only that the arguments to the
+  `"tree_overlays"` entry in the action graph are artifacts, but, in
+  fact, tree artifacts. By requiring that implicit tree constructor
+  we avoid accidental use of file outputs, as a location has to be
+  explicitly specified.
+- One the other hand, we expect that often the inputs are the
+  artifacts of a dependency, which is naturally given as a stage
+  via `DEP_ARTIFACTS`. So this form of definition is actually more
+  convenient to use.
+
+## Built-in functions `tree_overlay` and `disjoint_tree_overlay`
+
+As for any build primitive there is also a corresponding built-in
+rule type. The rules `"tree_overlay"` and `"disjoint_tree_overlay"`
+have a single field `"deps"` which expects a list of targets.
+Both runfiles and artifacts of the `"tree_overlay"` target are the
+tree overlays or conflict-free tree overlays, respectively, of the
+artifacts of the specified `"deps"` targets in the specified order
+staged at the value of the field `"name"` which has to evaluate to
+a single string.
diff --git a/doc/future-designs/tree-overlay.md b/doc/future-designs/tree-overlay.md
deleted file mode 100644
index 014c64f1..00000000
--- a/doc/future-designs/tree-overlay.md
+++ /dev/null
@@ -1,126 +0,0 @@
-# Tree Overlay Actions
-
-## Introduction
-
-Our build tool has tree objects as first-class citizens. Trees
-can be obtained as directory outputs of actions, as well as by an
-explicit tree constructor in a rule definition taking an arrangement
-of artifacts and constructing a tree from it. Trees are handled as
-opaque objects, which has two advantages.
-- From a technical point, this allows passing through potentially
-  large directories by simply passing on a single identifier.
-- From a user point of view, this improves maintainability, as a
-  certain target can already claim certain subtrees in its artifacts
-  or runfiles, so that staging conflicts that might arise from a
-  latter addition of artifacts are already detected now.
-
-However, there are some use cases not covered by this way of handling
-trees. E.g., when creating disk images, it might be desirable to add
-project-specific artifacts to a tree obtained as directory output
-of an action calling a foreign build system. Of course, there need
-to be some out-of-band understanding where artifacts can be placed
-without messing up the original tree, but often this is the case,
-despite this being hard to formulate in a way that can be verified
-by a build system; however, it is easy for a build system to verify
-after the fact (i.e., once the trees are computed) that certain
-trees do not conflict on any path. Such an after-the-fact check is
-often maintainable enough and still guarantees that no files needed
-for the disk image to work get lost during the build process.
-
-A similar need for overlaying trees might occur when a third-party
-library is built using a foreign build system and, in order to keep
-the description maintainable over updates, the include files are
-collected as a whole directory.
-
-## Proposed Changes
-
-We propose to add a new type of (in-memory) action `TREE_OVERLAY` that
-rules can use to construct new trees out of existing ones by overlaying
-the contents; we propose to also a variant `DISJOINT_TREE_OVERLAY`
-that enforces that the overlayed trees do not conflict on any path.
-For ad-hoc constructions, we also add a built-in rule `tree_overlay`
-reflecting this additional action constructors. The following
-sections describe the needed changes in detail.
-
-### Action graph data structure: new action of overlaying trees
-
-Currently, the action graph is given by
-- `"actions"`, describing how new artifacts can be obtained by
-  running a command in a directory given by arranging existing
-  artifacts in a specified way,
-- `"blobs"`, strings that can later be referenced as "known" artifacts
-  through their content-addressable blob identifier, and
-- `"trees"`, directory objects given by an arrangement of already
-  existing artifacts.
-
-We propose to extend that data structure by introducing a new category
-`"tree overlays"` mapping (intensional) names to their definition
-as a list of existing tree artifacts, together with a bit indicating if
-the build should be aborted if the overlayed trees conflict on any path.
-The extensional value of such
-a tree overlay is obtained by starting with the empty tree and,
-sequentially in the given order, overlay the extensional value of
-the defining artifacts. Here, the overlay of one tree by another is
-a tree where the maximal paths are those of the second tree together
-with those of a first tree that are not in conflict with any from
-the second; the artifact at such a maximal path is the one at that
-place in the second tree if the second tree contains this maximal
-path, otherwise the artifact at this position in the first tree.
-
-We keep the design that the action graph is obtained in the analysis
-phase as the union of the graph parts of the analysis results of the
-individual targets. Therefore, the analysis result of an individual
-target will also contain (besides artifacts, runfiles, provides
-map, actions, blobs, and trees) a collection of tree overlays.
-
-### Computation of `"tree overlays"` in the presence of remote execution
-
-The evaluation of `"tree overlays"` will happen in memory in the `just`
-process. To do so, the actual tree objects have to be inspected, in
-fact downwards for all common paths. In particular, as opposed to
-the remaining operations, trees in this operation cannot be passed
-on as opaque objects by simply copying the identifier. In the case
-of remote execution that means that the respective tree objects have
-to be fetched; to avoid unnecessary traffic, only the needed tree
-objects will be fetched without the blobs or tree objects outside
-common paths, even if that means that those objects cannot be put
-into the local CAS (as that would violate the tree invariant). In
-any case, when adding the new tree objects that are part of the
-overlayed tree, we have to ensure we add them to the applicable
-CAS in topological order, in order to keep the tree invariant.
-
-### Additional function in rule definition: `TREE_OVERLAY`
-
-In the defining expressions of rules, two additional constructors
-`TREE_OVERLAY` and `DISJOINT_TREE_OVERLAY` are added that (like
-`ACTION`, `BLOB`, and `TREE`) can be used to describe parts of the
-action graph. These constructors have one argument `"$1"` which
-has to evaluate to a list of tree-conflict&mdash;free mappings
-of strings to artifacts, also called "stages". The result of this
-function is a single artifact, the tree defined to be the overlay
-or conflict-free overlay, respectively, of the trees corresponding
-to the stages.
-
-The reason we require stages to be passed to the new constructor
-rather than artifacts that happen to be trees is twofold.
-- We want to find malformed expressions already analysis time;
-  therefore, we need to ensure not only that the arguments to the
-  `"tree_overlays"` entry in the action graph are artifacts, but, in
-  fact, tree artifacts. By requiring that implicit tree constructor
-  we avoid accidental use of file outputs, as a location has to be
-  explicitly specified.
-- One the other hand, we expect that often the inputs are the
-  artifacts of a dependency, which is naturally given as a stage
-  via `DEP_ARTIFACTS`. So this form of definition is actually more
-  convenient to use.
-
-### Additional built-in function `tree_overlay`
-
-To stay consistent with the idea that any build primitive also has
-a corresponding built-in rule type, we also add additional built-in
-rules `"tree_overlay"` and `"disjoint_tree_overlay"`. They have a
-single field `"deps"` which expects a list of targets. Both, runfiles
-and artifacts of the `"tree_overlay"` target are the tree overlays
-or conflict-free tree overlay, respectively, of the artifacts of
-the specified `"deps"` targets in the specified order staged at the
-value of the field `"name"` which has to evaluate to a single string.
-- 
cgit v1.2.3