diff options
author | Klaus Aehlig <klaus.aehlig@huawei.com> | 2023-11-23 12:19:06 +0100 |
---|---|---|
committer | Klaus Aehlig <klaus.aehlig@huawei.com> | 2023-11-29 14:06:11 +0100 |
commit | 50b2871fdbc4ba43d5f7880eedc08d133e9cd3d7 (patch) | |
tree | 47417aa286e76cddbafe1b03e0d391ab9dccb710 /doc | |
parent | cf04253130030bc28866d10aa1f8fe1353643d42 (diff) | |
download | justbuild-50b2871fdbc4ba43d5f7880eedc08d133e9cd3d7.tar.gz |
Add design document on tree-overlay actions
Diffstat (limited to 'doc')
-rw-r--r-- | doc/future-designs/tree-overlay.md | 114 |
1 files changed, 114 insertions, 0 deletions
diff --git a/doc/future-designs/tree-overlay.md b/doc/future-designs/tree-overlay.md new file mode 100644 index 00000000..a0a6506b --- /dev/null +++ b/doc/future-designs/tree-overlay.md @@ -0,0 +1,114 @@ +# Tree Overlay Actions + +## Introduction + +Our build tool has tree objects as first-class citizens. Trees +can be obtained as directory outputs of actions, as well as by an +explicit tree constructor in a rule definition taking an arrangement +of artifacts and constructing a tree from it. Trees are handled as +opaque objects, which has two advantages. +- From a technical point, this allows passing through potentially + large directories by simply passing on a single identifier. +- From a user point of view, this improves maintainability, as a + certain target can already claim certain subtrees in its artifacts + or runfiles, so that staging conflicts that might arise from a + latter addition of artifacts are already detected now. + +However, there are some use cases not covered by this way of handling +trees. E.g., when creating disk images, it might be desirable to +add project-specific artifacts to a tree obtained as directory +output of an action calling a foreign build system. Of course, +there need to be some out-of-band understanding where artifacts +can be placed without messing up the original tree, but often this +is the case, despite this being hard to formulate in a way that +can be verified by a build system. A similar situation might occur +when a third-party library is built using a foreign build system +and, in order to keep the description maintainable over updates, +the include files are collected as a whole directory. + +## Proposed Changes + +We propose to add a new type of (in-memory) action `TREE_OVERLAY` +that rules can use to construct new trees out of existing ones +by overlaying the contents. For ad-hoc constructions, we also add +a built-in rule `tree_overlay` reflecting this additional action +constructor. The following sections describe the needed changes +in detail. + +### Action graph data structure: new action of overlaying trees + +Currently, the action graph is given by +- `"actions"`, describing how new artifacts can be obtained by + running a command in a directory given by arranging existing + artifacts in a specified way, +- `"blobs"`, strings that can later be referenced as "known" artifacts + through their content-addressable blob identifier, and +- `"trees"`, directory objects given by an arrangement of already + existing artifacts. + +We propose to extend that data structure by introducing a new category +`"tree overlays"` mapping (intensional) names to their definition +as a list of existing tree artifacts. The extensional value of such +a tree overlay is obtained by starting with the empty tree and, +sequentially in the given order, overlay the extensional value of +the defining artifacts. Here, the overlay of one tree by another is +a tree where the maximal paths are those of the second tree together +with those of a first tree that are not in conflict with any from +the second; the artifact at such a maximal path is the one at that +place in the second tree if the second tree contains this maximal +path, otherwise the artifact at this position in the first tree. + +We keep the design that the action graph is obtained in the analysis +phase as the union of the graph parts of the analysis results of the +individual targets. Therefore, the analysis result of an individual +target will also contain (besides artifacts, runfiles, provides +map, actions, blobs, and trees) a collection of tree overlays. + +### Computation of `"tree overlays"` in the presence of remote execution + +The evaluation of `"tree overlays"` will happen in memory in the `just` +process. To do so, the actual tree objects have to be inspected, in +fact downwards for all common paths. In particular, as opposed to +the remaining operations, trees in this operation cannot be passed +on as opaque objects by simply copying the identifier. In the case +of remote execution that means that the respective tree objects have +to be fetched; to avoid unnecessary traffic, only the needed tree +objects will be fetched without the blobs or tree objects outside +common paths, even if that means that those objects cannot be put +into the local CAS (as that would violate the tree invariant). In +any case, when adding the new tree objects that are part of the +overlayed tree, we have to ensure we add them to the applicable +CAS in topological order, in order to keep the tree invariant. + +### Additional function in rule definition: `TREE_OVERLAY` + +In the defining expressions of rules, an additional constructor +`TREE_OVERLAY` is added that (like `ACTION`, `BLOB`, and `TREE`) +can be used to describe parts of the action graph. This constructor +has one argument `"deps"` which has to evaluate to a list of +tree-conflict—free mappings of strings to artifacts, also +called "stages". The result of this function is a single artifact, +the tree defined to be the overlay of the trees corresponding to +the stages. + +The reason we require stages to be passed to the new constructor +rather than artifacts that happen to be trees is twofold. +- We want to find malformed expressions already analysis time; + therefore, we need to ensure not only that the arguments to the + `"tree_overlays"` entry in the action graph are artifacts, but, in + fact, tree artifacts. By requiring that implicit tree constructor + we avoid accidental use of file outputs, as a location has to be + explicitly specified. +- One the other hand, we expect that often the inputs are the + artifacts of a dependency, which is naturally given as a stage + via `DEP_ARTIFACTS`. So this form of definition is actually more + convenient to use. + +### Additional built-in function `tree_overlay` + +To stay consistent with the idea that any build primitive also +has a corresponding built-in rule type, we also add an additional +built-in rule `"tree_overlay"`. It has a single field `"deps"` +which expects a list of targets. Both, runfiles and artifacts of +the `"tree_overlay"` target are the tree overlays of the artifacts +of the specified `"deps"` targets in the specified order. |