Computed roots
==============

Status quo
----------

As of version `1.0.0`, the `just` build tool requires the repository
configuration, including all roots, to be specified ahead of time. This
has a couple of consequences.

### Flexible source views, thanks to staging

For source files, the flexibility of using them in a layout different
from how they occur in the source tree is gained through staging. If a
different view of sources is needed, a defined target that rearranges
the sources as desired can be used instead of a source target. In this
way, programmatic transformations of source files can also be carried
out (while the result is still visible at the original location), as
is done, e.g., by the `["patch", "file"]` rule of the `just` main
repository.

### Restricted flexibility in target-definitions via globbing

When defining targets, the general principle is that the definition of
the target and action graph depends only on the description (given by
the target files, the rules and expressions, and the configuration).
There is, however, a single exception to that rule: a target file may
use the `GLOB` built-in construct and in this way depend on the index of
the respective source directory. This allows, e.g., defining a separate
action for every source file, thereby getting good incrementality and
parallelism while still keeping the target description concise.
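
The following minimal Python sketch illustrates this single dependency on the directory index. It is not how `just` implements `GLOB`. The result depends only on the list of names in one directory, and each match gets its own action. The action dictionaries and the `cc` command line are hypothetical placeholders.

```python
from fnmatch import fnmatchcase


def glob_sources(directory_index, pattern):
    """Return the entries of a directory index that match a glob pattern.

    `directory_index` stands in for the file names of one source directory;
    only this index, not any file content, influences the result.
    """
    return sorted(name for name in directory_index if fnmatchcase(name, pattern))


def one_action_per_file(directory_index, pattern):
    """Describe a separate (hypothetical) compile action for every match."""
    actions = []
    for src in glob_sources(directory_index, pattern):
        obj = src.rsplit(".", 1)[0] + ".o"
        actions.append({"inputs": [src], "outputs": [obj],
                        "cmd": ["cc", "-c", src, "-o", obj]})
    return actions


if __name__ == "__main__":
    index = ["main.c", "util.c", "util.h", "README.md"]
    for action in one_action_per_file(index, "*.c"):
        print(action["cmd"])
```

Because each source file gets its own action, changing one file only invalidates that file's action. Independent actions can also run in parallel.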

### Modularity in rules through expressions

Rules might share common tasks. For example, for both `C` binaries and
`C` libraries, the source files have to be compiled to object files. To
avoid duplicating descriptions, rules can call expressions (and
expressions can, in turn, call other expressions).

Use cases that require more flexibility
---------------------------------------

### Generated target files

Sometimes projects (or parts thereof that can form a separate logical
repository) have a simple structure. For example, there is a list of
directories, and for each one there is a library, named and staged in a
systematic way. Writing out all those systematic target files by hand
seems like unnecessary work. Instead, we could store the list of
directories to consider and a small script containing the
naming/staging/globbing logic; this approach would also be more
maintainable. A similar approach could also be attractive for a
directory tree with tests where, in addition, all the individual tests
should be collected into test suites.
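
Here is a minimal Python sketch of such a generating script, assuming the directory listing is passed in as input. The rule reference `["@", "rules", "CC", "library"]`, the field names, and the underscore naming convention are illustrative assumptions, not a prescribed format. A computed root would write the resulting object out as its target file.

```python
import json


def generate_targets(sources_by_dir):
    """Generate a target-file-like dict with one library per directory.

    `sources_by_dir` maps a directory to the file names it contains.
    The rule reference and field names are illustrative assumptions.
    """
    targets = {}
    for directory, files in sorted(sources_by_dir.items()):
        name = directory.replace("/", "_")  # systematic naming convention
        targets[name] = {
            "type": ["@", "rules", "CC", "library"],
            "name": [name],
            "srcs": [f"{directory}/{f}" for f in sorted(files) if f.endswith(".c")],
            "hdrs": [f"{directory}/{f}" for f in sorted(files) if f.endswith(".h")],
            "stage": [directory],  # staging follows the directory layout
        }
    return targets


if __name__ == "__main__":
    layout = {"net/http": ["client.c", "client.h"], "util": ["str.c", "str.h"]}
    print(json.dumps(generate_targets(layout), indent=2, sort_keys=True))
```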

### Staging according to embedded information

For importing prebuilt libraries, it is sometimes desirable to stage
them in a way that honors the embedded `soname`. The current approach is
to provide that information out of band in the target file, so that it
can be used during analysis. However, the information is already
present in the prebuilt binary, so duplicating it causes unnecessary
maintenance overhead. Instead, the target file could be computed as a
function of the library, which can form its own content-fixed root
(e.g., a `git tree` root), so that the computed value is easily
cacheable.

### Simplified rule definition and alternative syntax

Rules can share computation through expressions. However, the interface
deliberately has to be explicit, including the documentation strings
that are used by `just describe`. While this allows an easy and
efficient implementation of `just describe`, it involves some
redundancy: often, fields exist only to be passed on to a common
expression, but they still have to be documented redundantly (causing
additional maintenance burden).

Moreover, the JSON encoding of abstract syntax trees is an unambiguous
format that is easy to read and to process automatically, but people
argue that it is hard to write by hand. However, it is unlikely that
agreement can be reached on which syntax is best to use. If rule and
expression files could be generated, this argument would become
unnecessary. Moreover, rules are typically versioned and changed
infrequently, so the result of generating the official syntax from the
convenient one would typically be in cache.
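
The following Python sketch illustrates generating the official syntax. It translates a hypothetical s-expression syntax, `(type key value ...)`, into JSON objects of the form `{"type": ..., key: value}`. The surface syntax is made up for this example. The sketch does no escape handling or error reporting.

```python
import json


def tokenize(text):
    """Split the hypothetical s-expression syntax into tokens."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "()":
            tokens.append(c)
            i += 1
        elif c == '"':
            j = text.index('"', i + 1)  # no escape handling in this sketch
            tokens.append(("str", text[i + 1:j]))
            i = j + 1
        else:
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in '()"':
                j += 1
            tokens.append(("sym", text[i:j]))
            i = j
    return tokens


def parse(tokens, pos=0):
    """Parse `(type key value ...)` into {"type": type, key: value, ...}."""
    tok = tokens[pos]
    if tok != "(":
        return tok[1], pos + 1  # a string or bare symbol becomes a JSON string
    node, pos = {"type": tokens[pos + 1][1]}, pos + 2
    while tokens[pos] != ")":
        key = tokens[pos][1]
        value, pos = parse(tokens, pos + 1)
        node[key] = value
    return node, pos + 1


def to_json(text):
    """Translate one expression of the convenient syntax to JSON text."""
    node, _ = parse(tokenize(text))
    return json.dumps(node, sort_keys=True)


if __name__ == "__main__":
    print(to_json('(var name "x")'))
```

Since the translation is a pure function of the input file, its result is as cacheable as any other computed root.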

Proposal: Support computed roots
--------------------------------

We propose computed roots as a clean principle to add the needed (and a
lot more) flexibility for the described use cases, while ensuring that
all computations of roots are properly cacheable at a high level. In
this way, we do not compromise efficient builds, as the price of the
additional flexibility, in the typical case, is just a single cache
lookup. Of course, it is up to the user to ensure that this case really
is the typical one, in the same way as it is their responsibility to
describe the targets in a way that achieves proper incrementality.

### New root type `"computed"`

The `just` multi-repository configuration will allow a new type of root
(besides `"file"` and `"git tree"` and variants thereof), called
`"computed"`. A `"computed"` root is given by

 - the (global) name of a repository
 - the name of a target (in `["module", "target"]` format), and
 - a configuration (as JSON object, taken literally).
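
The three components of a root definition can be modeled as a small value type. In this Python sketch, the class and its field names are hypothetical. The configuration is kept as canonical JSON text, which reflects that it is taken literally rather than evaluated, and makes the definition hashable.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputedRoot:
    """Hypothetical model of a computed-root definition.

    The configuration is stored as canonical JSON text, so the definition
    is hashable and compares by value.
    """
    repository: str
    module: str
    target: str
    config: str

    @staticmethod
    def make(repository, target, config):
        # Target in ["module", "target"] format; unpacking rejects other lengths.
        module, name = target
        if not isinstance(config, dict):
            raise ValueError("configuration must be a JSON object")
        canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
        return ComputedRoot(repository, module, name, canonical)
```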

It is a requirement that the specified target is an `"export"` target
and the specified repository is content-fixed; `"computed"` roots are
themselves considered content-fixed. However, the dependency structure
of computed roots must be cycle-free. In other words, there must exist
an ordering of computed roots (it only has to exist; it is not
declared) such that for each computed root, the referenced repository as
well as all repositories reachable from it via the `"bindings"` map
only contain computed roots earlier in that order.
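
The cycle-freedom condition can be checked in two steps. First, for each computed root, collect the computed roots used anywhere in the repositories reachable from its referenced repository. Then order the roots depth-first. This Python sketch assumes a simplified, hypothetical data model in which each repository lists the identifiers of the computed roots it uses.

```python
def reachable_repos(repos, start):
    """All repositories reachable from `start` via the "bindings" maps."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(repos[name]["bindings"].values())
    return seen


def computed_root_order(repos, roots):
    """Order computed roots so that each comes after the roots it needs.

    `repos` maps a repository name to {"computed_roots": [...],
    "bindings": {...}}; `roots` maps a computed-root id to the repository
    it references. Raises ValueError if the dependencies contain a cycle.
    """
    deps = {
        r: sorted({c for repo in reachable_repos(repos, target_repo)
                   for c in repos[repo]["computed_roots"]})
        for r, target_repo in roots.items()
    }
    order, state = [], {}  # state: 1 = in progress, 2 = done

    def visit(r):
        if state.get(r) == 2:
            return
        if state.get(r) == 1:
            raise ValueError(f"cyclic computed roots involving {r!r}")
        state[r] = 1
        for d in deps[r]:
            visit(d)
        state[r] = 2
        order.append(r)

    for r in sorted(roots):
        visit(r)
    return order
```

The sketch reports a cycle when a root depends on itself, whether directly or through bindings.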

### Strict evaluation of roots as artifact tree

The required computed roots are built in topological order; the build
of the defining target of a root is, in principle (subject to a
user-defined restriction of parallelism), started as soon as all roots
in the repositories reachable via bindings are available. The root is
then considered the artifact tree of the defining target.

In particular, the evaluation is strict: all roots of reachable
repositories have to be successfully computed before the evaluation is
started, even if it later turns out that one of these roots is never
accessed in the computation of the defining target. The reason for this
strictness requirement is to ensure that the cache key for target-level
caching can be computed ahead of time (and we expect the entry to be in
the target-level cache most of the time anyway).
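
The scheduling and strictness rules can be sketched with Python's `concurrent.futures`. The `build` callback and the `jobs` parameter are hypothetical stand-ins. `build` represents building the defining export target, and `jobs` represents the user-defined restriction of parallelism.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def build_roots(deps, build, jobs=2):
    """Build computed roots strictly, respecting their dependencies.

    `deps` maps each root to the roots it needs; `build(root, inputs)` is a
    hypothetical callback returning the root's tree identifier. A root is
    started only once *all* of its dependencies were built successfully;
    any failure aborts the evaluation. `jobs` bounds the parallelism.
    """
    done, running = {}, {}
    pending = set(deps)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        while pending or running:
            for root in sorted(pending):
                if all(d in done for d in deps[root]):
                    inputs = {d: done[d] for d in deps[root]}
                    running[pool.submit(build, root, inputs)] = root
                    pending.discard(root)
            if not running:
                raise ValueError("remaining roots have unsatisfiable dependencies")
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                root = running.pop(future)
                done[root] = future.result()  # re-raises if the build failed
    return done
```

Because all inputs are known before a root's build starts, the sketch could compute a cache key from `root` and `inputs` up front. This is the property the strictness requirement is meant to guarantee.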

### Intensional equality of computed roots

During a build, each computed root is evaluated only once, even if
required in several places. Two computed roots are considered equal if
they are defined in the same way, i.e., if repository name, target, and
configuration agree. The repository or layer using the computed root is
not part of the root definition.
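
Intensional equality amounts to memoizing the evaluation on the definition alone. In this Python sketch, `ComputedRootCache` and the `evaluate` callback are hypothetical. The key is built from repository, target, and configuration, and the using repository is ignored. The sketch is not thread-safe.

```python
import json


class ComputedRootCache:
    """Evaluate each computed root at most once per build (hypothetical).

    Roots are identified intensionally by (repository, target,
    configuration); the repository that *uses* the root is not part of
    the key.
    """

    def __init__(self, evaluate):
        self._evaluate = evaluate  # (repository, target, config) -> tree id
        self._results = {}
        self.evaluations = 0

    def get(self, repository, target, config, used_by=None):
        # `used_by` is accepted only to show that it does not affect the key.
        key = (repository, tuple(target), json.dumps(config, sort_keys=True))
        if key not in self._results:
            self.evaluations += 1
            self._results[key] = self._evaluate(repository, target, config)
        return self._results[key]
```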

### Computed roots available to the user

As computed roots are defined by export targets, the respective
artifacts are stored in the local CAS anyway. Additionally, the tree
that forms the root will be added to the CAS as well. Moreover, an
option will be added to specify a log file that contains, in a
machine-readable format, the tree identifiers of all computed roots used
in this build, together with their definitions.
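
The design does not yet fix the log format. As one possible shape, this Python sketch writes a JSON list with one entry per computed root. The function name, the input layout, and the entry fields are all assumptions.

```python
import json


def write_root_log(path, evaluated):
    """Write a machine-readable log of the computed roots used in a build.

    `evaluated` maps (repository, module, target, canonical-config-JSON)
    to the tree identifier of the root; the entry layout is an assumption.
    """
    entries = [
        {"repository": repo, "target": [module, name],
         "config": json.loads(config), "tree": tree}
        for (repo, module, name, config), tree in sorted(evaluated.items())
    ]
    with open(path, "w") as f:
        json.dump(entries, f, indent=2, sort_keys=True)
    return entries
```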

### `just-mr` to support computed roots

To make it easy to set up a `just` configuration using computed roots,
`just-mr` will allow a repository type `"computed"` with the same
parameters as a computed root. These repositories can be used as roots,
like any other `just-mr` repository type. When generating the `just`
multi-repository configuration, the definition of a `"computed"`
repository is simply forwarded as a computed root.
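
The forwarding step can be sketched as a pure translation. This Python sketch assumes a `just-mr` entry with `"repo"`, `"target"`, and `"config"` fields, and a root written as `["computed", repo, module, name, config]`. Both layouts are assumptions for illustration, since the design above only states that the parameters are the same.

```python
def forward_computed(repo_description):
    """Translate a just-mr "computed" repository into a just root.

    Both layouts are assumptions: the just-mr entry is taken to have
    "repo", "target", and "config" fields, and the resulting root is
    written as ["computed", repo, module, name, config].
    """
    if repo_description.get("type") != "computed":
        raise ValueError("not a computed repository")
    module, name = repo_description["target"]
    return ["computed", repo_description["repo"], module, name,
            repo_description["config"]]
```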