* Computed roots

** Status quo

As of version ~1.0.0~, the ~just~ build tool requires the repository
configuration, including all roots, to be specified ahead of time.
This has a couple of consequences.

*** Flexible source views, thanks to staging

For source files, the flexibility to use them in a layout different
from the one in the source tree is gained through staging.
If a different view of sources is needed, a defined target that
rearranges the sources as desired can be used instead of a source
target. Programmatic transformations of source files can be carried
out in the same way (while the result is still visible at the
original location), as is done, e.g., by the ~["patch", "file"]~
rule of the ~just~ main repository.

*** Restricted flexibility in target-definitions via globbing

When defining targets, the general principle is that the definition
of target and action graph only depends on the description (given by
the target files, the rules and expressions, and the configuration).
There is, however, a single exception to that rule: a target file
may use the ~GLOB~ built-in construct and in this way depend on
the index of the respective source directory. This allows, e.g.,
to define a separate action for every source file and, in this
way, get good incrementality and parallelism, while still having
a concise target description.

*** Modularity in rules through expressions

Rules might share common tasks. For example, for both ~C~ binaries
and ~C~ libraries, the source files have to be compiled to object
files. To avoid duplication of descriptions, expressions can be
called (also from expressions themselves).

** Use cases that require more flexibility

*** Generated target files

Sometimes projects (or parts thereof that can form a separate
logical repository) have a simple structure. For example, there is
a list of directories and for each one there is a library, named
and staged in a systematic way. Repeating all those systematic
target files seems unnecessary work. Instead, we could store the
list of directories to consider and a small script containing the
naming/staging/globbing logic; this approach would also be more
maintainable. A similar approach could also be attractive for a
directory tree with tests where, on top, all the individual tests
should be collected into test suites.
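
As an illustration of such a small script, the following sketch
generates one target file per listed directory. It is only a sketch
under assumptions: the function name ~generate_target_files~, the
use of the ~["@", "rules", "CC", "library"]~ rule with the fields
~name~, ~hdrs~, ~srcs~, and ~stage~, and the glob patterns are chosen
for illustration and are not prescribed by this proposal.

#+BEGIN_SRC python
import json


def generate_target_files(directories):
    """Map each directory to the content of a generated target file.

    Hypothetical convention: the library is named after the last path
    component and staged under the full directory path.
    """
    files = {}
    for directory in directories:
        components = directory.split("/")
        name = components[-1]
        description = {
            name: {
                # Rule and field names are assumptions for illustration.
                "type": ["@", "rules", "CC", "library"],
                "name": [name],
                "hdrs": [["GLOB", None, "*.h"]],
                "srcs": [["GLOB", None, "*.c"]],
                "stage": components,
            }
        }
        files[directory + "/TARGETS"] = json.dumps(description, indent=2)
    return files
#+END_SRC

With this sketch, only the list of directories and the script have to
be maintained by hand; the systematic target files are derived from them.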

*** Staging according to embedded information

For importing prebuilt libraries, it is sometimes desirable to
stage them in a way honoring the embedded ~soname~. The current
approach is to provide that information out of band in the target
file, so that it can be used during analysis. Still, the information
is already present in the prebuilt binary, so repeating it causes
unnecessary maintenance overhead; instead, the target file could be
a function of that library, which can form its own content-fixed root
(e.g., a ~git tree~ root), so that the computed value is easily cachable.

*** Simplified rule definition and alternative syntax

Rules can share computation through expressions. However, the
interface deliberately has to be explicit, including the documentation
strings that are used by ~just describe~. While this allows an easy
and efficient implementation of ~just describe~, there is some
redundancy involved, as fields are often only there to be used by
a common expression, but still have to be documented redundantly
(causing additional maintenance burden).

Moreover, the JSON encoding of abstract syntax trees is an
unambiguously readable format that is easy to process automatically,
but people argue that it is hard to write by hand. However, agreement
on which syntax is best to use is unlikely. Now, if
rule and expression files could be generated, this argument would
not be necessary. Moreover, rules are typically versioned and
infrequently changed, so the step of generating the official syntax
from the convenient one would typically be in cache.

** Proposal: Support computed roots

We propose computed roots as a clean principle to add the needed (and
a lot more) flexibility for the described use cases, while ensuring
that all computations of roots are properly cachable at a high level.
In this way, we do not compromise efficient builds, as the price of
the additional flexibility, in the typical case, is just a single
cache lookup. Of course, it is up to the user to ensure that this
case really is the typical one, in the same way as it is their
responsibility to describe the targets in a way to have proper
incrementality.

*** New root type ~"computed"~

The ~just~ multi-repository configuration will allow a new type
of root (besides ~"file"~ and ~"git tree"~ and variants thereof),
called ~"computed"~. A ~"computed"~ root is given by
- the (global) name of a repository
- the name of a target (in ~["module", "target"]~ format), and
- a configuration (as JSON object, taken literally).
It is a requirement that the specified target is an ~"export"~
target and that the specified repository is content-fixed; ~"computed"~
roots are themselves considered content-fixed. However, the dependency
structure of computed roots must be cycle free. In other words, there must exist
an ordering of computed roots (the implicit topological order, not
a declared one) such that for each computed root, the referenced
repository as well as all repositories reachable from that one
via the ~"bindings"~ map only contain computed roots earlier in
that order.
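
The ordering condition above can be sketched as a depth-first
traversal. The data model is hypothetical and chosen only for
illustration: ~repos~ maps a repository name to its list of roots and
its ~"bindings"~ map, and a computed root is written as the tuple
~("computed", repo, target, config)~. Neither the encoding nor the
function names are part of the proposal.

#+BEGIN_SRC python
import json


def root_key(root):
    """Hashable identity of a computed root (hypothetical encoding)."""
    _, repo, target, config = root
    return (repo, tuple(target), json.dumps(config, sort_keys=True))


def reachable(repos, start):
    """Repositories reachable from `start` via bindings, including it."""
    seen, todo = set(), [start]
    while todo:
        name = todo.pop()
        if name not in seen:
            seen.add(name)
            todo.extend(repos[name].get("bindings", {}).values())
    return seen


def computed_roots_of(repos, name):
    return [r for r in repos[name].get("roots", []) if r[0] == "computed"]


def evaluation_order(repos):
    """Order computed roots so that each one comes after all computed
    roots found in the repositories reachable from its repository;
    raise ValueError if no such order exists (i.e., on a cycle)."""
    order, done, in_progress = [], set(), set()

    def visit(root):
        key = root_key(root)
        if key in done:
            return
        if key in in_progress:
            raise ValueError("cyclic computed roots involving " + key[0])
        in_progress.add(key)
        for name in sorted(reachable(repos, root[1])):
            for dependency in computed_roots_of(repos, name):
                visit(dependency)
        in_progress.discard(key)
        done.add(key)
        order.append(root)

    for name in sorted(repos):
        for root in computed_roots_of(repos, name):
            visit(root)
    return order
#+END_SRC

In this sketch, a repository whose computed root refers, directly or
via bindings, back to a repository containing that same root is
rejected with an error instead of being evaluated.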

*** Strict evaluation of roots as artifact tree

The building of required computed roots happens in topological order;
the build of the defining target of a root is, in principle (subject
to a user-defined restriction of parallelism), started as soon as all
roots in the repositories reachable via bindings are available. The
root is then considered the artifact tree of the defining target.

In particular, the evaluation is strict: all roots of reachable
repositories have to be successfully computed before the evaluation
is started, even if it later turns out that one of these roots is
never accessed in the computation of the defining target. The reason
for this strictness requirement is to ensure that the cache key for
target-level caching can be computed ahead of time (and we expect
the entry to be in target-level cache most of the time anyway).
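
To illustrate why strictness makes the cache key computable ahead of
time, here is a minimal sketch. It assumes, purely for illustration,
that a key is a hash over the tree identifiers of all reachable roots,
the target name, and the configuration; the function name
~target_cache_key~ and the key layout are hypothetical and do not
describe the actual cache-key computation of ~just~.

#+BEGIN_SRC python
import hashlib
import json


def target_cache_key(reachable_root_trees, target, config):
    """Hash a canonical description of everything the defining target
    may depend on. `reachable_root_trees` maps a repository name to the
    tree identifier of its root, or None if not yet computed."""
    missing = sorted(name for name, tree in reachable_root_trees.items()
                     if tree is None)
    if missing:
        # Strictness: without all roots, the key cannot be computed.
        raise ValueError("roots not yet computed: " + ", ".join(missing))
    description = {
        "roots": reachable_root_trees,
        "target": target,
        "config": config,
    }
    blob = json.dumps(description, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
#+END_SRC

Under this assumption, the key changes whenever any reachable root
changes, regardless of whether the defining target ever reads that root.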

*** Intensional equality of computed roots

During a build, each computed root is evaluated only once, even
if required in several places. Two computed roots are considered
equal if they are defined in the same way, i.e., repository name,
target, and configuration agree. The repository or layer using the
computed root is not part of the root definition.
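
A minimal sketch of this notion of equality, assuming a hypothetical
evaluator that memoizes results by definition: the class
~RootEvaluator~ and the helper ~definition_key~ are illustrative names,
and comparing configurations as JSON objects independently of key
order is an assumption of this sketch.

#+BEGIN_SRC python
import json


def definition_key(repo, target, config):
    """Identity of a computed root: repository name, target, and
    configuration. The using repository or layer is not part of it."""
    return (repo, tuple(target), json.dumps(config, sort_keys=True))


class RootEvaluator:
    """Evaluate each distinct root definition at most once per build."""

    def __init__(self, build):
        self._build = build  # callable performing the actual evaluation
        self._results = {}
        self.evaluations = 0

    def root(self, repo, target, config):
        key = definition_key(repo, target, config)
        if key not in self._results:
            self.evaluations += 1
            self._results[key] = self._build(repo, target, config)
        return self._results[key]
#+END_SRC

Two requests with the same repository name, target, and configuration
then share one evaluation, no matter which repositories issue them.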

*** Computed roots available to the user

As computed roots are defined by export targets, the respective
artifacts are stored in the local CAS anyway. Additionally, the
tree that forms the root will be added to CAS as well. Moreover,
an option will be added to specify a log file that contains, in a
machine-readable way, the tree identifiers of all computed
roots used in this build, together with their definitions.

*** ~just-mr~ to support computed roots

To allow a simple setup of a ~just~ configuration using computed
roots, ~just-mr~ will allow a repository type ~"computed"~ with the
same parameters as a computed root. These repositories can be used
as roots, like any other ~just-mr~ repository type. When generating
the ~just~ multi-repository configuration, the definition of a
~"computed"~ repository is just forwarded as computed root.