diff options
Diffstat (limited to 'doc/future-designs/computed-roots.md')
-rw-r--r-- | doc/future-designs/computed-roots.md | 156 |
1 files changed, 156 insertions, 0 deletions
diff --git a/doc/future-designs/computed-roots.md b/doc/future-designs/computed-roots.md new file mode 100644 index 00000000..8bbff401 --- /dev/null +++ b/doc/future-designs/computed-roots.md @@ -0,0 +1,156 @@ +Computed roots +============== + +Status quo +---------- + +As of version `1.0.0`, the `just` build tool requires a the repository +configuration, including all roots, to be specified ahead of time. This +has a couple of consequences. + +### Flexible source views, thanks to staging + +For source files, the flexibility of using them in a layout different +from how they occur in the source tree is gained through staging. If a +different view of sources is needed, instead of a source target, a +defined target can be used that rearranges the sources as desired. In +this way, also programmatic transformations of source files can be +carried out (while the result is still visible at the original +location), as is done, e.g., by the `["patch", "file"]` rule of the +`just` main repository. + +### Restricted flexibility in target-definitions via globbing + +When defining targets, the general principle is that the definition of +target and action graph only depends on the description (given by the +target files, the rules and expressions, and the configuration). There +is, however, a single exception to that rule: a target file may use the +`GLOB` built-in construct and in this way depend on the index of the +respective source directory. This allows, e.g., to define a separate +action for every source file and, in this way, get good incrementality +and parallelism, while still having a concise target description. + +### Modularity in rules through expressions + +Rules might share common tasks. For example, for both `C` binaries and +`C` libraries, the source files have to be compiled to object files. To +avoid duplication of descriptions, expressions can be called (also from +expressions themselves). + +Use cases that require more flexibility +--------------------------------------- + +### Generated target files + +Sometimes projects (or parts thereof that can form a separate logical +repository) have a simple structure. For example, there is a list of +directories and for each one there is a library, named and staged in a +systematic way. Repeating all those systematic target files seems +unnecessary work. Instead, we could store the list of directories to +consider and a small script containing the naming/staging/globbing +logic; this approach would also be more maintainable. A similar approach +could also be attractive for a directory tree with tests where, on top, +all the individual tests should be collected to test suites. + +### Staging according to embedded information + +For importing prebuilt libraries, it is sometimes desirable to stage +them in a way honoring the embedded `soname`. The current approach is to +provide that information out of band in the target file, so that it can +be used during analysis. Still, the information is already present in +the prebuilt binary, causing unnecessary maintenance overhead; instead, +the target file could be a function of that library which can form its +own content-fixed root (e.g., a `git tree` root), so that the computed +value is easily cacheable. + +### Simplified rule definition and alternative syntax + +Rules can share computation through expressions. However, the interface, +deliberately has to be explicit, including the documentation strings +that are used by `just describe`. While this allows easy and efficient +implementation of `just describe`, there is some redundancy involved, as +often fields are only there to be used by a common expression, but this +have to be documented in a redundant way (causing additional maintenance +burden). + +Moreover, using JSON encoding of abstract syntax trees is an +unambiguously readable and easy to automatically process format, but +people argue that it is hard to write by hand. However, it is unlikely +to get agreement on which syntax is best to use. Now, if rule and +expression files could be generated, this argument would not be +necessary. Moreover, rules are typically versioned and infrequently +changed, so the step of generating the official syntax from the +convenient one would typically be in cache. + +Proposal: Support computed roots +-------------------------------- + +We propose computed roots as a clean principle to add the needed (and a +lot more) flexibility for the described use cases, while ensuring that +all computations of roots are properly cacheable at high level. In this +way, we do not compromise efficient builds, as the price of the +additional flexibility, in the typical case, is just a single cache +lookup. Of course, it is up to the user to ensure that this case really +is the typical one, in the same way as it is their responsibility to +describe the targets in a way to have proper incrementality. + +### New root type `"computed"` + +The `just` multi-repository configuration will allow a new type of root +(besides `"file"` and `"git tree"` and variants thereof), called +`"computed"`. A `"computed"` root is given by + + - the (global) name of a repository + - the name of a target (in `["module", "target"]` format), and + - a configuration (as JSON object, taken literally). + +It is a requirement that the specified target is an `"export"` target +and the specified repository content-fixed; `"computed"` roots are +considered content-fixed. However, the dependency structure of computed +roots must be cycle free. In other words, there must exist an ordering +of computed roots (the implicit topological order, not a declared one) +such that for each computed root, the referenced repository as well as +all repositories reachable from that one via the `"bindings"` map only +contain computed roots earlier in that order. + +### Strict evaluation of roots as artifact tree + +The building of required computed roots happens in topological order; +the build of the defining target of a root is, in principle (subject to +a user-defined restriction of parallelism) started as soon as all roots +in the repositories reachable via bindings are available. The root is +then considered the artifact tree of the defining target. + +In particular, the evaluation is strict: all roots of reachable +repositories have to be successfully computed before the evaluation is +started, even if it later turns out that one of these roots is never +accessed in the computation of the defining target. The reason for this +strictness requirement is to ensure that the cache key for target-level +caching can be computed ahead of time (and we expect the entry to be in +target-level cache most of the time anyway). + +### Intensional equality of computed roots + +During a build, each computed root is evaluated only once, even if +required in several places. Two computed roots are considered equal, if +they are defined in the same way, i.e., repository name, target, and +configuration agree. The repository or layer using the computed root is +not part of the root definition. + +### Computed roots available to the user + +As computed roots are defined by export targets, the respective +artifacts are stored in the local CAS anyway. Additionally, the tree +that forms the root will be added to CAS as well. Moreover, an option +will be added to specify a log file that contains, in machine-readable +way, all the tree identifiers of all computed roots used in this build, +together with their definition. + +### `just-mr` to support computed roots + +To allow simply setting up a `just` configuration using computed roots, +`just-mr` will allow a repository type `"computed"` with the same +parameters as a computed root. These repositories can be used as roots, +like any other `just-mr` repository type. When generating the `just` +multi-repository configuration, the definition of a `"computed"` +repository is just forwarded as computed root. |