Diffstat (limited to 'doc/future-designs')
-rw-r--r-- | doc/future-designs/computed-roots.md | 156 | ||||
-rw-r--r-- | doc/future-designs/computed-roots.org | 154 | ||||
-rw-r--r-- | doc/future-designs/execution-properties.md | 125 | ||||
-rw-r--r-- | doc/future-designs/execution-properties.org | 119 | ||||
-rw-r--r-- | doc/future-designs/service-target-cache.md | 236 | ||||
-rw-r--r-- | doc/future-designs/service-target-cache.org | 227 | ||||
-rw-r--r-- | doc/future-designs/symlinks.md | 113 | ||||
-rw-r--r-- | doc/future-designs/symlinks.org | 108 |
8 files changed, 630 insertions, 608 deletions
diff --git a/doc/future-designs/computed-roots.md b/doc/future-designs/computed-roots.md new file mode 100644 index 00000000..8bbff401 --- /dev/null +++ b/doc/future-designs/computed-roots.md @@ -0,0 +1,156 @@ +Computed roots +============== + +Status quo +---------- + +As of version `1.0.0`, the `just` build tool requires the repository +configuration, including all roots, to be specified ahead of time. This +has a couple of consequences. + +### Flexible source views, thanks to staging + +For source files, the flexibility of using them in a layout different +from how they occur in the source tree is gained through staging. If a +different view of sources is needed, instead of a source target, a +defined target can be used that rearranges the sources as desired. In +this way, programmatic transformations of source files can also be +carried out (while the result is still visible at the original +location), as is done, e.g., by the `["patch", "file"]` rule of the +`just` main repository. + +### Restricted flexibility in target-definitions via globbing + +When defining targets, the general principle is that the definition of +target and action graph only depends on the description (given by the +target files, the rules and expressions, and the configuration). There +is, however, a single exception to that rule: a target file may use the +`GLOB` built-in construct and in this way depend on the index of the +respective source directory. This allows, e.g., defining a separate +action for every source file and, in this way, getting good incrementality +and parallelism, while still having a concise target description. + +### Modularity in rules through expressions + +Rules might share common tasks. For example, for both `C` binaries and +`C` libraries, the source files have to be compiled to object files. To +avoid duplication of descriptions, expressions can be called (also from +expressions themselves).
+ +Use cases that require more flexibility +--------------------------------------- + +### Generated target files + +Sometimes projects (or parts thereof that can form a separate logical +repository) have a simple structure. For example, there is a list of +directories and for each one there is a library, named and staged in a +systematic way. Repeating all those systematic target files seems +unnecessary work. Instead, we could store the list of directories to +consider and a small script containing the naming/staging/globbing +logic; this approach would also be more maintainable. A similar approach +could also be attractive for a directory tree with tests where, on top, +all the individual tests should be collected into test suites. + +### Staging according to embedded information + +For importing prebuilt libraries, it is sometimes desirable to stage +them in a way honoring the embedded `soname`. The current approach is to +provide that information out of band in the target file, so that it can +be used during analysis. Still, the information is already present in +the prebuilt binary, causing unnecessary maintenance overhead; instead, +the target file could be a function of that library which can form its +own content-fixed root (e.g., a `git tree` root), so that the computed +value is easily cacheable. + +### Simplified rule definition and alternative syntax + +Rules can share computation through expressions. However, the interface +deliberately has to be explicit, including the documentation strings +that are used by `just describe`. While this allows easy and efficient +implementation of `just describe`, there is some redundancy involved, as +often fields are only there to be used by a common expression, but they +still have to be documented in a redundant way (causing additional maintenance +burden).
+ +Moreover, using JSON encoding of abstract syntax trees is an +unambiguously readable and easy to automatically process format, but +people argue that it is hard to write by hand. However, it is unlikely +to get agreement on which syntax is best to use. Now, if rule and +expression files could be generated, this argument would not be +necessary. Moreover, rules are typically versioned and infrequently +changed, so the step of generating the official syntax from the +convenient one would typically be in cache. + +Proposal: Support computed roots +-------------------------------- + +We propose computed roots as a clean principle to add the needed (and a +lot more) flexibility for the described use cases, while ensuring that +all computations of roots are properly cacheable at high level. In this +way, we do not compromise efficient builds, as the price of the +additional flexibility, in the typical case, is just a single cache +lookup. Of course, it is up to the user to ensure that this case really +is the typical one, in the same way as it is their responsibility to +describe the targets in a way to have proper incrementality. + +### New root type `"computed"` + +The `just` multi-repository configuration will allow a new type of root +(besides `"file"` and `"git tree"` and variants thereof), called +`"computed"`. A `"computed"` root is given by + + - the (global) name of a repository + - the name of a target (in `["module", "target"]` format), and + - a configuration (as JSON object, taken literally). + +It is a requirement that the specified target is an `"export"` target +and the specified repository content-fixed; `"computed"` roots are +considered content-fixed. However, the dependency structure of computed +roots must be cycle free. 
In other words, there must exist an ordering +of computed roots (the implicit topological order, not a declared one) +such that for each computed root, the referenced repository as well as +all repositories reachable from that one via the `"bindings"` map only +contain computed roots earlier in that order. + +### Strict evaluation of roots as artifact tree + +The building of required computed roots happens in topological order; +the build of the defining target of a root is, in principle (subject to +a user-defined restriction of parallelism) started as soon as all roots +in the repositories reachable via bindings are available. The root is +then considered the artifact tree of the defining target. + +In particular, the evaluation is strict: all roots of reachable +repositories have to be successfully computed before the evaluation is +started, even if it later turns out that one of these roots is never +accessed in the computation of the defining target. The reason for this +strictness requirement is to ensure that the cache key for target-level +caching can be computed ahead of time (and we expect the entry to be in +target-level cache most of the time anyway). + +### Intensional equality of computed roots + +During a build, each computed root is evaluated only once, even if +required in several places. Two computed roots are considered equal, if +they are defined in the same way, i.e., repository name, target, and +configuration agree. The repository or layer using the computed root is +not part of the root definition. + +### Computed roots available to the user + +As computed roots are defined by export targets, the respective +artifacts are stored in the local CAS anyway. Additionally, the tree +that forms the root will be added to CAS as well. Moreover, an option +will be added to specify a log file that contains, in machine-readable +way, all the tree identifiers of all computed roots used in this build, +together with their definition. 
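
Editorial illustration (not part of the patch): the cycle-freeness requirement, the strict evaluation order, and the intensional equality described above can be sketched as follows. This is only a minimal sketch under stated assumptions. The `ComputedRoot` type, the `"computed_roots"` key, and the shape of the `repos` mapping are hypothetical and chosen for illustration; they are not the `just` configuration schema.

```python
import json
from typing import NamedTuple


class ComputedRoot(NamedTuple):
    """Hypothetical in-memory form of a computed root (not the `just` schema)."""
    repo: str      # (global) name of the repository
    target: tuple  # ("module", "target")
    config: str    # configuration, serialized canonically


def computed_root(repo, module, target, config):
    # Sorting keys makes equal configurations compare equal, so two roots
    # defined in the same way are the same value (intensional equality).
    return ComputedRoot(repo, (module, target), json.dumps(config, sort_keys=True))


def reachable(repos, start):
    """Repositories reachable from `start` via "bindings", including `start`."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(repos[name]["bindings"].values())
    return seen


def evaluation_order(repos, wanted):
    """Order in which computed roots could be built; raises on cycles.

    Strictness: every computed root occurring in a reachable repository is
    scheduled first, whether or not the defining target ever accesses it.
    """
    order, state = [], {}

    def visit(root):
        if state.get(root) == "done":
            return  # each computed root is evaluated only once
        if state.get(root) == "active":
            raise ValueError(f"cyclic computed roots involving {root.repo!r}")
        state[root] = "active"
        for name in sorted(reachable(repos, root.repo)):
            for dep in repos[name]["computed_roots"]:
                visit(dep)
        state[root] = "done"
        order.append(root)

    for root in wanted:
        visit(root)
    return order


# Hypothetical repositories: "derived" uses a root computed from "gen".
generated = computed_root("gen", "", "targets", {"ARCH": "x86_64"})
final = computed_root("derived", "", "all", {})
repos = {
    "base": {"bindings": {}, "computed_roots": []},
    "gen": {"bindings": {"src": "base"}, "computed_roots": []},
    "derived": {"bindings": {}, "computed_roots": [generated]},
}
order = evaluation_order(repos, [final, generated])
```

In this sketch, `generated` is scheduled before `final` and appears only once, even though it is requested twice. A repository whose own roots include a root computed from that same repository would be rejected as cyclic.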
+ +### `just-mr` to support computed roots + +To allow simply setting up a `just` configuration using computed roots, +`just-mr` will allow a repository type `"computed"` with the same +parameters as a computed root. These repositories can be used as roots, +like any other `just-mr` repository type. When generating the `just` +multi-repository configuration, the definition of a `"computed"` +repository is just forwarded as computed root. diff --git a/doc/future-designs/computed-roots.org b/doc/future-designs/computed-roots.org deleted file mode 100644 index a83eee67..00000000 --- a/doc/future-designs/computed-roots.org +++ /dev/null @@ -1,154 +0,0 @@ -* Computed roots - -** Status quo - -As of version ~1.0.0~, the ~just~ build tool requires a the repository -configuration, including all roots, to be specified ahead of time. -This has a couple of consequences. - -*** Flexible source views, thanks to staging - -For source files, the flexibility of using them in a layout different -from how they occur in the source tree is gained through staging. -If a different view of sources is needed, instead of a source -target, a defined target can be used that rearranges the sources as -desired. In this way, also programmatic transformations of source -files can be carried out (while the result is still visible at the -original location), as is done, e.g., by the ~["patch", "file"]~ -rule of the ~just~ main repository. - -*** Restricted flexibility in target-definitions via globbing - -When defining targets, the general principle is that the definition -of target and action graph only depends on the description (given by -the target files, the rules and expressions, and the configuration). -There is, however, a single exception to that rule: a target file -may use the ~GLOB~ built-in construct and in this way depend on -the index of the respective source directory. 
This allows, e.g., -to define a separate action for every source file and, in this -way, get good incrementality and parallelism, while still having -a concise target description. - -*** Modularity in rules through expressions - -Rules might share common tasks. For example, for both ~C~ binaries -and ~C~ libraries, the source files have to be compiled to object -files. To avoid duplication of descriptions, expressions can be -called (also from expressions themselves). - -** Use cases that require more flexibility - -*** Generated target files - -Sometimes projects (or parts thereof that can form a separate -logical repository) have a simple structure. For example, there is -a list of directories and for each one there is a library, named -and staged in a systematic way. Repeating all those systematic -target files seems unnecessary work. Instead, we could store the -list of directories to consider and a small script containing the -naming/staging/globbing logic; this approach would also be more -maintainable. A similar approach could also be attractive for a -directory tree with tests where, on top, all the individual tests -should be collected to test suites. - -*** Staging according to embedded information - -For importing prebuilt libraries, it is sometimes desirable to -stage them in a way honoring the embedded ~soname~. The current -approach is to provide that information out of band in the target -file, so that it can be used during analysis. Still, the information -is already present in the prebuilt binary, causing unnecessary -maintenance overhead; instead, the target file could be a function -of that library which can form its own content-fixed root (e.g., a -~git tree~ root), so that the computed value is easily cacheable. - -*** Simplified rule definition and alternative syntax - -Rules can share computation through expressions. 
However, the -interface, deliberately has to be explicit, including the documentation -strings that are used by ~just describe~. While this allows easy -and efficient implementation of ~just describe~, there is some -redundancy involved, as often fields are only there to be used by -a common expression, but this have to be documented in a redundant -way (causing additional maintenance burden). - -Moreover, using JSON encoding of abstract syntax trees is an -unambiguously readable and easy to automatically process format, -but people argue that it is hard to write by hand. However, it is -unlikely to get agreement on which syntax is best to use. Now, if -rule and expression files could be generated, this argument would -not be necessary. Moreover, rules are typically versioned and -infrequently changed, so the step of generating the official syntax -from the convenient one would typically be in cache. - -** Proposal: Support computed roots - -We propose computed roots as a clean principle to add the needed (and -a lot more) flexibility for the described use cases, while ensuring -that all computations of roots are properly cacheable at high level. -In this way, we do not compromise efficient builds, as the price of -the additional flexibility, in the typical case, is just a single -cache lookup. Of course, it is up to the user to ensure that this -case really is the typical one, in the same way as it is their -responsibility to describe the targets in a way to have proper -incrementality. - -*** New root type ~"computed"~ - -The ~just~ multi-repository configuration will allow a new type -of root (besides ~"file"~ and ~"git tree"~ and variants thereof), -called ~"computed"~. A ~"computed"~ root is given by -- the (global) name of a repository -- the name of a target (in ~["module", "target"]~ format), and -- a configuration (as JSON object, taken literally). 
-It is a requirement that the specified target is an ~"export"~ -target and the specified repository content-fixed; ~"computed"~ roots -are considered content-fixed. However, the dependency structure of -computed roots must be cycle free. In other words, there must exist -an ordering of computed roots (the implicit topological order, not -a declared one) such that for each computed root, the referenced -repository as well as all repositories reachable from that one -via the ~"bindings"~ map only contain computed roots earlier in -that order. - -*** Strict evaluation of roots as artifact tree - -The building of required computed roots happens in topological order; -the build of the defining target of a root is, in principle (subject -to a user-defined restriction of parallelism) started as soon as all -roots in the repositories reachable via bindings are available. The -root is then considered the artifact tree of the defining target. - -In particular, the evaluation is strict: all roots of reachable -repositories have to be successfully computed before the evaluation -is started, even if it later turns out that one of these roots is -never accessed in the computation of the defining target. The reason -for this strictness requirement is to ensure that the cache key for -target-level caching can be computed ahead of time (and we expect -the entry to be in target-level cache most of the time anyway). - -*** Intensional equality of computed roots - -During a build, each computed root is evaluated only once, even -if required in several places. Two computed roots are considered -equal, if they are defined in the same way, i.e., repository name, -target, and configuration agree. The repository or layer using the -computed root is not part of the root definition. - -*** Computed roots available to the user - -As computed roots are defined by export targets, the respective -artifacts are stored in the local CAS anyway. 
Additionally, the -tree that forms the root will be added to CAS as well. Moreover, -an option will be added to specify a log file that contains, in -machine-readable way, all the tree identifiers of all computed -roots used in this build, together with their definition. - -*** ~just-mr~ to support computed roots - -To allow simply setting up a ~just~ configuration using computed -roots, ~just-mr~ will allow a repository type ~"computed"~ with the -same parameters as a computed root. These repositories can be used -as roots, like any other ~just-mr~ repository type. When generating -the ~just~ multi-repository configuration, the definition of a -~"computed"~ repository is just forwarded as computed root. diff --git a/doc/future-designs/execution-properties.md b/doc/future-designs/execution-properties.md new file mode 100644 index 00000000..d6fc53e8 --- /dev/null +++ b/doc/future-designs/execution-properties.md @@ -0,0 +1,125 @@ +Action-controlled execution properties +====================================== + +Motivation +---------- + +### Varying execution platforms + +It is a common situation that software is developed for one platform, +but it is desirable to build on a different one. For example, the other +platform could be faster (common theme when developing for embedded +devices), cheaper, or simply available in larger quantities. The +standard solution for these kind of situations is cross compiling: the +binary is completely built on one platform, while being intended to run +on a different one. This can be achieved by constructing the compiler +invocations accordingly and is already built into our rules (at least +for `C` and `C++`). + +The situation changes, however, once testing (especially end-to-end +testing) comes into play. Here, we actually have to run the built +binary---and do so on the target architecture. 
Nevertheless, we still +want to offload as much as possible of the work to the other platform +and perform only the actual test execution on the target platform. This +requires a single build executing actions on two (or more) platforms. + +### Varying execution times + +#### Calls to foreign build systems + +Often, third-party dependencies that natively build with a different +build system and don't change too often (yet often enough to not +have them part of the build image) are simply put in a single +action, so that they get built only once, and then stay in cache for +everyone. This is precisely what our `rules-cc` rules like +`["CC/foreign/make", +"library"]` and `["CC/foreign/cmake", "library"]` do. + +For those compound actions, we of course expect them to run longer +than normal actions that only consist of a single compiler or linker +invocation. Giving an absolute amount of time needed for such an +action is not reasonable, as that very much depends on the +underlying hardware. However, it is reasonable to give the number of +"typical" actions this compound action corresponds to. + +#### Long-running end-to-end tests + +A similar situation, where a significantly longer action is needed in +a build otherwise consisting of short actions, arises with end-to-end tests. +Tests using the final binary might have a complex setup, potentially +involving several instances running to test communication, and +require a lengthy sequence of interactions to get into the situation +that is to be tested, or to verify the absence of degradation of the +service under high load or extended usage. + +Status Quo +---------- + +At the moment, an action can specify + + - the actual action, i.e., inputs, outputs, and the command vector, + - the environment variables, + - a property that the action can fail (e.g., for test actions), and + - a property that the action is not to be taken from cache (e.g., + testing for flakiness). + +No other properties can be set by the action itself.
In particular, +remote-execution properties and timeout are equal for all actions of a +build. + +Proposed changes +---------------- + +### Extension of the `"ACTION"` function + +We propose to extend the `"ACTION"` function available in the rule +definition by the following attributes. All of the new attributes are +optional, and the default is taken to reflect the status quo. Hence, the +proposed changes are backwards compatible. + +#### `"execution properties"` + +This value has to evaluate to a map of strings; if not given, the +empty map is taken as default. This map is taken as a union with any +remote-execution properties specified at the invocation of the build +(if a key is defined both for the entire build and in +`"execution properties"` of a specific action, the latter takes +precedence). + +Local execution continues to ignore any execution properties specified. +However, with the auxiliary change to `just` described later, such +execution properties can also influence a build that is local by +default. + +#### `"timeout scaling"` + +If given, the value has to be a number greater than or equal to `1.0`, +with `1.0` taken as default. The action timeout specified for this +build (the default value, or whatever is specified on the command +line) is multiplied by the given factor and taken as timeout for +this action. This applies to both local and remote builds. + +### `just` to support dispatching based on remote-execution properties + +In simple setups, like using `just execute`, the remote execution is not +capable of dispatching to different workers based on remote-execution +properties. To nevertheless have the benefits of using different +execution environments, `just` will allow an optional configuration file +to be passed on the command line via a new option +`--endpoint-configuration`. This configuration file will contain a list +of pairs of remote-execution properties and remote-execution endpoints.
+The first matching entry (i.e., the first entry where the +remote-execution property map coincides with the given map when +restricted to its domain) determines the remote-execution endpoint to be +used; if no entry matches, the default remote-execution endpoint is +used. In any case, the remote-execution properties are forwarded to the +chosen remote-execution endpoint without modification. + +When connecting a non-standard remote-execution endpoint, `just` will +ensure that the applicable CAS of that endpoint will have all the needed +artifacts for that action. It will also transfer all result artifacts +back to the CAS of the default remote-execution endpoint. + +`just serve` (once implemented) will also support this new option. As +with the default execution endpoint, there is the understanding that the +client uses the same configuration as the `just serve` endpoint. diff --git a/doc/future-designs/execution-properties.org b/doc/future-designs/execution-properties.org deleted file mode 100644 index 6e9cf9e3..00000000 --- a/doc/future-designs/execution-properties.org +++ /dev/null @@ -1,119 +0,0 @@ -* Action-controlled execution properties - -** Motivation - -*** Varying execution platforms - -It is a common situation that software is developed for one platform, -but it is desirable to build on a different one. For example, -the other platform could be faster (common theme when developing -for embedded devices), cheaper, or simply available in larger -quantities. The standard solution for these kind of situations is -cross compiling: the binary is completely built on one platform, -while being intended to run on a different one. This can be achieved -by constructing the compiler invocations accordingly and is already -built into our rules (at least for ~C~ and ~C++~). - -The situation changes, however, once testing (especially end-to-end -testing) comes into play. Here, we actually have to run the built -binary---and do so on the target architecture. 
Nevertheless, we -still want to offload as much as possible of the work to the other -platform and perform only the actual test execution on the target -platform. This requires a single build executing actions on two (or -more) platforms. - -*** Varying execution times - -**** Calls to foreign build systems - -Often, third-party dependencies that natively build with a different -build system and don't change to often (yet often enough to not have -them part of the build image) are simply put in a single action, so -that they get built only once, and then stay in cache for everyone. -This is precisely, what our ~rules-cc~ rules like ~["CC/foreign/make", -"library"]~ and ~["CC/foreign/cmake", "library"]~ do. - -For those compound actions, we of course expect them to run longer -than normal actions that only consist of a single compiler or -linker invocation. Giving an absolute amount of time needed for -such an action is not reasonable, as that very much depends on the -underlying hardware. However, it is reasonable to give a number -"typical" actions this compound action corresponds to. - -**** Long-running end-to-end tests - -A similar situation where a significantly longer action is needed in -a build otherwise consisting of short actions are end-to-end tests. -Test using the final binary might have a complex set up, potentially -involving several instances running to test communication, and -require a lengthy sequence of interactions to get into the situation -that is to be tested, or to verify the absence of degrading of the -service under high load or extended usage. - -** Status Quo - -Action can at the moment specify -- the actual action, i.e., inputs, outputs, and the command vector, -- the environment variables, -- a property that the action can fail (e.g., for test actions), and -- a property that the action is not to be taken from cache (e.g., - testing for flakiness). -No other properties can be set by the action itself. 
In particular, -remote-execution properties and timeout are equal for all actions -of a build. - -** Proposed changes - -*** Extension of the ~"ACTION"~ function - -We propose to extend the ~"ACTION"~ function available in the rule -definition by the following attributes. All of the new attributes -are optional, and the default is taken to reflect the status quo. -Hence, the proposed changes are backwards compatible. - -**** ~"execution properties"~ - -This value has to evaluate to a map of strings; if not given, the -empty map is taken as default. This map is taken as a union with -any remote-execution properties specified at the invocation of -the build (if keys are defined both, for the entire build and in -~"execution properties"~ of a specific action, the latter takes -precedence). - -Local execution continues to any execution properties specified. -However, with the auxiliary change to ~just~ described later, -such execution properties can also influence a build that is local -by default. - -**** ~"timeout scaling"~ - -If given, the value has to be a number greater or equal than ~1.0~, -with ~1.0~ taken as default. The action timeout specified for this -build (the default value, or whatever is specified on the command -line) is multiplied by the given factor and taken as timeout for -this action. This applies for both, local and remote builds. - -*** ~just~ to support dispatching based on remote-execution properties - -In simple setups, like using ~just execute~, the remote execution -is not capable of dispatching to different workers based on -remote-execution properties. To nevertheless have the benefits of -using different execution environments, ~just~ will allow an optional -configuration file to be passed on the command line via a new option -~--endpoint-configuration~. This configuration file will contain a -list of pairs of remote-execution properties and remote-execution -endpoints. 
The first matching entry (i.e., the first entry where -the remote-execution property map coincides with the given map when -restricted to its domain) determines the remote-execution endpoint to -be used; if no entry matches, the default remote-execution endpoint -is used. In any case, the remote-execution properties are forwarded -to the chosen remote-execution endpoint without modification. - -When connecting a non-standard remote-execution endpoint, ~just~ will -ensure that the applicable CAS of that endpoint will have all the -needed artifacts for that action. It will also transfer all result -artifacts back to the CAS of the default remote-execution endpoint. - -~just serve~ (once implemented) will also support this new option. As -with the default execution endpoint, there is the understanding that -the client uses the same configuration as the ~just serve~ endpoint. diff --git a/doc/future-designs/service-target-cache.md b/doc/future-designs/service-target-cache.md new file mode 100644 index 00000000..941115e9 --- /dev/null +++ b/doc/future-designs/service-target-cache.md @@ -0,0 +1,236 @@ +Target-level caching as a service +================================= + +Motivation +---------- + +Projects can have quite a lot of dependencies that are not part of the +build environment, but are, instead, built from source, e.g., in order +to always build against the latest snapshot. The latter is a typical +workflow in case of first-party dependencies. In the case of +`justbuild`, those first-party dependencies form a separate logical +repository that is typically content fixed (e.g., because that +dependency is versioned in a `git` repository). + +Moreover, code is typically first built (and tested) by the owning +project before being used as a dependency. Therefore, if remote +execution is used, for a first-party dependency, we expect all actions +to be in cache. 
As dependencies are typically updated less often than +the code being developed is changed, in most builds, the dependencies +are in target-level cache. In other words, in a remote-execution setup, +the whole code of dependencies is fetched just to walk through the +action graph a single time to get the necessary cache hits. + +Proposal: target-level caching as a service +------------------------------------------- + +To avoid these unnecessary fetches, we add a new subcommand `just +serve` that starts a service that provides the dependencies. This +typically happens by looking up a target-level cache entry. If the +entry, however, is not in cache, this also includes building the +respective `export` target using an associated remote-execution end +point. + +### Scope: eligible `export` targets + +In order to typically have requests in cache, `just serve` will refuse +to handle requests that do not refer to `export` targets in +content-fixed repositories; recall that for a repository to be content +fixed, so have to be all repositories reachable from there. + +### Communication through an associated remote-execution service + +Each `just serve` endpoint is always associated with a remote-execution +endpoint. All artifacts exchanged between client and `just serve` +endpoint are exchanged via the CAS that is part in the associated +remote-execution endpoint. This remote-execution endpoint is also used +if `just serve` has to build targets. + +The associated remote-execution endpoint can well be the same process +simultaneously acting as `just execute`. In fact, this is the default if +no remote-execution endpoint is specified. + +### Protocol + +Communication is handled via `grpc` exchanging `proto` buffers +containing the information described in the rest of this section. 
+ +#### Main request and answer format + +A request is given by + + - the map of remote-execution properties for the designated + remote-execution endpoint; together with the knowledge on the + fixed endpoint, the `just serve` instance can compute the + target-level cache shard, and + - the identifier of the target-level cache key; it is the + client's responsibility to ensure that the referred blob (i.e., + the JSON object with appropriate values for the keys + `"repo_key"`, `"target_name"`, and `"effective_config"`) as well + as the indirectly referred repository description (the JSON + object the `"repo_key"` in the cache key refers to) are uploaded + to CAS (of the designated remote-execution endpoint) beforehand. + +The answer to that request is the identifier of the corresponding +target-level cache value (in the same format as for local +target-level caching). The `just serve` instance will ensure that +the actual value, as well as any directly or indirectly referenced +artifacts are available in the respective remote-execution CAS. +Alternatively, the answer can indicate the kind of error (unknown +root, not an export target, build failure, etc). + +#### Auxiliary request: tree of a commit + +As for `git` repositories, it is common to specify a commit in order +to fix a dependency (even though the corresponding tree identifier +would be enough). Moreover, the standard `git` protocol supports +asking for the commit of a given remote branch, but additional +overhead is needed in order to get the tree identifier. 
+
+Therefore, in order to support clients (or, more precisely,
+`just-mr` instances setting up the repository description) in
+constructing an appropriate request for `just serve` without
+unnecessary overhead, `just serve` will support a second kind of
+request, where the client request consists of a `git` commit
+identifier and the server answers with the tree identifier for that
+commit if it is aware of that commit, or indicates that it is not
+aware of that commit.
+
+#### Auxiliary request: describe
+
+To support `just describe` also in the cases where code is delegated
+to the `just serve` endpoint, an additional request for the
+`describe` information of a target can be made; as `just
+serve` only handles `export` targets, this target necessarily has to
+be an export target.
+
+The request is given by the identifier of the target-level cache
+key, again with the promise that the referred blob is available in
+CAS. The answer is the identifier of a blob containing a JSON object
+with the needed information, i.e., those parts of the target
+description that are used by `just describe`. Alternatively, the
+answer may indicate the kind of error (unknown root, not an export
+target, etc.).
+
+### Sources: local git repositories and remote trees
+
+A `just serve` instance takes roots from various sources,
+
+  - the `git` repository contained in the local build root,
+  - additional `git` repositories, optionally specified in the
+    invocation, and
+  - as a last resort, asking the CAS in the designated remote-execution
+    service for the specified `git` tree.
+
+Allowing a list of repositories as sources (rather than a single
+one) increases the effort when having to search for a specified tree (in
+case the requested `export` target is not in cache and an actual
+analysis of the build has to be carried out) or specific commit (in case
+a client asks for the tree of a given commit). However, it allows for
+the natural workflow of keeping separate upstream repositories in
+separate clones (updated in an appropriate way) without artificially
+putting them in a single repository (as orphan branches).
+
+Supporting building against trees from CAS allows more flexibility in
+defining roots that clients do not have to care about. In fact, they can
+be defined in any way, as long as
+
+  - the client is aware of the git tree identifier of the root, and
+  - some entity ensures the needed trees are known to the CAS.
+
+The auxiliary changes to `just-mr` described later in this document
+provide one possible way to handle archives in this way. Moreover, this
+additional flexibility will be necessary if we ever support computed
+roots, i.e., roots that are the output of a `just` build.
+
+### Absent roots in `just` repository specification
+
+In order for `just` to know for which repositories to delegate the build
+to the designated `just serve` endpoint, the repository configuration
+for `just` can mark roots as absent; this is done by only giving the
+type as `"git tree"` (or the corresponding ignore-special variant
+thereof) and the tree identifier in the root specification, but no
+witnessing repository.
+
+Any repository containing an absent root has to be content fixed, but
+not all roots have to be absent (as `just` can always upload those trees
+to CAS). It is an error if, outside the computations delegated to
+`just serve`, a non-export target is requested from a repository
+containing an absent root. Moreover, whenever there is a dependency on a
+repository containing an absent root, a `just
+serve` endpoint has to be specified in the invocation of `just`.
+
+### Auxiliary changes
+
+#### `just-mr` pragma `"absent"`
+
+For `just-mr` to know how to construct the repository description,
+the description used by `just-mr` is extended. More precisely, a new
+key `"absent"` is allowed in the `"pragma"` dictionary of a
+repository description. If the specified value is true, `just-mr`
+will generate an absent root out of this description, using all
+available means to generate that root without ever having to fetch
+the repository locally. In the typical case of a `git` repository,
+the auxiliary `just serve` function to obtain the tree of a commit
+is used. To allow this communication, `just-mr` also accepts the
+arguments describing a `just serve` endpoint and forwards them as
+early arguments to `just`, in the same way as it does with
+`--local-build-root`.
+
+#### `just-mr` to inquire remote execution before fetching
+
+In line with the idea that fetching sources from upstream should
+happen only once and not once per developer, we add remote execution
+as another way of obtaining files to `just-mr`. More precisely,
+`just-mr` will support the options `just` accepts to connect to the
+remote CAS. When given, those will be forwarded to `just` as early
+arguments (so that later `just`-only ones can override them);
+moreover, when a file needed to set up a (present) root is found
+neither in local CAS nor in one of the specified distdirs, `just-mr`
+will first ask the remote CAS for the missing file before trying to
+fetch it itself from the specified URL. The rationale for this search
+order is that the designated remote-execution service is typically
+reachable over the network in a more reliable way than external
+resources (while local resources do not require a network at all).
+
+#### `just-mr` to support new repository type `git tree`
+
+A new repository type is added to `just-mr`, called `git tree`. Such
+a repository is given by
+
+  - a `git` tree identifier, and
+  - a command that, when executed in an empty directory (anywhere in
+    the file system) will create in that directory a directory
+    structure containing the specified `git` tree (either top-level
+    or in some subdirectory). Moreover, that command does not modify
+    anything outside the directory it is called in; it is an error
+    if the specified tree is not created in this way.
+
+In this way, content-fixed repositories can be generated in a
+generic way, e.g., using other version-control systems or
+specialized artifact-fetching tools.
+
+Additionally, for archive-like repositories in the `just-mr`
+repository specification (currently `archive` and `zip`), a `git`
+tree identifier can be specified. If the tree is known to `just-mr`,
+or the `"pragma"` `"absent"` is given, it will just use that tree.
+Otherwise, it will fetch as usual, but error out if the obtained
+tree is not the promised one after unpacking and taking the
+specified subdirectory. In this way, also archives can be used as
+absent roots.
+
+#### `just-mr fetch` to support storing in remote-execution CAS
+
+The `fetch` subcommand of `just-mr` will get an additional option to
+support backing up the fetched information not to a local directory,
+but instead to the CAS of the specified remote-execution endpoint.
+This includes
+
+  - all archives fetched, but also
+  - all trees computed in setting up the respective repository
+    description, both from `git tree` repositories and from
+    archives.
+
+In this way, `just-mr` can be used to fill the CAS from one central
+point with all the information the clients need to treat all
+content-fixed roots as absent.
diff --git a/doc/future-designs/service-target-cache.org b/doc/future-designs/service-target-cache.org
deleted file mode 100644
index 10138db5..00000000
--- a/doc/future-designs/service-target-cache.org
+++ /dev/null
@@ -1,227 +0,0 @@
-* Target-level caching as a service
-
-** Motivation
-
-Projects can have quite a lot of dependencies that are not part of
-the build environment, but are, instead, built from source, e.g.,
-in order to always build against the latest snapshot. The latter
-is a typical workflow in case of first-party dependencies. In the
-case of ~justbuild~, those first-party dependencies form a separate
-logical repository that is typically content fixed (e.g., because
-that dependency is versioned in a ~git~ repository).
-
-Moreover, code is typically first built (and tested) by the owning
-project before being used as a dependency. Therefore, if remote
-execution is used, for a first-party dependency, we expect all
-actions to be in cache. As dependencies are typically updated less
-often than the code being developed is changed, in most builds,
-the dependencies are in target-level cache. In other words, in a
-remote-execution setup, the whole code of dependencies is fetched
-just to walk through the action graph a single time to get the
-necessary cache hits.
-
-** Proposal: target-level caching as a service
-
-To avoid these unnecessary fetches, we add a new subcommand ~just
-serve~ that starts a service that provides the dependencies. This
-typically happens by looking up a target-level cache entry. If the
-entry, however, is not in cache, this also includes building the
-respective ~export~ target using an associated remote-execution
-end point.
-
-*** Scope: eligible ~export~ targets
-
-In order to typically have requests in cache, ~just serve~ will
-refuse to handle requests that do not refer to ~export~ targets
-in content-fixed repositories; recall that for a repository to be
-content fixed, so have to be all repositories reachable from there.
-
-*** Communication through an associated remote-execution service
-
-Each ~just serve~ endpoint is always associated with a remote-execution
-endpoint. All artifacts exchanged between client and ~just serve~
-endpoint are exchanged via the CAS that is part in the associated
-remote-execution endpoint. This remote-execution endpoint is also
-used if ~just serve~ has to build targets.
-
-The associated remote-execution endpoint can well be the same
-process simultaneously acting as ~just execute~. In fact, this is
-the default if no remote-execution endpoint is specified.
-
-*** Protocol
-
-Communication is handled via ~grpc~ exchanging ~proto~ buffers
-containing the information described in the rest of this section.
-
-**** Main request and answer format
-
-A request is given by
-- the map of remote-execution properties for the designated
-  remote-execution endpoint; together with the knowledge on the fixed
-  endpoint, the ~just serve~ instance can compute the target-level
-  cache shard, and
-- the identifier of the target-level cache key; it is the client's
-  responsibility to ensure that the referred blob (i.e., the
-  JSON object with appropriate values for the keys ~"repo_key"~,
-  ~"target_name"~, and ~"effective_config"~) as well as the
-  indirectly referred repository description (the JSON object the
-  ~"repo_key"~ in the cache key refers to) are uploaded to CAS (of
-  the designated remote-execution endpoint) beforehand.
-
-The answer to that request is the identifier of the corresponding
-target-level cache value (in the same format as for local target-level
-caching). The ~just serve~ instance will ensure that the actual
-value, as well as any directly or indirectly referenced artifacts
-are available in the respective remote-execution CAS. Alternatively,
-the answer can indicate the kind of error (unknown root, not an
-export target, build failure, etc).
-
-**** Auxiliary request: tree of a commit
-
-As for ~git~ repositories, it is common to specify a commit in order
-to fix a dependency (even though the corresponding tree identifier
-would be enough). Moreover, the standard ~git~ protocol supports
-asking for the commit of a given remote branch, but additional
-overhead is needed in order to get the tree identifier.
-
-Therefore, in order to support clients (or, more precisely, ~just-mr~
-instances setting up the repository description) in constructing an
-appropriate request for ~just serve~ without unnecessary overhead,
-~just serve~ will support a second kind of request, where the
-client request consists of a ~git~ commit identifier and the server
-answers with the tree identifier for that commit if it is aware of
-that commit, or indicates that it is not aware of that commit.
-
-**** Auxiliary request: describe
-
-To support ~just describe~ also in the cases where code is
-delegated to the ~just serve~ endpoint, an additional request for
-the ~describe~ information of a target can be requested; as ~just
-serve~ only handles ~export~ targets, this target necessarily has
-to be an export target.
-
-The request is given by the identifier of the target-level cache
-key, again with the promise that the referred blob is available
-in CAS. The answer is the identifier of a blob containing a JSON
-object with the needed information, i.e., those parts of the target
-description that are used by ~just describe~. Alternatively, the
-answer may indicate the kind of error (unknown root, not an export
-target, etc).
-
-*** Sources: local git repositories and remote trees
-
-A ~just serve~ instance takes roots from various sources,
-- the ~git~ repository contained in the local build root,
-- additional ~git~ repositories, optionally specified in the
-  invocation, and
-- as last resort, asking the CAS in the designated remote-execution
-  service for the specified ~git~ tree.
-
-Allowing a list of repositories to take as sources (rather than
-a single one) increases the effort when having to search for a
-specified tree (in case the requested ~export~ target is not in
-cache and an actual analysis of the build has to be carried out)
-or specific commit (in case a client asks for the tree of a given
-commit). However, it allows for the natural workflow of keeping
-separate upstream repositories in separate clones (updated in an
-appropriate way) without artificially putting them in a single
-repository (as orphan branches).
-
-Supporting building against trees from CAS allows more flexibility
-in defining roots that clients do not have to care about. In fact,
-they can be defined in any way, as long as
-- the client is aware of the git tree identifier of the root, and
-- some entity ensures the needed trees are known to the CAS.
-The auxiliary changes to ~just-mr~ described later in this document
-provide one possible way to handle archives in this way. Moreover,
-this additional flexibility will be necessary if we ever support
-computed roots, i.e., roots that are the output of a ~just~ build.
-
-*** Absent roots in ~just~ repository specification
-
-In order for ~just~ to know for which repositories to delegate
-the build to the designated ~just serve~ endpoint, the repository
-configuration for ~just~ can mark roots as absent; this is done
-by only giving the type as ~"git tree"~ (or the corresponding
-ignore-special variant thereof) and the tree identifier in the root
-specification, but no witnessing repository.
-
-Any repository containing an absent root has to be content fixed,
-but not all roots have to be absent (as ~just~ can always upload
-those trees to CAS). It is an error if, outside the computations
-delegated to ~just serve~, a non-export target is requested from a
-repository containing an absent root. Moreover, whenever there is
-a dependency on a repository containing an absent root, a ~just
-serve~ endpoint has to be specified in the invocation of ~just~.
-
-*** Auxiliary changes
-
-**** ~just-mr~ pragma ~"absent"~
-
-For ~just-mr~ to know how to construct the repository description,
-the description used by ~just-mr~ is extended. More precisely, a
-new key ~"absent"~ is allowed in the ~"pragma"~ dictionary of a
-repository description. If the specified value is true, ~just-mr~
-will generate an absent root out of this description, using all
-available means to generate that root without ever having to fetch
-the repository locally. In the typical case of a ~git~ repository,
-the auxiliary ~just serve~ function to obtain the tree of a commit
-is used. To allow this communication, ~just-mr~ also accepts the
-arguments describing a ~just serve~ endpoint and forwards them
-as early arguments to ~just~, in the same way as it does with
-~--local-build-root~.
-
-**** ~just-mr~ to inquire remote execution before fetching
-
-In line with the idea that fetching sources from upstream should
-happen only once and not once per developer, we add remote execution
-as another way of obtaining files to ~just-mr~. More precisely,
-~just-mr~ will support the options ~just~ accepts to connect to
-the remote CAS. When given, those will be forwarded to ~just~
-as early arguments (so that later ~just~-only ones can override
-them); moreover, when a file needed to set up a (present) root is
-found neither in local CAS nor in one of the specified distdirs,
-~just-mr~ will first ask the remote CAS for the missing file before
-trying to fetch itself from the specified URL. The rationale for
-this search order is that the designated remote-execution service
-is typically reachable over the network in a more reliable way than
-external resources (while local resources do not require a network
-at all).
-
-**** ~just-mr~ to support new repository type ~git tree~
-
-A new repository type is added to ~just-mr~, called ~git tree~.
-Such a repository is given by
-- a ~git~ tree identifier, and
-- a command that, when executed in an empty directory (anywhere
-  in the file system) will create in that directory a directory
-  structure containing the specified ~git~ tree (either top-level
-  or in some subdirectory). Moreover, that command does not modify
-  anything outside the directory it is called in; it is an error
-  if the specified tree is not created in this way.
-In this way, content-fixed repositories can be generated in a
-generic way, e.g., using other version-control systems or specialized
-artifact-fetching tools.
-
-Additionally, for archive-like repositories in the ~just-mr~
-repository specification (currently ~archive~ and ~zip~), a ~git~
-tree identifier can be specified. If the tree is known to ~just-mr~,
-or the ~"pragma"~ ~"absent"~ is given, it will just use that tree.
-Otherwise, it will fetch as usual, but error out if the obtained
-tree is not the promised one after unpacking and taking the specified
-subdirectory. In this way, also archives can be used as absent roots.
-
-**** ~just-mr fetch~ to support storing in remote-execution CAS
-
-The ~fetch~ subcommand of ~just-mr~ will get an additional option to
-support backing up the fetched information not to a local directory,
-but instead to the CAS of the specified remote-execution endpoint.
-This includes
-- all archives fetched, but also
-- all trees computed in setting up the respective repository
-  description, both, from ~git tree~ repositories, as well as
-  from archives.
-
-In this way, ~just-mr~ can be used to fill the CAS from one central
-point with all the information the clients need to treat all
-content-fixed roots as absent.
diff --git a/doc/future-designs/symlinks.md b/doc/future-designs/symlinks.md
new file mode 100644
index 00000000..05215030
--- /dev/null
+++ b/doc/future-designs/symlinks.md
@@ -0,0 +1,113 @@
+Symbolic links
+==============
+
+Background
+----------
+
+Besides files and directories, symbolic links are also an important
+entity in the file system. Also `git` natively supports symbolic links
+as entries in a tree object. Technically, a symbolic link is a string
+that can be read via `readlink(2)`. However, they can also be followed
+and functions to access a file, like `open(2)`, do so by default. When
+following a symbolic link, both relative and absolute names can be
+used.
+
+Symbolic links in build systems
+-------------------------------
+
+### Follow and reading both happen
+
+Compilers usually follow symlinks for all inputs. Archivers (like
+`tar(1)` and package-building tools) usually read the link in order to
+package the link itself, rather than the file referred to (if any). As a
+generic build system, it is desirable to not have to make assumptions on
+the intention of the program called (and hence the way it deals with
+symlinks). This, however, has the consequence that only symbolic links
+themselves can properly model symbolic links.
+
+### Self-containedness and location-independence of roots
+
+From a build-system perspective, a root should be self-contained; in
+fact, the target-level caching assumes that the git tree identifier
+entirely describes a `git`-tree root. For this to be true, such a root
+has to be both self contained and independent of its (assumed) location
+in the file system. In particular, we can neither allow absolute
+symbolic links (as they, depending on the assumed location, might point
+out of the root), nor relative symbolic links that go upwards (via a
+`../` reference) too far.
+
+### Symbolic links in actions
+
+Like for source roots, we understand action directories as self
+contained and independent of their location in the file system.
+Therefore, we have to require the same restrictions there as well, i.e.,
+neither absolute symbolic links nor relative symbolic links going up too
+far.
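The restriction can be made concrete with a small check. The sketch below is illustrative only; the function name `escapes_root` is hypothetical and not part of `just`. Given the location of a symlink relative to the root of a directory and the string stored in the link, it reports whether following the link could leave that directory. The check is purely lexical, i.e., it does not resolve other symlinks along the path.

```python
import posixpath


def escapes_root(link_location: str, target: str) -> bool:
    """Report whether a symlink may point outside its root.

    `link_location` is the path of the symlink relative to the root,
    `target` is the string stored in the link. The check is purely
    lexical; other symlinks along the path are not resolved.
    """
    if posixpath.isabs(target):
        # Absolute links depend on where the root is placed.
        return True
    # The target is interpreted relative to the link's directory.
    link_dir = posixpath.dirname(link_location)
    resolved = posixpath.normpath(posixpath.join(link_dir, target))
    return resolved == ".." or resolved.startswith("../")
```

With this check, the target string `../bin/tool` is acceptable for a link at `lib/foo` but not for a link at `foo`, so whether a relative link is acceptable depends on where it is placed.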
+
+Allowing all relative symbolic links that don't point outside the
+action directory, however, poses an additional layer of complications in
+the definition of actions: a string might be allowed as symlink in some
+places in the action directory, but not in others; in particular, we
+can't tell only from the information that an artifact is a relative
+symlink whether it can be safely placed at a particular location in an
+action or not. Similarly for trees for which we only know that they
+might contain relative symbolic links.
+
+### Presence of symbolic links in system source trees
+
+It can be desirable to use system libraries or tools as dependencies. A
+typical use case, but not the only one, is packaging a tool for a
+distribution. An obvious approach is to declare a system directory as a
+root of a repository (providing the needed target files in a separate
+root). As it turns out, however, those system directories do contain
+symbolic links, e.g., shared libraries pointing to the specific version
+(like `libfoo.so.3` as a symlink pointing to `libfoo.so.3.1.4`) or
+detours through `/etc/alternatives`.
+
+Implemented stop-gap: "shopping list" for bootstrapping
+---------------------------------------------------------
+
+As a stop-gap measure to support building the tool itself against
+pre-installed dependencies with the respective directories containing
+symbolic links, or tools (like `protoc`) being symbolic links (e.g., to
+the specific version), repositories can specify, in the `"copy"`
+attribute of the `"local_bootstrap"` parameter, a list of files and
+directories to be copied as part of the bootstrapping process to a fresh
+clean directory serving as root; during this copying, symlinks are
+followed.
+
+Proposed treatment of symbolic links
+------------------------------------
+
+### "Ignore-special" roots
+
+To allow working with source trees containing symbolic links, we extend
+the existing roots by "ignore-special" versions thereof. In such a
+root (regardless of whether file based, or `git`-tree based), everything
+not a file or a directory is treated as absent. For any
+compile-like tasks, the effect of symlinks can be modeled by appropriate
+staging.
+
+As certain entries have to be ignored, source trees can only be obtained
+by traversing the respective tree; in particular, the `TREE` reference
+is no longer constant time on those roots, even if `git`-tree based.
+Nevertheless, for `git`-tree roots, the effective tree is a function of
+the `git`-tree of the root, so `git`-tree-based ignore-special roots are
+content fixed and hence eligible for target-level caching.
+
+### Accepting non-upwards relative symlinks as first-class objects
+
+Finally, a restricted form of symlinks, more precisely relative
+non-upwards symbolic links, will be added as first-class objects. That
+is, a new artifact type (besides blobs and trees) for relative
+non-upwards symbolic links is added. Like any other artifact they can be
+freely placed into the inputs of an action, as well as in artifacts,
+runfiles, or provides map of a target. Artifacts of this new type can be
+defined as
+
+  - source-symlink reference, as well as implicitly as part of a source
+    tree,
+  - as a symlink output of an action, as well as implicitly as part of a
+    tree output of an action, and
+  - explicitly in the rule language from a string through a new
+    `SYMLINK` constructor function.
diff --git a/doc/future-designs/symlinks.org b/doc/future-designs/symlinks.org
deleted file mode 100644
index 47ca5063..00000000
--- a/doc/future-designs/symlinks.org
+++ /dev/null
@@ -1,108 +0,0 @@
-* Symbolic links
-
-** Background
-
-Besides files and directories, symbolic links are also an important
-entity in the file system. Also ~git~ natively supports symbolic
-links as entries in a tree object. Technically, a symbolic link
-is a string that can be read via ~readlink(2)~. However, they can
-also be followed and functions to access a file, like ~open(2)~ do
-so by default. When following a symbolic link, both, relative and
-absolute, names can be used.
-
-** Symbolic links in build systems
-
-*** Follow and reading both happen
-
-Compilers usually follow symlinks for all inputs. Archivers (like
-~tar(1)~ and package-building tools) usually read the link in order
-to package the link itself, rather than the file referred to (if
-any). As a generic build system, it is desirable to not have to make
-assumptions on the intention of the program called (and hence the
-way it deals with symlinks). This, however, has the consequence that
-only symbolic links themselves can properly model symbolic links.
-
-*** Self-containedness and location-independence of roots
-
-From a build-system perspective, a root should be self-contained; in
-fact, the target-level caching assumes that the git tree identifier
-entirely describes a ~git~-tree root. For this to be true, such a
-root has to be both, self contained and independent of its (assumed)
-location in the file system. In particular, we can neither allow
-absolute symbolic links (as they, depending on the assumed location,
-might point out of the root), nor relative symbolic links that go
-upwards (via a ~../~ reference) too far.
-
-*** Symbolic links in actions
-
-Like for source roots, we understand action directories as self
-contained and independent of their location in the file system.
-Therefore, we have to require the same restrictions there as well,
-i.e., neither absolute symbolic links nor relative symbolic links
-going up too far.
-
-Allowing all relative symbolic links that don't point outside the
-action directory, however, poses an additional layer of complications
-in the definition of actions: a string might be allowed as symlink
-in some places in the action directory, but not in others; in
-particular, we can't tell only from the information that an artifact
-is a relative symlink whether it can be safely placed at a particular
-location in an action or not. Similarly for trees for which we only
-know that they might contain relative symbolic links.
-
-*** Presence of symbolic links in system source trees
-
-It can be desirable to use system libraries or tools as dependencies.
-A typical use case, but not the only one, is packaging a tool for a
-distribution. An obvious approach is to declare a system directory
-as a root of a repository (providing the needed target files in a
-separate root). As it turns out, however, those system directories
-do contain symbolic links, e.g., shared libraries pointing to
-the specific version (like ~libfoo.so.3~ as a symlink pointing to
-~libfoo.so.3.1.4~) or detours through ~/etc/alternatives~.
-
-** Implemented stop-gap: "shopping list" for bootstrapping
-
-As a stop-gap measure to support building the tool itself against
-pre-installed dependencies with the respective directories containing
-symbolic links, or tools (like ~protoc~) being symbolic links (e.g.,
-to the specific version), repositories can specify, in the ~"copy"~
-attribute of the ~"local_bootstrap"~ parameter, a list of files
-and directories to be copied as part of the bootstrapping process
-to a fresh clean directory serving as root; during this copying,
-symlinks are followed.
-
-** Proposed treatment of symbolic links
-
-*** "Ignore-special" roots
-
-To allow working with source trees containing symbolic links, we
-extend the existing roots by "ignore-special" versions thereof. In
-such a root (regardless whether file based, or ~git~-tree based),
-everything not a file or a directory will be pretended to be absent.
-For any compile-like tasks, the effect of symlinks can be modeled
-by appropriate staging.
-
-As certain entries have to be ignored, source trees can only be
-obtained by traversing the respective tree; in particular, the
-~TREE~ reference is no longer constant time on those roots, even
-if ~git~-tree based. Nevertheless, for ~git~-tree roots, the
-effective tree is a function of the ~git~-tree of the root, so
-~git~-tree-based ignore-special roots are content fixed and hence
-eligible for target-level caching.
-
-*** Accepting non-upwards relative symlinks as first-class objects
-
-Finally, a restricted form of symlinks, more precisely relative
-non-upwards symbolic links, will be added as first-class object.
-That is, a new artifact type (besides blobs and trees) for relative
-non-upwards symbolic links is added. Like any other artifact they
-can be freely placed into the inputs of an action, as well as in
-artifacts, runfiles, or provides map of a target. Artifacts of this
-new type can be defined as
-- source-symlink reference, as well as implicitly as part of a
-  source tree,
-- as a symlink output of an action, as well as implicitly as part
-  of a tree output of an action, and
-- explicitly in the rule language from a string through a new
-  ~SYMLINK~ constructor function.