author | Paul Cristian Sarbu <paul.cristian.sarbu@huawei.com> | 2023-12-14 15:09:28 +0100 |
---|---|---|
committer | Paul Cristian Sarbu <paul.cristian.sarbu@huawei.com> | 2023-12-15 12:35:24 +0100 |
commit | 63492331fdeaff09ece27fb6d8719a7b40015393 (patch) | |
tree | 3e7a2ffc8bb97398f7eb9501332e57e909059111 | |
parent | 05f79111fae9865184b1a2e636f5f3e7739faac6 (diff) | |
just serve design doc: Update and move to concepts
-rw-r--r-- | doc/concepts/service-target-cache.md (renamed from doc/future-designs/service-target-cache.md) | 276 |
1 files changed, 153 insertions, 123 deletions
diff --git a/doc/future-designs/service-target-cache.md b/doc/concepts/service-target-cache.md index 8dafcea2..6a64baa1 100644 --- a/doc/future-designs/service-target-cache.md +++ b/doc/concepts/service-target-cache.md @@ -21,15 +21,14 @@ are in target-level cache. In other words, in a remote-execution setup, the whole code of dependencies is fetched just to walk through the action graph a single time to get the necessary cache hits. -Proposal: target-level caching as a service -------------------------------------------- +Core concepts and implementation +-------------------------------- -To avoid these unnecessary fetches, we add a new subcommand `just -serve` that starts a service that provides the dependencies. This -typically happens by looking up a target-level cache entry. If the -entry, however, is not in cache, this also includes building the -respective `export` target using an associated remote-execution end -point. +To avoid these unnecessary fetches, we have added a new subcommand +`just serve` that start a main service that provides the dependencies. +This typically happens by looking up a target-level cache entry. +If the entry, however, is not in cache, this also includes building the +respective `export` target using an associated remote-execution endpoint. ### Scope: eligible `export` targets @@ -50,24 +49,78 @@ The associated remote-execution endpoint can well be the same process simultaneously acting as `just execute`. In fact, this is the default if no remote-execution endpoint is specified. -`just serve` will also support the `--endpoint-configuration` -option. As with the default execution endpoint, there is the -understanding that the client uses the same configuration as the -`just serve` endpoint. 
+### Sources: local git repositories and remote trees + +A `just serve` instance takes roots from various sources, -### Protocol + - the `git` repository contained in the local build root, + - additional `git` repositories, optionally specified in the + invocation, and + - as last resort, asking the CAS in the designated remote-execution + service for the specified `git` tree. + +Allowing a list of repositories to take as sources (rather than a single +one) increases the effort when having to search for a specified tree (e.g., +in case the requested `export` target is not in cache and an actual +analysis of the build has to be carried out) or specific commit (e.g., in +case a client asks for the tree of a given commit). However, it allows for +the natural workflow of keeping separate upstream repositories in +separate clones (updated in an appropriate way) without artificially +putting them in a single repository (as orphan branches). + +Supporting building against trees from CAS allows more flexibility in +defining roots that clients do not have to care about. In fact, they can +be defined in any way, as long as + + - the client is aware of the git tree identifier of the root, and + - some entity ensures the needed trees are known to the CAS. + +The auxiliary changes to `just-mr` described later in this document +provide one possible way to handle archives in this way. Moreover, this +additional flexibility will be necessary if we ever support computed +roots, i.e., roots that are the output of a `just` build. + +### Delegation: absent roots in `just` repository specification + +In order for `just` to know for which repositories to delegate the build +to the designated `just serve` endpoint, the repository configuration +for `just` can mark roots as _absent_; this is done by only giving the +type as `"git tree"` (or the corresponding ignore-special variant +thereof) and the tree identifier in the root specification, but no +witnessing repository. 
+ +Any repository containing an absent root has to be content fixed, but +not all roots have to be absent (as `just` can always upload those trees +to CAS). It is an error if, outside the computations delegated to +`just serve`, a non-export target is requested from a repository +containing an absent root. Moreover, whenever there is a dependency on a +repository containing an absent root, a `just serve` endpoint has to be +specified in the invocation of `just`. + +Protocol description +-------------------- Communication is handled via `grpc` exchanging `proto` buffers containing the information described in the rest of this section. +Besides the main service of `just serve`, auxiliary requests are defined, +bundled in two other services: one allowing `just-mr` to configure +multi-repository builds in the context of `absent` roots, and the other +to perform the optional check for remote-execution endpoint consistency +between a client and the `just serve` endpoint. + +### Main service + #### Main request and answer format A request is given by - the map of remote-execution properties for the designated - remote-execution endpoint; together with the knowledge on the - fixed endpoint, the `just serve` instance can compute the - target-level cache shard, and + remote-execution endpoint, + - the identifier of the blob containing the endpoint configuration + information; together with the knowledge on the fixed endpoint, + the `just serve` instance computes the target-level cache shard, + and - the identifier of the target-level cache key; it is the client's responsibility to ensure that the referred blob (i.e., the JSON object with appropriate values for the keys @@ -78,24 +131,41 @@ A request is given by The answer to that request is the identifier of the corresponding target-level cache value (in the same format as for local -target-level caching). The `just serve` instance will ensure that +target-level caching). 
The `just serve` instance ensures that the actual value, as well as any directly or indirectly referenced artifacts are available in the respective remote-execution CAS. -Alternatively, the answer can indicate the kind of error (unknown +Alternatively, the answer indicates the kind of error (unknown root, not an export target, build failure, etc). #### Auxiliary request: flexible variables of an `export` target To allow `just` to compute the target-level cache key without -knowledge of an absent tree, `just serve` will also answer questions +knowledge of an absent tree, `just serve` also answers questions about the flexible variables of an `export` target. Such an `export` -target can be specified by the tree of its target-level root, and -the name of the targets file. The answer is a list of strings, -naming the flexible variables. +target is specified by the tree of its target-level root, the name +of the targets file, and the name of the target itself. The answer +is a list of strings, naming the flexible variables. + +#### Auxiliary request: rule description of an `export` target + +To support `just describe` also in the cases where code is delegated +to the `just serve` endpoint, an additional request for the +`describe` information of a target can be requested; as `just serve` +only handles `export` targets, this target necessarily has to be an +export target. + +The request again contains the tree identifier of the target-level +root, the name of the targets file, and the name of the target to +inspect. The answer is the identifier of a blob containing a JSON object +with the needed information, i.e., those parts of the target description +that are used by `just describe`. Alternatively, the answer indicates +the kind of error (unknown root, not an export target, etc). 
+ +### Auxiliary service: source trees #### Auxiliary request: tree of a commit -As for `git` repositories, it is common to specify a commit in order +For `git` repositories it is common to specify a commit in order to fix a dependency (even though the corresponding tree identifier would be enough). Moreover, the standard `git` protocol supports asking for the commit of a given remote branch, but additional @@ -104,7 +174,7 @@ overhead is needed in order to get the tree identifier. Therefore, in order to support clients (or, more precisely, `just-mr` instances setting up the repository description) in constructing an appropriate request for `just serve` without -unnecessary overhead, `just serve` will support a second kind of +unnecessary overhead, `just serve` supports a second kind of request, where the client request consists of a `git` commit identifier and the server answers with the tree identifier for that commit if it is aware of that commit, or indicates that it is not @@ -115,147 +185,107 @@ tree in the CAS of the associated remote-execution endpoint. #### Auxiliary request: tree of an archive -Also for archives typically, the `git` blob identifier is given, rather +For archives typically the `git` blob identifier is given, rather than the tree. In order to allow `just-mr` to set up a repository description without fetching the respective archive, `just serve` -will support a similar request to, given the blob identifier of an -archive, answer with the respective tree identifier of the unpacked +supports also a request which, given the blob identifier of an +archive, answers with the respective tree identifier of the unpacked archive. Here, if `just serve` needs the archive, it can look it -up in its CAS, any of the supplied `git` repsoitories (where one +up in its CAS, any of the supplied `git` repositories (where one might be for archiving of the third-party distribution archives), -and the specified remote-execution end point. 
+and the specified remote-execution endpoint. The (functional!) association of archive blob identifier to tree identifier of the unpacked archive is stored in the local build root and the respective tree is fixed in the `git` repository of the local build root in the same way as `just-mr` does it. When answering such a request, that tree map is consulted first (so that -those requests as well are typically served from cache). +those requests as well can be typically served from cache). Optionally, the client can request that `just serve` back up this tree in the CAS of the associated remote-execution endpoint. -#### Auxiliary request: describe - -To support `just describe` also in the cases where code is delegated -to the `just serve` endpoint, an additional request for the -`describe` information of a target can be requested; as `just -serve` only handles `export` targets, this target necessarily has to -be an export target. - -The request is given by the identifier of the target-level cache -key, again with the promise that the referred blob is available in -CAS. The answer is the identifier of a blob containing a JSON object -with the needed information, i.e., those parts of the target -description that are used by `just describe`. Alternatively, the -answer may indicate the kind of error (unknown root, not an export -target, etc). - -#### Auxiliary request: remote-execution endpoint - -Given that all artifact exchanges between client and `just serve` rely on the -CAS of a given remote endpoint, the client might want to double check that the -remote execution endpoint it wants to use is the same that is associated -with the `just serve` instance. - -The server replies with the address (with the port number) of the associated -remote execution endpoint. 
- -### Sources: local git repositories and remote trees - -A `just serve` instance takes roots from various sources, - - - the `git` repository contained in the local build root, - - additional `git` repositories, optionally specified in the - invocation, and - - as last resort, asking the CAS in the designated remote-execution - service for the specified `git` tree. - -Allowing a list of repositories to take as sources (rather than a single -one) increases the effort when having to search for a specified tree (in -case the requested `export` target is not in cache and an actual -analysis of the build has to be carried out) or specific commit (in case -a client asks for the tree of a given commit). However, it allows for -the natural workflow of keeping separate upstream repositories in -separate clones (updated in an appropriate way) without artificially -putting them in a single repository (as orphan branches). +#### Auxiliary requests: known Git objects -Supporting building against trees from CAS allows more flexibility in -defining roots that clients do not have to care about. In fact, they can -be defined in any way, as long as +For `just fetch` operations typically either a blob (e.g., content of +an archive) or a tree (e.g., a root, like from a `git tree` repository) +are needed to be stored into local CAS. For these cases, two auxiliary +requests, one for blobs and one for trees, respectively, have been +provided. They check whether the `just serve` endpoint knows these Git +objects and, if yes, ensure they are uploaded to the remote CAS, from +where the client can easily then retrieve them. - - the client is aware of the git tree identifier of the root, and - - some entity ensures the needed trees are known to the CAS. +### Auxiliary service: configuration -The auxiliary changes to `just-mr` described later in this document -provide one possible way to handle archives in this way. 
Moreover, this -additional flexibility will be necessary if we ever support computed -roots, i.e., roots that are the output of a `just` build. +#### Auxiliary request: remote-execution endpoint -### Absent roots in `just` repository specification +Given that all artifact exchanges between client and `just serve` +rely on the CAS of a given remote endpoint, the client might want +to double check that the remote execution endpoint it wants to use +is the same that is associated with the `just serve` instance. -In order for `just` to know for which repositories to delegate the build -to the designated `just serve` endpoint, the repository configuration -for `just` can mark roots as absent; this is done by only giving the -type as `"git tree"` (or the corresponding ignore-special variant -thereof) and the tree identifier in the root specification, but no -witnessing repository. +The server replies with the address (in the usual `HOST:PORT` string +format) of the associated remote execution endpoint, if set, or an +empty string otherwise (i.e., if the serve endpoint acts also as +execution endpoint). -Any repository containing an absent root has to be content fixed, but -not all roots have to be absent (as `just` can always upload those trees -to CAS). It is an error if, outside the computations delegated to -`just serve`, a non-export target is requested from a repository -containing an absent root. Moreover, whenever there is a dependency on a -repository containing an absent root, a `just -serve` endpoint has to be specified in the invocation of `just`. +Auxiliary changes +----------------- ### Modifications to the justbuild analysis of an export target -During the analysis of an export target, querying `just serve` is exclusively -linked to the presence of at least one _absent_ root. +During the analysis of an export target, querying the `just serve` endpoint +is exclusively linked to the presence of at least one _absent_ root. 
-The first time that we need to query `just serve` we have to verify that its -remote endpoint coincides with the one given to just, otherwise we error out. +The first time that we need to query `just serve` we verify that its remote +endpoint coincides with the one given to `just`. -If the _target root_ is marked as absent: - - we query `just serve` for retrieving the flexible configuration variables - (`ServeTargetVariables`) needed to compute the `TargetCacheKey`. If `just - serve` cannot answer, we break the analysis and inform the user with a proper - error message. +If the _target root_ for this export target is marked as absent: + - We query the `just serve` for retrieving the flexible configuration + variables needed to compute the target cache key. If `just serve` cannot + answer, we break the analysis and inform the user with a proper error + message. - - once we know the flexible configuration variables, we compute the - `TargetCacheKey`. If it is not in the local target cache, we query `just - serve` to get the target cache value for the given key. If it is not able to - provide the target cache value, we error out. + - With the served flexible configuration variables we compute the target + cache key, as all other required information for this in available + locally. If the cache entry is not in the local target cache, we query + `just serve` to provide the associated target cache value. If it is not + able to provide the target cache value, analysis fails and we error out. -### Auxiliary changes +It has to be noted that, in the case the `just serve` endpoint also does +not have the target cache entry in its own target cache, a build of the +content-fixed target is dispatched to the associated remote-execution +endpoint, which will thus increase the time spent in the analysis phase, +as experienced by the user. 
In order to provide a better user experience, +the work done by the `just serve` endpoint is also being reported to the +end user, similarly to the reporting done for a locally-triggered build. #### `just-mr` pragma `"absent"` -For `just-mr` to know how to construct the repository description, -the description used by `just-mr` is extended. More precisely, a new +For `just-mr` to know how to construct the multi-repository description, +the description used by `just-mr` was extended. More precisely, a new key `"absent"` is allowed in the `"pragma"` dictionary of a repository description. If the specified value is true, `just-mr` -will generate an absent root out of this description, using all +generates an absent root out of this description, using all available means to generate that root without ever having to fetch -the repository locally. In the typical case of a `git` repository, -the auxiliary `just serve` function to obtain the tree of a commit -is used. To allow this communication, `just-mr` also accepts the +the repository locally. For example, in the typical case of a `git` +repository the auxiliary `just serve` function to obtain the tree of a +commit is used. To allow this communication, `just-mr` also accepts arguments describing a `just serve` endpoint and forwards them as -early arguments to `just`, in the same way as it does with +early arguments to `just`, in the same way as it does, e.g., with `--local-build-root`. #### `just-mr` to inquire remote execution before fetching In line with the idea that fetching sources from upstream should -happen only once and not once per developer, we add remote execution -as another way of obtaining files to `just-mr`. More precisely, -`just-mr` will support the options `just` accepts to connect to the -remote CAS. When given, those will be forwarded to `just` as early +happen only once and not once per developer, we have added remote +execution as another way of obtaining files to `just-mr`. 
More precisely, +`just-mr` now supports the options `just` accepts to connect to the +remote CAS. When given, those are forwarded to `just` as early arguments (so that later `just`-only ones can override them); moreover, when a file needed to set up a (present) root is found neither in local CAS nor in one of the specified distdirs, `just-mr` -will first ask the remote CAS for the missing file before trying to +first asks the remote CAS for the missing file before trying to fetch itself from the specified URL. The rationale for this search order is that the designated remote-execution service is typically reachable over the network in a more reliable way than external |