author Paul Cristian Sarbu <paul.cristian.sarbu@huawei.com> 2023-12-14 15:09:28 +0100
committer Paul Cristian Sarbu <paul.cristian.sarbu@huawei.com> 2023-12-15 12:35:24 +0100
commit 63492331fdeaff09ece27fb6d8719a7b40015393
tree 3e7a2ffc8bb97398f7eb9501332e57e909059111 /doc/concepts
parent 05f79111fae9865184b1a2e636f5f3e7739faac6
just serve design doc: Update and move to concepts
Diffstat (limited to 'doc/concepts')
-rw-r--r-- doc/concepts/service-target-cache.md 334
1 files changed, 334 insertions, 0 deletions
diff --git a/doc/concepts/service-target-cache.md b/doc/concepts/service-target-cache.md
new file mode 100644
index 00000000..6a64baa1
--- /dev/null
+++ b/doc/concepts/service-target-cache.md
@@ -0,0 +1,334 @@
+Target-level caching as a service
+=================================
+
+Motivation
+----------
+
+Projects can have many dependencies that are not part of the
+build environment but are instead built from source, e.g., in order
+to always build against the latest snapshot. The latter is a typical
+workflow for first-party dependencies. In the case of
+`justbuild`, those first-party dependencies form a separate logical
+repository that is typically content fixed (e.g., because that
+dependency is versioned in a `git` repository).
+
+Moreover, code is typically first built (and tested) by the owning
+project before being used as a dependency. Therefore, if remote
+execution is used, for a first-party dependency, we expect all actions
+to be in cache. As dependencies are typically updated less often than
+the code under development changes, in most builds the dependencies
+are already in the target-level cache. Nevertheless, in a
+remote-execution setup, the whole source code of the dependencies is
+fetched just to walk through the action graph once and collect the
+necessary cache hits.
+
+Core concepts and implementation
+--------------------------------
+
+To avoid these unnecessary fetches, we have added a new subcommand,
+`just serve`, that starts a service providing the dependencies.
+This typically amounts to looking up a target-level cache entry.
+If the entry is not in cache, however, the service also builds the
+respective `export` target using an associated remote-execution endpoint.
+
+### Scope: eligible `export` targets
+
+To ensure that requests can typically be served from cache, `just serve`
+refuses to handle requests that do not refer to `export` targets in
+content-fixed repositories; recall that a repository is content fixed
+only if all repositories reachable from it are content fixed as well.
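The transitivity in this condition can be illustrated with a small graph traversal. The following is a minimal sketch under assumed, hypothetical data structures (a map from repository name to whether its own roots are content fixed, and a map of repository dependencies); it is not how `just serve` represents repositories internally.

```python
from typing import Dict, List


def is_content_fixed(repo: str,
                     own_roots_fixed: Dict[str, bool],
                     deps: Dict[str, List[str]]) -> bool:
    """True if `repo` and every repository reachable from it have
    content-fixed roots (hypothetical data model)."""
    seen = set()
    stack = [repo]
    while stack:
        name = stack.pop()
        if name in seen:          # handles cyclic bindings
            continue
        seen.add(name)
        if not own_roots_fixed.get(name, False):
            return False
        stack.extend(deps.get(name, []))
    return True


def is_eligible(repo: str, is_export_target: bool,
                own_roots_fixed: Dict[str, bool],
                deps: Dict[str, List[str]]) -> bool:
    # Only export targets in content-fixed repositories are served.
    return is_export_target and is_content_fixed(repo, own_roots_fixed, deps)
```

A single non-fixed repository anywhere below `repo` makes the whole request ineligible.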
+
+### Communication through an associated remote-execution service
+
+Each `just serve` endpoint is always associated with a remote-execution
+endpoint. All artifacts exchanged between the client and the `just serve`
+endpoint go through the CAS of the associated remote-execution endpoint.
+This remote-execution endpoint is also used if `just serve` has to build
+targets.
+
+The associated remote-execution endpoint can well be the same process
+simultaneously acting as `just execute`. In fact, this is the default if
+no remote-execution endpoint is specified.
+
+### Sources: local git repositories and remote trees
+
+A `just serve` instance takes roots from various sources:
+
+ - the `git` repository contained in the local build root,
+ - additional `git` repositories, optionally specified in the
+ invocation, and
+ - as a last resort, the CAS of the designated remote-execution
+ service, which is asked for the specified `git` tree.
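The order in which these sources are consulted can be sketched as a simple fallback chain. Here each source is modelled, hypothetically, as a function returning a location for a known tree or `None`; the helper name `locate_tree` is illustrative only.

```python
from typing import Callable, Iterable, Optional

# Hypothetical: a source returns a location description for a tree it
# knows, or None if it does not know the tree.
TreeSource = Callable[[str], Optional[str]]


def locate_tree(tree_id: str,
                local_root_repo: TreeSource,
                extra_repos: Iterable[TreeSource],
                remote_cas: TreeSource) -> Optional[str]:
    """Consult the local build root repository first, then the
    additional git repositories, and the remote CAS only last."""
    for source in [local_root_repo, *extra_repos, remote_cas]:
        found = source(tree_id)
        if found is not None:
            return found
    return None
```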
+
+Allowing a list of repositories as sources (rather than a single
+one) increases the effort of searching for a specified tree (e.g.,
+in case the requested `export` target is not in cache and an actual
+analysis of the build has to be carried out) or a specific commit (e.g., in
+case a client asks for the tree of a given commit). However, it allows for
+the natural workflow of keeping separate upstream repositories in
+separate clones (updated in an appropriate way) without artificially
+putting them in a single repository (as orphan branches).
+
+Supporting builds against trees from CAS gives more flexibility in how
+roots are defined, a detail clients do not have to care about. In fact,
+roots can be defined in any way, as long as
+
+ - the client is aware of the git tree identifier of the root, and
+ - some entity ensures the needed trees are known to the CAS.
+
+The auxiliary changes to `just-mr` described later in this document
+provide one possible way to handle archives in this way. Moreover, this
+additional flexibility will be necessary if we ever support computed
+roots, i.e., roots that are the output of a `just` build.
+
+### Delegation: absent roots in `just` repository specification
+
+For `just` to know which repositories' builds to delegate to the
+designated `just serve` endpoint, the repository configuration
+for `just` can mark roots as _absent_; this is done by giving in the
+root specification only the type, `"git tree"` (or the corresponding
+ignore-special variant thereof), and the tree identifier, but no
+witnessing repository.
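As a sketch, a root specification could be classified as absent as follows. The exact spelling of the ignore-special variant (`"git tree ignore-special"`) and the position of the witnessing repository as third list element are assumptions of this example.

```python
# Root types that may be absent; the ignore-special spelling is assumed.
GIT_TREE_TYPES = {"git tree", "git tree ignore-special"}


def is_absent_root(root) -> bool:
    """A root is absent if it is [type, tree_id] with a git-tree type,
    i.e., no witnessing repository is given."""
    return (isinstance(root, list) and len(root) == 2
            and root[0] in GIT_TREE_TYPES)


# Placeholder tree identifier and repository path.
ABSENT = ["git tree", "0123abcd"]
PRESENT = ["git tree", "0123abcd", "/path/to/witnessing/repo"]
```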
+
+Any repository containing an absent root has to be content fixed, but
+not all roots have to be absent (as `just` can always upload those trees
+to CAS). It is an error if, outside the computations delegated to
+`just serve`, a non-export target is requested from a repository
+containing an absent root. Moreover, whenever there is a dependency on a
+repository containing an absent root, a `just serve` endpoint has to be
+specified in the invocation of `just`.
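The rules above can be summarised in a hypothetical checker that returns an error message, or `None` if the request is allowed. The boolean flags are stand-ins for information `just` would derive from its repository configuration.

```python
from typing import Optional


def absent_root_violation(repo_has_absent_root: bool,
                          target_is_export: bool,
                          delegated_to_serve: bool,
                          serve_endpoint: Optional[str]) -> Optional[str]:
    """Return an error message if a target request breaks the rules for
    repositories with absent roots, or None if it is allowed."""
    # Computations delegated to `just serve` are not restricted here.
    if not repo_has_absent_root or delegated_to_serve:
        return None
    if serve_endpoint is None:
        return "a just serve endpoint is required for absent roots"
    if not target_is_export:
        return "non-export target requested from a repository with an absent root"
    return None
```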
+
+Protocol description
+--------------------
+
+Communication is handled via `grpc` exchanging `proto` buffers
+containing the information described in the rest of this section.
+
+Besides the main service of `just serve`, auxiliary requests are defined,
+bundled in two other services: one allowing `just-mr` to configure
+multi-repository builds in the context of `absent` roots, and the other
+to perform the optional check for remote-execution endpoint consistency
+between a client and the `just serve` endpoint.
+
+### Main service
+
+#### Main request and answer format
+
+A request is given by
+
+ - the map of remote-execution properties for the designated
+ remote-execution endpoint,
+ - the identifier of the blob containing the endpoint configuration
+ information; together with its knowledge of the fixed endpoint,
+ the `just serve` instance computes the target-level cache shard,
+ and
+ - the identifier of the target-level cache key; it is the
+ client's responsibility to ensure that the referred blob (i.e.,
+ the JSON object with appropriate values for the keys
+ `"repo_key"`, `"target_name"`, and `"effective_config"`) as well
+ as the indirectly referred repository description (the JSON
+ object the `"repo_key"` in the cache key refers to) are uploaded
+ to CAS (of the designated remote-execution endpoint) beforehand.
+
+The answer to that request is the identifier of the corresponding
+target-level cache value (in the same format as for local
+target-level caching). The `just serve` instance ensures that
+the actual value and all directly or indirectly referenced
+artifacts are available in the respective remote-execution CAS.
+Alternatively, the answer indicates the kind of error (unknown
+root, not an export target, build failure, etc.).
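To make the request and answer shapes concrete, here is a sketch. The class and field names are hypothetical (they are not the actual `proto` messages), and the JSON serialization of the cache key is an assumption; the blob identifier is computed as a `git` blob hash, i.e., SHA-1 over `blob <size>\0` followed by the content, which assumes the CAS uses `git`-compatible SHA-1 identifiers.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Optional


def git_blob_id(data: bytes) -> str:
    """Git blob identifier: SHA-1 over b"blob <size>\\0" + content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def cache_key_blob(repo_key: str, target_name, effective_config: dict) -> bytes:
    # Assumption: compact JSON with sorted keys; the serialization
    # actually used by `just` may differ.
    obj = {"repo_key": repo_key, "target_name": target_name,
           "effective_config": effective_config}
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


@dataclass
class ServeTargetRequest:            # hypothetical, not the real proto
    execution_properties: Dict[str, str]
    endpoint_config_id: str          # blob with endpoint configuration
    target_cache_key_id: str         # blob holding the JSON cache key


@dataclass
class ServeTargetAnswer:             # hypothetical, not the real proto
    target_value_id: Optional[str] = None   # set on success
    error: Optional[str] = None             # e.g. "build failure"
```

Before sending such a request, the client would upload both the cache-key blob and the repository description it refers to.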
+
+#### Auxiliary request: flexible variables of an `export` target
+
+To allow `just` to compute the target-level cache key without
+knowledge of an absent tree, `just serve` also answers questions
+about the flexible variables of an `export` target. Such an `export`
+target is specified by the tree of its target-level root, the name
+of the targets file, and the name of the target itself. The answer
+is a list of strings, naming the flexible variables.
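With the answer to this request, a client can restrict its configuration to the flexible variables before computing the cache key. A minimal sketch with a hypothetical helper name; whether unset variables are dropped (as here) or recorded as `null` is an assumption.

```python
from typing import Dict, List


def effective_config(config: Dict, flexible_vars: List[str]) -> Dict:
    """Project the client's configuration onto the flexible variables
    reported by `just serve`; unset variables are dropped (assumed)."""
    return {var: config[var] for var in flexible_vars if var in config}
```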
+
+#### Auxiliary request: rule description of an `export` target
+
+To support `just describe` also in cases where code is delegated
+to the `just serve` endpoint, `just serve` additionally answers
+requests for the `describe` information of a target; as `just serve`
+only handles `export` targets, this target necessarily has to be an
+export target.
+
+The request again contains the tree identifier of the target-level
+root, the name of the targets file, and the name of the target to
+inspect. The answer is the identifier of a blob containing a JSON object
+with the needed information, i.e., those parts of the target description
+that are used by `just describe`. Alternatively, the answer indicates
+the kind of error (unknown root, not an export target, etc.).
+
+### Auxiliary service: source trees
+
+#### Auxiliary request: tree of a commit
+
+For `git` repositories it is common to specify a commit in order
+to fix a dependency (even though the corresponding tree identifier
+would be enough). Moreover, the standard `git` protocol supports
+asking for the commit of a given remote branch, but additional
+overhead is needed in order to get the tree identifier.
+
+Therefore, in order to support clients (or, more precisely,
+`just-mr` instances setting up the repository description) in
+constructing an appropriate request for `just serve` without
+unnecessary overhead, `just serve` supports a request in which
+the client sends a `git` commit identifier and the server answers
+with the tree identifier of that commit if it is aware of that
+commit, or indicates that it is not aware of it.
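The server-side lookup amounts to finding the commit object in one of the known repositories and reading its first line, since a `git` commit object starts with `tree <id>`. In this sketch, repositories are modelled, hypothetically, as maps from object identifier to raw (decompressed) object content; a real implementation would use a `git` library instead.

```python
from typing import Dict, Iterable, Optional


def tree_of_commit_object(raw_commit: bytes) -> str:
    """A git commit object starts with the line 'tree <hex id>'."""
    first_line = raw_commit.split(b"\n", 1)[0]
    kind, _, tree_id = first_line.partition(b" ")
    if kind != b"tree":
        raise ValueError("not a commit object")
    return tree_id.decode()


def serve_tree_of_commit(commit_id: str,
                         repos: Iterable[Dict[str, bytes]]) -> Optional[str]:
    """Tree id if any known repository has the commit; None signals
    that the server is not aware of the commit."""
    for repo in repos:
        raw = repo.get(commit_id)
        if raw is not None:
            return tree_of_commit_object(raw)
    return None
```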
+
+Optionally, the client can request that `just serve` back up this
+tree in the CAS of the associated remote-execution endpoint.
+
+#### Auxiliary request: tree of an archive
+
+For archives, typically the `git` blob identifier of the archive is
+given, rather than a tree identifier. In order to allow `just-mr` to
+set up a repository description without fetching the respective archive,
+`just serve` also supports a request which, given the blob identifier
+of an archive, answers with the tree identifier of the unpacked
+archive. Here, if `just serve` needs the archive, it can look it
+up in its CAS, in any of the supplied `git` repositories (one of which
+might be dedicated to archiving third-party distribution archives),
+and in the specified remote-execution endpoint.
+
+The (functional!) association of archive blob identifier to tree
+identifier of the unpacked archive is stored in the local build
+root and the respective tree is fixed in the `git` repository of
+the local build root in the same way as `just-mr` does it. When
+answering such a request, that tree map is consulted first (so that
+these requests, too, can typically be served from cache).
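The cache-first behaviour can be sketched as a memoized lookup. `ArchiveTreeMap` and its two callbacks are hypothetical; the in-memory dictionary stands in for the persistent map in the local build root, and memoization is sound precisely because the association is functional.

```python
from typing import Callable, Dict, Optional


class ArchiveTreeMap:
    """Hypothetical sketch of archive blob id -> unpacked tree id."""

    def __init__(self,
                 fetch_archive: Callable[[str], Optional[bytes]],
                 unpack_to_tree: Callable[[bytes], str]):
        self._map: Dict[str, str] = {}  # stands in for the on-disk map
        self._fetch = fetch_archive     # local CAS, git repos, remote CAS
        self._unpack = unpack_to_tree   # unpack and import into git

    def tree_of_archive(self, blob_id: str) -> Optional[str]:
        # Consult the map first; a stored entry never goes stale.
        if blob_id in self._map:
            return self._map[blob_id]
        content = self._fetch(blob_id)
        if content is None:
            return None                 # archive unknown to the server
        tree_id = self._unpack(content)
        self._map[blob_id] = tree_id
        return tree_id
```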
+
+Optionally, the client can request that `just serve` back up this
+tree in the CAS of the associated remote-execution endpoint.
+
+#### Auxiliary requests: known Git objects
+
+For `just fetch` operations, typically either a blob (e.g., the content of
+an archive) or a tree (e.g., a root, like from a `git tree` repository)
+needs to be stored in the local CAS. For these cases, two auxiliary
+requests are provided, one for blobs and one for trees. They check
+whether the `just serve` endpoint knows these Git objects and, if so,
+ensure they are uploaded to the remote CAS, from where the client can
+then easily retrieve them.
+
+### Auxiliary service: configuration
+
+#### Auxiliary request: remote-execution endpoint
+
+Given that all artifact exchanges between client and `just serve`
+rely on the CAS of a given remote endpoint, the client might want
+to double-check that the remote-execution endpoint it wants to use
+is the same as the one associated with the `just serve` instance.
+
+The server replies with the address (in the usual `HOST:PORT` string
+format) of the associated remote-execution endpoint, if set, or an
+empty string otherwise (i.e., if the serve endpoint also acts as the
+execution endpoint).
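A client-side check could then look like the following sketch, where an empty reply is interpreted as the serve endpoint itself acting as remote-execution endpoint. The function names are hypothetical, and addresses are compared as plain `HOST:PORT` strings.

```python
def effective_remote(serve_address: str, reported_remote: str) -> str:
    """Remote-execution endpoint actually used by `just serve`: the
    reported address, or the serve endpoint itself if the reply is ''."""
    return reported_remote if reported_remote else serve_address


def endpoints_consistent(client_remote: str, serve_address: str,
                         reported_remote: str) -> bool:
    # Plain string comparison; equivalent spellings of the same host
    # (e.g., IP address vs. DNS name) are not recognised here.
    return client_remote == effective_remote(serve_address, reported_remote)
```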
+
+Auxiliary changes
+-----------------
+
+### Modifications to the justbuild analysis of an export target
+
+During the analysis of an export target, the `just serve` endpoint is
+queried only if at least one _absent_ root is involved.
+
+The first time we need to query `just serve`, we verify that its remote
+endpoint coincides with the one given to `just`.
+
+If the _target root_ for this export target is marked as absent:
+ - We query `just serve` to retrieve the flexible configuration
+ variables needed to compute the target cache key. If `just serve` cannot
+ answer, we abort the analysis and inform the user with a proper error
+ message.
+
+ - With the served flexible configuration variables, we compute the target
+ cache key, as all other information required for this is available
+ locally. If the entry is not in the local target cache, we query
+ `just serve` to provide the associated target cache value. If it is not
+ able to provide the target cache value, the analysis fails with an error.
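The two steps for an absent target root can be outlined as follows. All callbacks are hypothetical stand-ins for the actual `just serve` requests and the local cache-key computation, and the restriction of the configuration to the flexible variables assumes unset variables are simply dropped.

```python
from typing import Callable, Dict, List, Optional


class AnalysisError(Exception):
    pass


def analyse_absent_export(
        config: Dict,
        local_cache: Dict[str, str],
        serve_flexible_vars: Callable[[], Optional[List[str]]],
        compute_key: Callable[[Dict], str],
        serve_target: Callable[[str], Optional[str]]) -> str:
    """Outline for an export target with an absent target root;
    returns a target-cache value identifier."""
    flexible = serve_flexible_vars()
    if flexible is None:
        raise AnalysisError("just serve could not report flexible variables")
    eff = {v: config[v] for v in flexible if v in config}
    key = compute_key(eff)          # other inputs are known locally
    if key in local_cache:
        return local_cache[key]
    value = serve_target(key)       # may trigger a build on the server
    if value is None:
        raise AnalysisError("just serve could not provide the target")
    return value
```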
+
+Note that if the `just serve` endpoint does not have the target cache
+entry in its own target cache either, a build of the content-fixed target
+is dispatched to the associated remote-execution endpoint, which
+increases the time spent in the analysis phase as experienced by the
+user. To provide a better user experience, the work done by the
+`just serve` endpoint is also reported to the end user, similarly to
+the reporting done for a locally-triggered build.
+
+### `just-mr` pragma `"absent"`
+
+For `just-mr` to know how to construct the multi-repository description,
+the repository description format of `just-mr` was extended. More
+precisely, a new key `"absent"` is allowed in the `"pragma"` dictionary
+of a repository description. If the specified value is true, `just-mr`
+generates an absent root out of this description, using all
+available means to generate that root without ever having to fetch
+the repository locally. For example, in the typical case of a `git`
+repository, the auxiliary `just serve` request for the tree of a
+commit is used. To allow this communication, `just-mr` also accepts
+arguments describing a `just serve` endpoint and forwards them as
+early arguments to `just`, in the same way as it does, e.g., with
+`--local-build-root`.
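For illustration, a `just-mr` repository description using this pragma might look as follows, parsed here from JSON in Python together with a helper listing the repositories to be treated as absent. The repository name, URL, commit, and the surrounding layout are placeholders assumed for this example.

```python
import json

# Placeholder description; names, URL, and commit are illustrative.
CONFIG = json.loads("""
{ "repositories":
  { "dep":
    { "repository":
      { "type": "git"
      , "repository": "https://example.com/dep.git"
      , "branch": "main"
      , "commit": "0123456789abcdef0123456789abcdef01234567"
      , "pragma": {"absent": true}
      }
    }
  }
}
""")


def absent_repositories(config: dict) -> list:
    """Names of repositories whose pragma requests an absent root."""
    return sorted(
        name for name, desc in config.get("repositories", {}).items()
        # "repository" may also be a string reference to another repo
        if isinstance(desc.get("repository"), dict)
        and desc["repository"].get("pragma", {}).get("absent") is True)
```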
+
+### `just-mr` to inquire remote execution before fetching
+
+In line with the idea that fetching sources from upstream should
+happen only once and not once per developer, we have added the remote
+CAS to `just-mr` as another source of files. More precisely,
+`just-mr` now supports the options `just` accepts to connect to the
+remote CAS. When given, those are forwarded to `just` as early
+arguments (so that later `just`-only ones can override them);
+moreover, when a file needed to set up a (present) root is found
+neither in local CAS nor in one of the specified distdirs, `just-mr`
+first asks the remote CAS for the missing file before trying to
+fetch it from the specified URL itself. The rationale for this search
+order is that the designated remote-execution service is typically
+reachable over the network in a more reliable way than external
+resources (while local resources do not require a network at all).
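The search order described above can be sketched as a chain of lookups. The callbacks are hypothetical, and a real implementation would additionally verify that any fetched content matches the expected identifier.

```python
from typing import Callable, Iterable, Optional

Lookup = Callable[[str], Optional[bytes]]


def obtain_file(content_id: str,
                local_cas: Lookup,
                distdirs: Iterable[Lookup],
                remote_cas: Optional[Lookup],
                fetch_url: Callable[[], Optional[bytes]]) -> Optional[bytes]:
    """Local CAS, then distdirs, then the remote CAS (only if remote
    options were given), and the upstream URL as a last resort."""
    for lookup in [local_cas, *distdirs]:
        data = lookup(content_id)
        if data is not None:
            return data
    if remote_cas is not None:
        data = remote_cas(content_id)
        if data is not None:
            return data
    return fetch_url()
```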
+
+### `just-mr` to support new repository type `git tree`
+
+A new repository type is added to `just-mr`, called `git tree`. Such
+a repository is given by
+
+ - a `git` tree identifier, and
+ - a command that, when executed in an empty directory (anywhere in
+ the file system), creates in that directory a directory
+ structure containing the specified `git` tree (either top-level
+ or in some subdirectory). Moreover, that command must not modify
+ anything outside the directory it is called in; it is an error
+ if the specified tree is not created in this way.
+
+In this way, content-fixed repositories can be generated in a
+generic way, e.g., using other version-control systems or
+specialized artifact-fetching tools.
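Checking such a command amounts to running it in a fresh directory and searching the result for a directory with the promised `git` tree identifier. The following sketch computes `git` tree identifiers directly (SHA-1 object format; regular files, executables, symlinks, and subdirectories) so that no `git` installation is needed. Function names are illustrative, and the command is not sandboxed, so the requirement of not modifying anything outside the directory is not enforced here.

```python
import hashlib
import os
import stat
import subprocess
import tempfile
from typing import List, Optional


def git_object_id(kind: str, data: bytes) -> str:
    """SHA-1 git object id over b"<kind> <size>\\0" + data."""
    return hashlib.sha1(b"%s %d\0" % (kind.encode(), len(data)) + data).hexdigest()


def git_tree_id(path: str) -> str:
    """Tree id of a directory; .git is skipped, and empty subdirectories
    are hashed as empty trees (unlike `git add`, which skips them)."""
    entries = []
    for name in os.listdir(path):
        if name == ".git":
            continue
        full = os.path.join(path, name)
        bname = os.fsencode(name)
        if os.path.islink(full):
            target = os.fsencode(os.readlink(full))
            entries.append((bname, b"120000", bname, git_object_id("blob", target)))
        elif os.path.isdir(full):
            # git sorts subtrees as if their name ended in "/"
            entries.append((bname + b"/", b"40000", bname, git_tree_id(full)))
        else:
            with open(full, "rb") as f:
                content = f.read()
            is_exec = os.stat(full).st_mode & stat.S_IXUSR
            mode = b"100755" if is_exec else b"100644"
            entries.append((bname, mode, bname, git_object_id("blob", content)))
    entries.sort()
    body = b"".join(m + b" " + n + b"\0" + bytes.fromhex(oid)
                    for _key, m, n, oid in entries)
    return git_object_id("tree", body)


def find_tree_dir(root: str, tree_id: str) -> Optional[str]:
    """Return root or a subdirectory of it whose tree id is tree_id."""
    for dirpath, dirnames, _files in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]
        if git_tree_id(dirpath) == tree_id:
            return dirpath
    return None


def run_git_tree_command(cmd: List[str], tree_id: str) -> str:
    """Run cmd in a fresh empty directory; fail unless it created tree_id."""
    workdir = tempfile.mkdtemp()
    subprocess.run(cmd, cwd=workdir, check=True)
    found = find_tree_dir(workdir, tree_id)
    if found is None:
        raise RuntimeError("command did not create tree " + tree_id)
    return found
```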
+
+Additionally, for archive-like repositories in the `just-mr`
+repository specification (currently `archive` and `zip`), a `git`
+tree identifier can be specified. If the tree is known to `just-mr`,
+or the `"pragma"` `"absent"` is given, it will just use that tree.
+Otherwise, it will fetch as usual, but error out if the obtained
+tree is not the promised one after unpacking and taking the
+specified subdirectory. In this way, archives can also be used as
+absent roots.
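The decision just described can be sketched as follows, with `fetch_and_unpack` as a hypothetical callback that fetches the archive, unpacks it, and returns the tree identifier of the specified subdirectory. Absent archives without a given tree identifier are outside the scope of this sketch.

```python
from typing import Callable, Optional, Set


def archive_root_tree(promised_tree: Optional[str],
                      known_trees: Set[str],
                      absent: bool,
                      fetch_and_unpack: Callable[[], str]) -> str:
    """Tree to use for an archive repository (hypothetical helper)."""
    if promised_tree is not None and (promised_tree in known_trees or absent):
        return promised_tree        # no fetch needed
    if absent:
        raise ValueError("absent archive without a tree id is not modelled here")
    tree = fetch_and_unpack()       # fetch as usual
    if promised_tree is not None and tree != promised_tree:
        raise ValueError("archive unpacked to %s, expected %s" % (tree, promised_tree))
    return tree
```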
+
+### `just-mr fetch` to support storing in remote-execution CAS
+
+The `fetch` subcommand of `just-mr` will get an additional option to
+support backing up the fetched information not to a local directory,
+but instead to the CAS of the specified remote-execution endpoint.
+This includes
+
+ - all archives fetched, but also
+ - all trees computed in setting up the respective repository
+ description, both from `git tree` repositories and from
+ archives.
+
+In this way, `just-mr` can be used to fill the CAS from one central
+point with all the information the clients need to treat all
+content-fixed roots as absent.