Diffstat (limited to 'doc/concepts/target-cache.org')
-rw-r--r-- | doc/concepts/target-cache.org | 219 |
1 files changed, 0 insertions, 219 deletions
diff --git a/doc/concepts/target-cache.org b/doc/concepts/target-cache.org
deleted file mode 100644
index 591a66af..00000000
--- a/doc/concepts/target-cache.org
+++ /dev/null
@@ -1,219 +0,0 @@
-* Target-level caching
-
-** ~git~ trees as content-fixed roots
-
-*** The ~"git tree"~ root scheme
-
-The multi-repository configuration supports a scheme ~"git tree"~.
-This scheme is given by two parameters,
-- the id of the tree (as a string with the hex encoding), and
-- an arbitrary ~git~ repository containing the specified tree
-  object, as well as all needed tree and blob objects reachable
-  from that tree.
-For example, a root could be specified as follows.
-#+BEGIN_SRC
-["git tree", "6a1820e78f61aee6b8f3677f150f4559b6ba77a4", "/usr/local/src/justbuild.git"]
-#+END_SRC
-
-It should be noted that the ~git~ tree identifier alone already
-specifies the content of the full tree. However, ~just~ needs access
-to some repository containing the tree in order to know what the
-tree looks like.
-
-Nevertheless, it is an important observation that the tree identifier
-alone already specifies the content of the whole (logical) directory.
-The equality of two such directories can be established by comparing
-the two identifiers _without_ the need to read any file from
-disk. Those "fixed-content" descriptions, i.e., descriptions of a
-repository root that already fully determine the content, are the
-key to caching whole targets.
-
-*** ~KNOWN~ artifacts
-
-The in-memory representation of known artifacts has an optional
-reference to a repository containing that artifact. Artifacts
-"known" from local repositories might not be known to the CAS used
-for the action execution; this additional reference allows filling
-such misses in the CAS.
-
-** Content-fixed repositories
-
-*** The parts of a content-fixed repository
-
-In order to meaningfully cache a target, we need to be able to
-efficiently compute the cache key. We restrict this to the case where
-we can compute the information about the repository without file-system
-access. This requires that all roots (workspace, target root, etc)
-be content fixed, as well as the bindings of the free repository
-names (and hence also all transitively reachable repositories).
-We call such repositories "content-fixed" repositories.
-
-*** Canonical description of a content-fixed repository
-
-The local data of a repository consists of the following.
-- The roots (for workspace, targets, rules, expressions). As the
-  tree identifier already defines the content, we leave out the
-  path to the repository containing the tree.
-- The names of the targets, rules, and expression files.
-- The names of the outgoing "bindings".
-
-Additionally, repositories can reach additional repositories via
-bindings. Moreover, this repository-level dependency relation
-is not necessarily cycle free. In particular, we cannot use the
-tree unfolding as canonical representation of that graph up to
-bisimulation, as we do with most other data structures. To still get
-a canonical representation, we factor out the largest bisimulation,
-i.e., minimize the respective automaton (with repositories as
-states, local data as locally observable properties, and the binding
-relation as edges).
-
-Finally, for each repository individually, the reachable repositories
-are renamed ~"0"~, ~"1"~, ~"2"~, etc, following a depth-first
-traversal starting from the repository in question where outgoing
-edges are traversed in lexicographical order. The entry point is
-hence recognisable as repository ~"0"~.
-
-The repository key is the content-identifier of the canonically
-formatted canonical serialisation of the JSON encoding of the obtained
-multi-repository configuration (with repository-free git-root
-descriptions). The serialisation itself is stored in CAS.
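The renaming step described in the removed text above (reachable repositories become ~"0"~, ~"1"~, ~"2"~, ... in depth-first order) can be illustrated with a minimal Python sketch. It is not taken from the ~just~ sources: the function name ~canonical_names~, the shape of the ~bindings~ dictionary, and the reading of "lexicographical order" as ordering by binding name are all assumptions made for illustration. The bisimulation minimisation that precedes this step is not shown.

```python
def canonical_names(bindings, start):
    """Rename all repositories reachable from `start` to "0", "1", "2", ...

    `bindings` is a hypothetical mapping
        global repository name -> {binding name -> global repository name}.
    Repositories are numbered in depth-first pre-order; outgoing edges are
    followed in lexicographical order of the binding name (an assumption).
    Cycles are handled by never renaming a repository twice.
    """
    names = {}

    def visit(repo):
        if repo in names:
            return  # already numbered, e.g. reached again through a cycle
        names[repo] = str(len(names))
        for binding in sorted(bindings.get(repo, {})):
            visit(bindings[repo][binding])

    visit(start)
    return names


# Hypothetical configuration with a cycle between "app" and "base".
example_bindings = {
    "app": {"lib": "base", "extra": "tools"},
    "base": {"self": "app"},
    "tools": {"base": "base"},
}

# The entry point always becomes "0".
print(canonical_names(example_bindings, "app"))
# {'app': '0', 'tools': '1', 'base': '2'}
```

Because the numbering starts afresh from each entry point, the same repository generally receives different canonical names depending on which repository the description is computed for.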
-
-These identifications and replacements of global names do not change
-the semantics, as our name data types are completely opaque to our
-expression language. In the ~"json_encode"~ expression, they're
-serialized as ~null~ and a string representation is only generated in
-user messages not available to the language itself. Moreover, names
-cannot be compared for equality either, so their only observable
-properties, i.e., the way ~"DEP_ARTIFACTS"~, ~"DEP_RUNFILES"~, and
-~"DEP_PROVIDES"~ react to them, are invariant under repository
-bisimulation.
-
-** Configuration and the ~"export"~ rule
-
-Targets not only depend on the content of their repository, but also
-on their configurations. Normally,
-the effective part of a configuration is only determined after
-analysing the target. However, for caching, we need to compute
-the cache key directly. This property is provided by the built-in ~"export"~ rule; only ~"export"~ targets
-residing in content-fixed repositories will be cached. This also
-serves as an indication of which targets of a repository are intended
-for consumption by other repositories.
-
-An ~"export"~ rule takes precisely the following arguments.
-- ~"target"~ specifying a single target, the target to be cached.
-  It must not be tainted.
-- ~"flexible_config"~ a list of strings; those specify the variables
-  of the configuration that are considered. All other parts of
-  the configuration are ignored. So the effective configuration for
-  the ~"export"~ target is the configuration restricted to those
-  variables (filled up with ~null~ if the variable was not present
-  in the original configuration).
-- ~"fixed_config"~ a dict of arbitrary JSON values (taken
-  unevaluated) with keys disjoint from the ~"flexible_config"~.
-
-An ~"export"~ target is analyzed as follows. The configuration is
-restricted to the variables specified in the ~"flexible_config"~;
-this will result in the effective configuration for the exported
-target. It is a requirement that the effective configuration contain
-only pure JSON values. The (necessarily conflict-free) union with
-the ~"fixed_config"~ is computed and the ~"target"~ is evaluated
-in this configuration. The result (artifacts, runfiles, provided
-information) is the result of that evaluation. It is a requirement
-that the provided information contains only pure JSON values
-and artifacts (including tree artifacts); in particular, it may
-not contain names.
-
-** Cache key
-
-We only consider ~"export"~ targets in content-fixed repositories
-for caching. An export target is then fully described by
-- the repository key of the repository the export target resides in,
-- the target name of the export target within that repository,
-  described as module-name pair, and
-- the effective configuration.
-More precisely, the canonical description is the JSON object with
-those values for the keys ~"repo_key"~, ~"target_name"~, and ~"effective_config"~,
-respectively. The cache key is the blob identifier of the
-canonical serialisation (including sorted keys, etc) of the just
-described piece of JSON. To allow debugging and cooperation with
-other tools, whenever a cache key is computed, it is ensured
-that the serialisation ends up in the applicable CAS.
-
-It should be noted that the cache key can be computed _without_
-analyzing the target referred to. This is possible, as the
-configuration is pruned a priori instead of the usual procedure
-to analyse and afterwards determine the parts of the configuration
-that were relevant.
-
-** Cached value
-
-The value to be cached is the result of evaluating the target,
-that is, its artifacts, runfiles, and provided data. All artifacts
-inside those data structures will be described as known artifacts.
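The cache-key computation from the "Cache key" section of the removed text can be sketched as follows. This is an illustration under stated assumptions, not the ~just~ implementation: the helper names (~git_blob_id~, ~effective_config~, ~target_cache_key~) are hypothetical, and the exact canonical JSON formatting used by ~just~ is not specified in the text, so compact separators with sorted keys stand in for it. The blob identifier is computed the way ~git~ hashes blobs (SHA-1 over a ~blob <size>\0~ header followed by the content).

```python
import hashlib
import json


def git_blob_id(data: bytes) -> str:
    """Identifier git assigns to a blob with the given content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def effective_config(config: dict, flexible_config: list) -> dict:
    """Restrict `config` to the flexible variables, filling gaps with null."""
    return {var: config.get(var) for var in flexible_config}


def target_cache_key(repo_key, module, name, config, flexible_config):
    """Return (key, serialisation) for an export target.

    The serialisation is returned as well, since the text requires that it
    ends up in the CAS whenever a key is computed.
    """
    description = {
        "repo_key": repo_key,
        "target_name": [module, name],  # module-name pair
        "effective_config": effective_config(config, flexible_config),
    }
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    serialised = json.dumps(description, sort_keys=True, separators=(",", ":"))
    return git_blob_id(serialised.encode("utf-8")), serialised


# Variables outside "flexible_config" do not influence the key.
key_a, _ = target_cache_key("<repo key>", "src/lib", "lib",
                            {"ARCH": "x86_64", "DEBUG": True}, ["ARCH", "OS"])
key_b, _ = target_cache_key("<repo key>", "src/lib", "lib",
                            {"ARCH": "x86_64", "DEBUG": False}, ["ARCH", "OS"])
print(key_a == key_b)  # True
```

Note that nothing here analyses the target: the key depends only on the repository key, the target name, and the a-priori restricted configuration.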
-
-As serialisation, we will essentially use our usual JSON encoding;
-while this can be used as is for artifacts and runfiles where we
-know that they have to be a map from strings to artifacts, additional
-information will be added for the provided data. The provided data
-can contain artifacts, but also legitimately pure JSON values that
-coincide with our JSON encoding of artifacts; the same holds true
-for nodes and result values. Moreover, the tree unfolding implicit
-in the JSON serialisation can be exponentially larger than the value.
-
-Therefore, in our serialisation, we add an entry for every subexpression
-and separately add a list of which subexpressions are artifacts,
-nodes, or results. During deserialisation, we use this subexpression
-structure to deserialize every subexpression only once.
-
-** Sharding of target cache
-
-In our target description, the execution environment is not included.
-For local execution, it is implicit anyway. As we also want to
-cache high-level targets when using remote execution, we shard the
-target cache (e.g., by using appropriate subdirectories) by the blob
-identifier of the serialisation of the description of the execution
-backend. Here, ~null~ stands for local execution, and for remote
-execution we use an object with keys ~"remote_execution_address"~
-and ~"remote_execution_properties"~ filled in the obvious way. As
-usual, we add the serialisation to the CAS.
-
-** ~"export"~ targets, strictness and the extensional projection
-
-As opposed to the target that is exported, the corresponding export
-target, if part of a content-fixed repository, will be strict: a
-build depending on such a target can only succeed if all artifacts
-in the result of the target (regardless of whether direct artifacts,
-runfiles, or as part of the provided data) can be built, even if
-not all (or even none) are actually used in the build.
-
-Upon cache hit, the artifacts of an export target are the known
-artifacts corresponding to the artifacts of the exported target.
-While extensionally equal, known artifacts are defined differently,
-so an export target and the exported target are intensionally
-different (and that difference might only be visible on the second
-build). As intensional equality is used when testing for absence
-of conflicts in staging, a target and its exported version almost
-always conflict and hence should not be used together. One way to
-achieve this is to always use the export target for any target that
-is exported. This fits well together with the recommendation of
-only depending on export targets of other repositories.
-
-If a target forwards artifacts of an exported target (indirect header
-files, indirect link dependencies, etc), and is exported again, no
-additional conflicts occur; replacing by the corresponding known
-artifact is a projection: the known artifact corresponding to a
-known artifact is the artifact itself. Moreover, by the strictness
-property described earlier, if an export target has a cache hit,
-then so have all export targets it depends upon. Keep in mind that
-a repository can only be content-fixed if all its dependencies are.
-
-For this strictness-based approach to work, it is, however, a
-requirement that any artifact that is exported (typically indirectly,
-e.g., as part of a common dependency) by several targets is only
-used through the same export target. For a well-structured repository,
-this should be a natural property anyway.
-
-The forwarding of artifacts is the reason we chose that in the
-non-cached analysis of an export target the artifacts are passed on
-as received and are not wrapped in an "add to cache" action. The
-latter choice would violate the projection property we rely upon.