summaryrefslogtreecommitdiff
path: root/doc/concepts/target-cache.org
diff options
context:
space:
mode:
Diffstat (limited to 'doc/concepts/target-cache.org')
-rw-r--r--doc/concepts/target-cache.org219
1 files changed, 0 insertions, 219 deletions
diff --git a/doc/concepts/target-cache.org b/doc/concepts/target-cache.org
deleted file mode 100644
index 591a66af..00000000
--- a/doc/concepts/target-cache.org
+++ /dev/null
@@ -1,219 +0,0 @@
-* Target-level caching
-
-** ~git~ trees as content-fixed roots
-
-*** The ~"git tree"~ root scheme
-
-The multi-repository configuration supports a scheme ~"git tree"~.
-This scheme is given by two parameters,
-- the id of the tree (as a string with the hex encoding), and
-- an arbitrary ~git~ repository containing the specified tree
- object, as well as all needed tree and blob objects reachable
- from that tree.
-For example, a root could be specified as follows.
-#+BEGIN_SRC
-["git tree", "6a1820e78f61aee6b8f3677f150f4559b6ba77a4", "/usr/local/src/justbuild.git"]
-#+END_SRC
-
-It should be noted that the ~git~ tree identifier alone already
-specifies the content of the full tree. However, ~just~ needs access
-to some repository containing the tree in order to know what the
-tree looks like.
-
-Nevertheless, it is an important observation that the tree identifier
-alone already specifies the content of the whole (logical) directory.
-The equality of two such directories can be established by comparing
-the two identifiers _without_ the need to read any file from
-disk. Those "fixed-content" descriptions, i.e., descriptions of a
-repository root that already fully determines the content are the
-key to caching whole targets.
-
-*** ~KNOWN~ artifacts
-
-The in-memory representation of known artifacts has an optional
-reference to a repository containing that artifact. Artifacts
-"known" from local repositories might not be known to the CAS used
-for the action execution; this additional reference allows to fill
-such misses in the CAS.
-
-** Content-fixed repositories
-
-*** The parts of a content-fixed repository
-
-In order to meaningfully cache a target, we need to be able to
-efficiently compute the cache key. We restrict this to the case where
-we can compute the information about the repository without file-system
-access. This requires that all roots (workspace, target root, etc)
-be content fixed, as well as the bindings of the free repository
-names (and hence also all transitively reachable repositories).
-The call such repositories "content-fixed" repositories.
-
-*** Canonical description of a content-fixed repository
-
-The local data of a repository consists of the following.
-- The roots (for workspace, targets, rules, expressions). As the
- tree identifier already defines the content, we leave out the
- path to the repository containing the tree.
-- The names of the targets, rules, and expression files.
-- The names of the outgoing "bindings".
-
-Additionally, repositories can reach additional repositories via
-bindings. Moreover, this repository-level dependency relation
-is not necessarily cycle free. In particular, we cannot use the
-tree unfolding as canonical representation of that graph up to
-bisimulation, as we do with most other data structures. To still get
-a canonical representation, we factor out the largest bisimulation,
-i.e., minimize the respective automaton (with repositories as
-states, local data as locally observable properties, and the binding
-relation as edges).
-
-Finally, for each repository individually, the reachable repositories
-are renamed ~"0"~, ~"1"~, ~"2"~, etc, following a depth-first
-traversal starting from the repository in question where outgoing
-edges are traversed in lexicographical order. The entry point is
-hence recognisable as repository ~"0"~.
-
-The repository key content-identifier of the canonically formatted
-canonical serialisation of the JSON encoding of the obtain
-multi-repository configuration (with repository-free git-root
-descriptions). The serialisation itself is stored in CAS.
-
-These identifications and replacement of global names does not change
-the semantics, as our name data types are completely opaque to our
-expression language. In the ~"json_encode"~ expression, they're
-serialized as ~null~ and string representation is only generated in
-user messages not available to the language itself. Moreover, names
-cannot be compared for equality either, so their only observable
-properties, i.e., the way ~"DEP_ARTIFACTS"~, ~"DEP_RUNFILES~, and
-~"DEP_PROVIDES"~ reacts to them are invariant under repository
-bisimulation.
-
-** Configuration and the ~"export"~ rule
-
-Targets not only depend on the content of their repository, but also
-on their configurations. Normally,
-the effective part of a configuration is only determined after
-analysing the target. However, for caching, we need to compute
-the cache key directly. This property is provided by the built-in ~"export"~ rule; only ~"export"~ targets
-residing in content-fixed repositories will be cached. This also
-serves as indication, which targets of a repository are intended
-for consumption by other repositories.
-
-An ~"export"~ rule takes precisely the following arguments.
-- ~"target"~ specifying a single target, the target to be cached.
- It must not be tainted.
-- ~"flexible_config"~ a list of strings; those specify the variables
- of the configuration that are considered. All other parts of
- the configuration are ignored. So the effective configuration for
- the ~"export"~ target is the configuration restricted to those
- variables (filled up with ~null~ if the variable was not present
- in the original configuration).
-- ~"fixed_config"~ a dict with of arbitrary JSON values (taken
- unevaluated) with keys disjoint from the ~"flexible_config"~.
-
-An ~"export"~ target is analyzed as follows. The configuration is
-restricted to the variables specified in the ~"flexible_config"~;
-this will result in the effective configuration for the exported
-target. It is a requirement that the effective configuration contain
-only pure JSON values. The (necessarily conflict-free) union with
-the ~"fixed_config"~ is computed and the ~"target"~ is evaluated
-in this configuration. The result (artifacts, runfiles, provided
-information) is the result of that evaluation. It is a requirement
-that the provided information does only contain pure JSON values
-and artifacts (including tree artifacts); in particular, they may
-not contain names.
-
-** Cache key
-
-We only consider ~"export"~ targets in content-fixed repositories
-for caching. An export target is then fully described by
-- the repository key of the repository the export target resides in,
-- the target name of the export target within that repository,
- described as module-name pair, and
-- the effective configuration.
-More precisely, the canonical description is the JSON object with
-those values for the keys ~"repo_key"~, ~"target_name"~, and ~"effective_config"~,
-respectively. The repository key is the blob identifier of the
-canonical serialisation (including sorted keys, etc) of the just
-described piece of JSON. To allow debugging and cooperation with
-other tools, whenever a cache key is computed, it is ensured,
-that the serialisation ends up in the applicable CAS.
-
-It should be noted that the cache key can be computed _without_
-analyzing the target referred to. This is possible, as the
-configuration is pruned a priori instead of the usual procedure
-to analyse and afterwards determine the parts of the configuration
-that were relevant.
-
-** Cached value
-
-The value to be cached is the result of evaluating the target,
-that is, its artifacts, runfiles, and provided data. All artifacts
-inside those data structures will be described as known artifacts.
-
-As serialisation, we will essentially use our usual JSON encoding;
-while this can be used as is for artifacts and runfiles where we
-know that they have to be a map from strings to artifacts, additional
-information will be added for the provided data. The provided data
-can contain artifacts, but also legitimately pure JSON values that
-coincide with our JSON encoding of artifacts; the same holds true
-for nodes and result values. Moreover, the tree unfolding implicit
-in the JSON serialisation can be exponentially larger than the value.
-
-Therefore, in our serialisation, we add an entry for every subexpression
-and separately add a list of which subexpressions are artifacts,
-nodes, or results. During deserialisation, we use this subexpression
-structure to deserialize every subexpression only one.
-
-** Sharding of target cache
-
-In our target description, the execution environment is not included.
-For local execution, it is implicit anyway. As we also want to
-cache high-level targets when using remote execution, we shard the
-target cache (e.g., by using appropriate subdirectories) by the blob
-identifier of the serialisation of the description of the execution
-backend. Here, ~null~ stands for local execution, and for remote
-execution we use an object with keys ~"remote_execution_address"~
-and ~"remote_execution_properties"~ filled in the obvious way. As
-usual, we add the serialisation to the CAS.
-
-** ~"export"~ targets, strictness and the extensional projection
-
-As opposed to the target that is exported, the corresponding export
-target, if part of a content-fixed repository, will be strict: a
-build depending on such a target can only succeed if all artifacts
-in the result of target (regardless whether direct artifacts,
-runfiles, or as part of the provided data) can be built, even if
-not all (or even none) are actually used in the build.
-
-Upon cache hit, the artifacts of an export target are the known
-artifacts corresponding to the artifacts of the exported target.
-While extensionally equal, known artifacts are defined differently,
-so an export target and the exported target are intensionally
-different (and that difference might only be visible on the second
-build). As intensional equality is used when testing for absence
-of conflicts in staging, a target and its exported version almost
-always conflict and hence should not be used together. One way to
-achieve this is to always use the export target for any target that
-is exported. This fits well together with the recommendation of
-only depending on export targets of other repositories.
-
-If a target forwards artifacts of an exported target (indirect header
-files, indirect link dependencies, etc), and is exported again, no
-additional conflicts occur; replacing by the corresponding known
-artifact is a projection: the known artifact corresponding to a
-known artifact is the artifact itself. Moreover, by the strictness
-property described earlier, if an export target has a cache hit,
-then so have all export targets it depends upon. Keep in mind that
-a repository can only be content-fixed if all its dependencies are.
-
-For this strictness-based approach to work, it is, however, a
-requirement that any artifact that is exported (typically indirectly,
-e.g., as part of a common dependency) by several targets is only
-used through the same export target. For a well-structured repository,
-this should not be a natural property anyway.
-
-The forwarding of artifacts are the reason we chose that in the
-non-cached analysis of an export target the artifacts are passed on
-as received and are not wrapped in an "add to cache" action. The
-latter choice would violate that projection property we rely upon.