summaryrefslogtreecommitdiff
path: root/doc/future-designs/tc-gc.md
blob: afaf4bd358dc3dce6829f0284df47e0e4bdcfeef (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
# Target-level cache dependencies for garbage collection

## Background

### Target-level caching

In order to keep analysis manageable, `just` cuts out unchanged
parts of the target graph by means of target-level caching. More
precisely, `export` targets of transitively content-fixed repositories
are cached; if such a target is requested a second time, the cached
value is used without even analysing the target graph defining
this target.

Implicit in that caching is a projection in analysis of this
target from its intensional description to its extensional one.
This change in definition requires the build to be organized in a
way that the intensional and extensional definition are not used
together. Typically, this is achieved by ensuring that whenever an
artifact is used that goes through this `export` target, then so
do all uses; as the change to the extensional name is a projection,
the strictness of the evaluation of `export` targets together with
the fact that (only) on successful build _all_ analysed `export`
targets are cached ensures the absence of conflicts.

#### Example

Consider the following target file (on a content-fixed root) as
example.

```
{ "generated":
  {"type": "generic", "outs": ["out.txt"], "cmds": ["echo Hello > out.txt"]}
, "export": {"type": "export", "target": "generated"}
, "use":
  {"type": "install", "dirs": [["generated", "."], ["generated", "other-use"]]}
, "": {"type": "export", "target": "use"}
}
```

Upon initial analysis (on an empty local build root) of the default
target `""`, the output artifact `out.txt` is an action artifact, more
precisely the same one that is output of the target `"generated"`;
the target `"export"` also has the same artifact on output. After
building the default target, a target-cache entry will be written
for this target, containing the extensional definition of the target,
so for `out.txt` the known artifact `e965047ad7c57865...` stored; as
a side effect, also for the target `"export"` a target-cache entry
will be written, containing, of course, the same known artifact.
So on subsequent analysis, both `"export"` and `""` will still
have the same artifact for `out.txt`, but this time a known one.
This artifact is now different from the artifact of the target
`"generated"` (which is still an action artifact), but no conflicts
arise as the usual target discipline requires that any target not
a (direct or indirect) dependency of `"export"` use the target
`"generated"` only indirectly by using the target `"export"`.

Also note that further exporting such a target has to effect, as a
known artifact always evaluates to itself. In that sense, replacing
by the extensional definition is a projection.

### Gargabe collection

In order to reclaim disk space used by the cache directory, `just`
has an option to carry out garbage collection. More precisely, the
cache is organized in two generations, and a garbage-collection
step removes the old generation and renames the young generation
to be the old one. All operations are carried out from the young
generation; entries found in the old generation are linked to the
young generation before being used. While doing so, the following
invariants are kept by uplinking, in the correct order, more entries.

 - If an artifact is referenced in any cache entry (action cache,
   target-level cache), then the corresponding artifact is in CAS.
 - If a tree is in CAS, then so are its immediate parts (and hence
   also all transitive parts).

## Current situation and shortcomings

As it is implemented currently, garbage collection does not honor
the invariant on export targets that if one export target is in
cache, the ones traversed during the analysis of that particular
target are in cache as well. In fact, those implied targets tend to
be garbage collected, as typical builds only reference the top-level
export targets.

This can lead to a staging conflict, e.g., in the situation where
two `export` targets that contain artifacts from a common `export`
target are used together. Now, if that common `export` target
goes out of cache and for one of the two top-level targets the
description changes, that target will use, due to the cache loss, the
intensional definition of the artifact from the common target with
the other (still cached) target still using the extensional one.

## Proposed solution

We propose to honor the dependency on export targets in garbage
collection by appropriately uplinking the implied target-level cache
entries as well. To do so, the dependency of configured `export`
targets on others will be stored in the corresponding cache value.

### Analysis to track the export targets depended upon

So far, the dependency of export targets on one another was only
tracked implicitly by the evaluation model of target definitions.
As we now have to persist this dependency, we need to explicitly
track it. More precisely, the internal data structure of an analyzed
target will be extended by a set of all the export targets eligible
for caching, represented by their `TargetCacheKey`, encountered
during the analysis of that target.

### Extension of the value of a target-level cache entry

The cached value for a target-level cache entry is serialized as a
JSON object, with currently the keys `"artifacts"`, `"runfiles"`, and
`"provides"`. This object will be extended by an additional (optional)
key `"implied export targets"` that lists (in lexicographic order)
the hashes of the cache keys of the export targets the analysis of
the given export target depends upon; the field is only serialized
if that list is non empty.

### Additional invariant honored during uplinking

Our cache will honor the additional invariant that, whenever a
target-level cache entry is present, so are the implied target-level
cache entries. This invariant will be honored when adding new
target-level cache entries by adding them in the correct order, as
well as when uplinking by uplinking the implied entries first (and
there, of course, honoring the respective invariants).

### Interaction with `just serve`

When building, `just` normally does not create an entry for
target-level cache hit received from `just serve`. However, it
might happen that `just` has to analyse an eligible `export`
target locally, as the `just serve` instance cannot provide it, and
during that analysis `export` targets provided by `just serve` are
encountered. In this case, before writing the target-level cache
entry for the locally analysed `export` target, the `just build`
process, in order to keep cache consistency, will first query and
write to the local target-level cache the transitively implied
target-level cache entries.