diff options
Diffstat (limited to 'doc/concepts/garbage.org')
-rw-r--r-- | doc/concepts/garbage.org | 82 |
1 files changed, 0 insertions, 82 deletions
diff --git a/doc/concepts/garbage.org b/doc/concepts/garbage.org deleted file mode 100644 index 26f6cc51..00000000 --- a/doc/concepts/garbage.org +++ /dev/null @@ -1,82 +0,0 @@ -* Garbage Collection - -For every build, for all non-failed actions an entry is created in -the action cache and the corresponding artifacts are stored in the -CAS. So, over time, a lot of files accumulate in the local build -root. Hence we have a way to reclaim disk space while keeping the -benefits of having a cache. This operation is referred to as garbage -collection and usually uses the heuristics to keeping what is most -recently used. Our approach follows this paradigm as well. - -** Invariants assumed by our build system - -Our tool assumes several invariants on the local build root, that we -need to maintain during garbage collection. Those are the following. -- If an artifact is referenced in any cache entry (action cache, - target-level cache), then the corresponding artifact is in CAS. -- If a tree is in CAS, then so are its immediate parts (and hence - also all transitive parts). - - -** Generations of cache and CAS - -In order to allow garbage collection while keeping the desired -invariants, we keep several (currently two) generations of cache -and CAS. Each generation in itself has to fulfill the invariants. -The effective cache or CAS is the union of the caches or CASes of -all generations, respectively. Obviously, then the effective cache -and CAS fulfill the invariants as well. - -The actual ~gc~ command rotates the generations: the oldest -generation is be removed and the remaining generations are moved -one number up (i.e., currently the young generation will simply -become the old generation), implicitly creating a new, empty, -youngest generation. As an empty generation fulfills the required -invariants, this operation preservers the requirement that each -generation individually fulfill the invariants. - -All additions are made to the youngest generation; in order to keep -the invariant, relevant entries only present in an older generation -are also added to the youngest generation first. Moreover, whenever -an entry is referenced in any way (cache hit, request for an entry -to be in CAS) and is only present in an older generation, it is -also added to the younger generation, again adding referenced -parts first. As a consequence, the youngest generation contains -everything directly or indirectly referenced since the last garbage -collection; in particular, everything referenced since the last -garbage collection will remain in the effective cache or CAS upon -the next garbage collection. - -These generations are stored as separate directories inside the -local build root. As the local build root is, starting from an -empty directory, entirely managed by `just` and compatible tools, -generations are on the same file system. Therefore the adding of -old entries to the youngest generation can be implemented in an -efficient way by using hard links. - -The moving up of generations can happen atomically by renaming the -respective directory. Also, the oldest generation can be removed -logically by renaming a directory to a name that is not searched -for when looking for existing generations. The actual recursive -removal from the file system can then happen in a separate step -without any requirements on order. - -** Parallel operations in the presence of garbage collection - -The addition to cache and CAS can continue to happen in parallel; -that certain values are taken from an older generation instead -of freshly computed does not make a difference for the youngest -generation (which is the only generation modified). But build -processes assume they don't violate the invariant if they first -add files to CAS and later a tree or cache entry referencing them. -This, however, only holds true if no generation rotation happens in -between. To avoid those kind of races, we make processes coordinate -over a single lock for each build root. -- Any build process keeps a shared lock for the entirety of the build. -- The garbage collection process takes an exclusive lock for the - period it does the directory renames. -We consider it acceptable that, in theory, local build processes -could starve local garbage collection. Moreover, it should be noted -that the actual removal of no-longer-needed files from the file -system happens without any lock being held. Hence the disturbance -of builds caused by garbage collection is small. |