author | Klaus Aehlig <klaus.aehlig@huawei.com> | 2023-12-14 12:38:47 +0100 |
---|---|---|
committer | Klaus Aehlig <klaus.aehlig@huawei.com> | 2023-12-18 12:16:56 +0100 |
commit | 548437e591080737711547db62ffea8c28b2a025 (patch) | |
tree | cc0eceb8a6d70cf5904274863b28347e193865d6 /doc | |
parent | 2dfebd24ad43c98291c9c82121adbc7e6717713d (diff) | |
download | justbuild-548437e591080737711547db62ffea8c28b2a025.tar.gz |
Add design document for adding garbage collection for build roots
Diffstat (limited to 'doc')
-rw-r--r-- | doc/future-designs/git-gc.md | 100 |
1 files changed, 100 insertions, 0 deletions
diff --git a/doc/future-designs/git-gc.md b/doc/future-designs/git-gc.md
new file mode 100644
index 00000000..a46699db
--- /dev/null
+++ b/doc/future-designs/git-gc.md
@@ -0,0 +1,100 @@
+# Garbage Collection for Repository Roots
+
+## Current status and shortcomings
+
+The multi-repository tool `just-mr` often has to create roots: the
+tree for an archive, an explicit `"git tree"` root, etc. All those
+roots are stored in a `git` repository in the local build root.
+They are fixed by a tagged commit to be persistent there. In this
+way, the roots are available long-term and the association between
+archive hash and resulting root can be persisted. The actual archive
+can eventually be garbage collected as the root can efficiently be
+provided by that cached association.
+
+While this setup is good at preserving roots in a quite compact
+form, there currently is no mechanism to get rid of roots that are
+no longer needed. Especially switching between projects that have
+a large number of third-party dependencies, or on projects changing
+their set of dependencies frequently, this `git` repository in the
+local build root can grow large.
+
+## Proposed changes
+
+We propose to add generational garbage collection for the repository
+roots, similar to the one we have for the main CAS, action cache,
+and target-level cache.
+
+### Change of layout in the local build root
+
+For the repository roots, there are consistency conditions in the local
+build root: an entry in any of the map directories (`tree-map/archive`,
+`tree-map/zip`, `distfile-tree-map`, etc) promises that the referenced
+tree is available in the local `git` repository.
+
+So, in order to allow atomic generation rotation, and hence to
+reliably preserve the internal consistency, the `git` repository
+and the map directories are put into generation directories. More
+precisely, for the youngest generation, the respective directories
+reside in `roots/generation-0` and for the older generation in
+`roots/generation-1`.
+
+### Use of old generations when setting up roots
+
+When `just-mr` looks into one of the generation-controlled resources,
+it does so in the youngest generation. If found there, it will
+proceed as done currently. If not found, it will immediately look
+into the corresponding resource in the older generation; if available
+in the older generation it will promote the found object in the
+following way.
+
+ - To promote a `git` commit, a fetch from the old-generation
+   repository to the new-generation repository will be carried
+   out (using `libgit`'s functionality). As a fetch from a repository
+   on the same file system is backed by hard links, no significant
+   storage overhead will occur. The promoted commit will be tagged
+   in the new-generation repository to ensure it stays there
+   persistently. As usual, the tag name is encoding the commit id,
+   so that no conflicts occur.
+
+ - To promote a `git` tree, a commit is created with this tree as
+   tree, a commit message that is a function of the tree id, and
+   no parents. That commit is promoted.
+
+ - To promote an entry in one of the maps, first the corresponding
+   tree will be promoted, then the entry itself will be promoted
+   by creating a hard link.
+
+### New command `just-mr gc-repo`
+
+The multi-repository tool `just-mr` will get a new subcommand `gc-repo`,
+a name chosen to not conflict with the launcher functionality; recall
+that `just-mr gc` will simply call `just gc`. This new `just-mr
+gc-repo` command rotates the generations: the old generation will
+be removed and the new one will become (atomically by a rename)
+the old one.
+
+### Locking
+
+To avoid interference between setting up the various repository
+roots needed for one multi-repository build and the repository
+garbage collection, we use an `flock`-based locking, similar as we
+do for the main CAS. There is a repository-gc lock.
+
+ - Any invocation of `just-mr` apart from `just-mr gc-repo` takes a
+   shared lock and keeps it over its whole lifetime. When invoked as
+   a launcher, the lock is kept over the `exec` so that the launched
+   process can rely on the roots not being garbage collected.
+
+ - An invocation of `just-mr gc-repo` takes an exclusive lock for
+   the period it does the directory renames.
+
+## Considerations on the transition
+
+As there is no overlap between the old and the new locations of
+the root-related directories, correctness is not affected by this
+transition. However, the `git` repository and the map directories
+in the old location will become unused and therefore pointlessly
+waste disk space. The upgrade notes in our changelog will therefore
+recommend to either manually create the generation directories
+and move the `git` repository and the map directories there, or to
+alternatively remove them.
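
The lookup-with-promotion rule for map entries described in the design document can be illustrated with a minimal Python sketch. It is not the `just-mr` implementation: the function name `lookup_map_entry` and the assumption that each map entry is a single file named after its key are hypothetical, and the promotion of the referenced `git` tree (which the design requires to happen first) is only marked by a comment.

```python
import os
from pathlib import Path
from typing import Optional


def lookup_map_entry(roots: Path, map_name: str, key: str) -> Optional[Path]:
    """Find `key` in the youngest generation, promoting it from the older
    generation if needed. Returns None if neither generation has it.

    Hypothetical helper; assumes one file per map entry, named by `key`.
    """
    young = roots / "generation-0" / map_name / key
    if young.exists():
        return young  # found in the youngest generation: proceed as before

    old = roots / "generation-1" / map_name / key
    if not old.exists():
        return None  # unknown in both generations: the root must be built

    # A real implementation would promote the referenced git tree here
    # (commit, fetch, tag) BEFORE linking the entry, so that the entry
    # never promises a tree that is missing from the young repository.
    young.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.link(old, young)  # hard link: the entry costs no extra storage
    except FileExistsError:
        pass  # a concurrent process promoted the same entry first
    return young
```

Because the entry is promoted by a hard link rather than a copy, both generations share one inode until the old generation is removed.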
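
The interplay of generation rotation and the repository-gc lock can be sketched in the same spirit. The lock file name `gc.lock`, the helper names, and the choice to move the old generation aside under the lock and delete it only afterwards are assumptions made for illustration; the design document only states that the exclusive lock covers the directory renames.

```python
import fcntl
import os
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def repo_gc_lock(roots: Path, exclusive: bool):
    """Hold a flock on a (hypothetical) lock file for the `with` body."""
    roots.mkdir(parents=True, exist_ok=True)
    fd = os.open(roots / "gc.lock", os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Shared for ordinary invocations, exclusive for gc-repo.
        fcntl.flock(fd, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def gc_repo(roots: Path) -> None:
    """Rotate generations: generation-1 goes away, generation-0 becomes 1."""
    young = roots / "generation-0"
    old = roots / "generation-1"
    trash = roots / f"remove-me-{os.getpid()}"  # hypothetical scratch name
    with repo_gc_lock(roots, exclusive=True):
        if old.exists():
            os.rename(old, trash)  # atomic on one file system
        if young.exists():
            os.rename(young, old)  # atomic on one file system
    # Deleting may be slow, so it happens after the lock is released.
    shutil.rmtree(trash, ignore_errors=True)
```

An ordinary invocation would wrap its root setup in `with repo_gc_lock(roots, exclusive=False):`. To keep the lock over an `exec`, as the launcher case requires, the descriptor would additionally have to be made inheritable (in Python, via `os.set_inheritable`) and left open instead of being closed.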