Diffstat (limited to 'doc/future-designs/cas-objects-import.md')
-rw-r--r-- | doc/future-designs/cas-objects-import.md | 96 |
1 files changed, 96 insertions, 0 deletions
diff --git a/doc/future-designs/cas-objects-import.md b/doc/future-designs/cas-objects-import.md
new file mode 100644
index 00000000..1fc061f8
--- /dev/null
+++ b/doc/future-designs/cas-objects-import.md
@@ -0,0 +1,96 @@
+Importing objects to CAS
+========================
+
+Motivation
+----------
+
+Roots in `just` builds are typically Git trees. Such artifacts should ideally
+be available already in CAS to avoid any expensive fetches. It is not uncommon
+for checkouts or other pre-fetched data to be present locally and to want to
+quickly integrate them into a multi-repository build.
+
+A typical approach is to pack the process that generates the root directory
+and all its content into a script (or a sequence of commands) and place it as
+the fetch command of a `git tree` repository inside the multi-repository
+configuration file provided to `just-mr`. This however comes with two
+inefficiencies: firstly, the user has to specify a priori the expected Git
+tree identifier of the root, and secondly, if the tree is not already cached
+then `just-mr` will have to (re)run the (usually expensive) fetch command and
+hash the generated directory in order to store the root in its Git cache.
+
+Proposal
+--------
+
+We propose a new `just add-to-cas` subcommand which takes in a filesystem
+path and provides its Git hash to standard output. A corresponding
+entry is also added to the local (file, executable, or tree) CAS. While
+hashing directories into Git trees is the most important use-case, the
+subcommand will allow the hashing of blobs as well.
+
+Note that `just-mr` by default looks both into the local CAS and the Git cache
+for any blobs or trees it might need to make available. In particular, build
+roots with trees already in local CAS would only need to be additionally
+imported into the Git cache, which is a cheap local operation.
+
+### CAS locations
+
+`just` provides a default local build root location for where the local CAS
+should be stored. However, a useful use-case of this new subcommand would be
+to populate a specific (and possibly existing) CAS. Therefore an option to
+specify the local build root will be available.
+
+### Remote CAS
+
+Similarly, we want to be able to populate a remote CAS as well, e.g., to
+prepare roots for a future build. For this reason, the subcommand will allow
+to specify a remote-execution endpoint, with the understanding that the
+generated local CAS entry should be synced with the remote. Additionally,
+we will support computing hashes also in compatible mode, set via an
+appropriate option.
+
+### Symbolic links
+
+If the given path points to a symbolic link, by default the link will be
+followed and we will hash the object the symlink resolves to. However, we will
+also support, via an appropriate option, to hash the symlink content as-is
+instead.
+
+If the given path points to a directory, the treatment of contained symbolic
+links will default to allowing only non-upwards symlinks. To mirror options
+available in `just-mr` repository descriptions, we will also support options
+to either ignore, or fully resolve symlinks in the generated Git trees.
+
+### Other notes
+
+The only mandatory argument is the path to the filesystem object to be hashed,
+with all other options optional, as sensible defaults exist already in `just`.
+
+Logging options will also be available for this subcommand.
+
+While the main purpose of this subcommand is to add the hashed Git object to
+CAS, and considering that hashing a directory is a non-trivial operation for
+the command-line `git` tool, a pure computation of the hashes _without_
+generating a CAS entry might still be of interest and available as an option.
+This is, however, not useful in typical situations.
+
+Auxiliary changes
+-----------------
+
+### `just-mr` to support `archive` subcommand
+
+Build sources many times come in the form of archives, which allow efficient
+storage and backup, and permit easy versioning of build dependencies. Also
+build artifacts, for example resulting binaries (together with required headers
+when building statically), are typically shipped as archives.
+
+The `just-mr` tool will therefore implement a new subcommand `archive` which,
+given a tree identifier, with the understanding that the tree is available
+locally (either in local CAS or the Git cache), will produce an archive from
+the content of that tree.
+
+The archive content will be written to standard output, thus allowing the usual
+piping and redirection of binary streams. Optionally, the subcommand will be
+able to write to a file instead.
+
+By default, for reproducibility reasons, the archiving format will be a tarball.
+Options will be added to produce other archive types, as supported by `just-mr`.
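
The design above relies on Git blob and tree identifiers without spelling out how they are computed. As a rough illustration only, the following Python sketch hashes a file or directory the way Git's (SHA-1) object format does. The helper names `git_blob_id` and `git_tree_id` are hypothetical and are not part of `just`. The sketch also makes simplifying assumptions that the proposal does not: it hashes symlinks as-is (no upwards check, no resolution), includes empty directories as the empty tree, and takes the executable bit from the owner's permissions.

```python
import hashlib
import os
import stat


def git_blob_id(data: bytes) -> str:
    """Return the Git (SHA-1) identifier of a blob with the given content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def git_tree_id(path: str) -> str:
    """Return the Git (SHA-1) tree identifier of the directory at `path`.

    Hypothetical helper: symlinks are hashed as-is and nothing is written
    to any CAS; only the identifier is computed.
    """
    entries = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        st = os.lstat(full)
        if stat.S_ISLNK(st.st_mode):
            # A symlink is stored as a blob holding the link target.
            mode = b"120000"
            oid = git_blob_id(os.fsencode(os.readlink(full)))
        elif stat.S_ISDIR(st.st_mode):
            mode = b"40000"
            oid = git_tree_id(full)
        else:
            mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
            with open(full, "rb") as f:
                oid = git_blob_id(f.read())
        raw_name = os.fsencode(name)
        # Git orders tree entries by name, treating directories as "name/".
        sort_key = raw_name + (b"/" if mode == b"40000" else b"")
        entries.append((sort_key, mode, raw_name, bytes.fromhex(oid)))
    body = b"".join(
        mode + b" " + raw_name + b"\0" + raw_oid
        for _, mode, raw_name, raw_oid in sorted(entries)
    )
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()
```

Calling `git_tree_id(".")` on a checkout would give the kind of identifier that the proposed subcommand prints to standard output. Storing the objects in a local or remote CAS, and the compatible-mode hashing mentioned in the design, are outside the scope of this sketch.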