summaryrefslogtreecommitdiff
path: root/doc/future-designs/cas-objects-import.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/future-designs/cas-objects-import.md')
-rw-r--r--doc/future-designs/cas-objects-import.md96
1 files changed, 96 insertions, 0 deletions
diff --git a/doc/future-designs/cas-objects-import.md b/doc/future-designs/cas-objects-import.md
new file mode 100644
index 00000000..1fc061f8
--- /dev/null
+++ b/doc/future-designs/cas-objects-import.md
@@ -0,0 +1,96 @@
+Importing objects to CAS
+========================
+
+Motivation
+----------
+
+Roots in `just` builds are typically Git trees. Such artifacts should ideally
+be available already in CAS to avoid any expensive fetches. It is not uncommon
+for checkouts or other pre-fetched data to be present locally and to want to
+quickly integrate them into a multi-repository build.
+
+A typical approach is to pack the process that generates the root directory
+and all its content into a script (or a sequence of commands) and place it as
+the fetch command of a `git tree` repository inside the multi-repository
+configuration file provided to `just-mr`. This however comes with two
+inefficiencies: firstly, the user has to specify a priori the expected Git
+tree identifier of the root, and secondly, if the tree is not already cached
+then `just-mr` will have to (re)run the (usually expensive) fetch command and
+hash the generated directory in order to store the root in its Git cache.
+
+Proposal
+--------
+
+We propose a new `just add-to-cas` subcommand which takes in a filesystem
+path and provides its Git hash to standard output. A corresponding
+entry is also added to the local (file, executable, or tree) CAS. While
+hashing directories into Git trees is the most important use-case, the
+subcommand will allow the hashing of blobs as well.
+
+Note that `just-mr` by default looks both into the local CAS and the Git cache
+for any blobs or trees it might need to make available. In particular, build
+roots with trees already in local CAS would only need to be additionally
+imported into the Git cache, which is a cheap local operation.
+
+### CAS locations
+
+`just` provides a default local build root location for where the local CAS
+should be stored. However, a useful use-case of this new subcommand would be
+to populate a specific (and possibly existing) CAS. Therefore an option to
+specify the local build root will be available.
+
+### Remote CAS
+
+Similarly, we want to be able to populate a remote CAS as well, e.g., to
+prepare roots for a future build. For this reason, the subcommand will allow
+to specify a remote-execution endpoint, with the understanding that the
+generated local CAS entry should be synced with the remote. Additionally,
+we will support computing hashes also in compatible mode, set via an
+appropriate option.
+
+### Symbolic links
+
+If the given path points to a symbolic link, by default the link will be
+followed and we will hash the object the symlink resolves to. However, we will
+also support, via an appropriate option, to hash the symlink content as-is
+instead.
+
+If the given path points to a directory, the treatment of contained symbolic
+links will default to allowing only non-upwards symlinks. To mirror options
+available in `just-mr` repository descriptions, we will also support options
+to either ignore, or fully resolve symlinks in the generated Git trees.
+
+### Other notes
+
+The only mandatory argument is the path to the filesystem object to be hashed,
+with all other options optional, as sensible defaults exist already in `just`.
+
+Logging options will also be available for this subcommand.
+
+While the main purpose of this subcommand is to add the hashed Git object to
+CAS, and considering that hashing a directory is a non-trivial operation for
+the command-line `git` tool, a pure computation of the hashes _without_
+generating a CAS entry might still be of interest and available as an option.
+This is, however, not useful in typical situations.
+
+Auxiliary changes
+-----------------
+
+### `just-mr` to support `archive` subcommand
+
+Build sources many times come in the form of archives, which allow efficient
+storage and backup, and permit easy versioning of build dependencies. Also
+build artifacts, for example resulting binaries (together with required headers
+when building statically), are typically shipped as archives.
+
+The `just-mr` tool will therefore implement a new subcommand `archive` which,
+given a tree identifier, with the understanding that the tree is available
+locally (either in local CAS or the Git cache), will produce an archive from
+the content of that tree.
+
+The archive content will be written to standard output, thus allowing the usual
+piping and redirection of binary streams. Optionally, the subcommand will be
+able to write to a file instead.
+
+By default, for reproducibility reasons, the archiving format will be a tarball.
+Options will be added to produce other archive types, as supported by `just-mr`.