summaryrefslogtreecommitdiff
path: root/doc/future-designs/cas-objects-import.md
blob: 1fc061f8a432d49b0988be49c0e2fee28af75c0b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
Importing objects to CAS
========================

Motivation
----------

Roots in `just` builds are typically Git trees. Such artifacts should ideally
be available already in CAS to avoid any expensive fetches. It is not uncommon
for checkouts or other pre-fetched data to be present locally and to want to
quickly integrate them into a multi-repository build.

A typical approach is to pack the process that generates the root directory
and all its content into a script (or a sequence of commands) and place it as
the fetch command of a `git tree` repository inside the multi-repository
configuration file provided to `just-mr`. This however comes with two
inefficiencies: firstly, the user has to specify a priori the expected Git
tree identifier of the root, and secondly, if the tree is not already cached
then `just-mr` will have to (re)run the (usually expensive) fetch command and
hash the generated directory in order to store the root in its Git cache.

Proposal
--------

We propose a new `just add-to-cas` subcommand which takes in a filesystem
path and provides its Git hash to standard output. A corresponding
entry is also added to the local (file, executable, or tree) CAS. While
hashing directories into Git trees is the most important use-case, the
subcommand will allow the hashing of blobs as well.

Note that `just-mr` by default looks both into the local CAS and the Git cache
for any blobs or trees it might need to make available. In particular, build
roots with trees already in local CAS would only need to be additionally
imported into the Git cache, which is a cheap local operation.

### CAS locations

`just` provides a default local build root location for where the local CAS
should be stored. However, a useful use-case of this new subcommand would be
to populate a specific (and possibly existing) CAS. Therefore an option to
specify the local build root will be available.

### Remote CAS

Similarly, we want to be able to populate a remote CAS as well, e.g., to
prepare roots for a future build. For this reason, the subcommand will allow
to specify a remote-execution endpoint, with the understanding that the
generated local CAS entry should be synced with the remote. Additionally,
we will support computing hashes also in compatible mode, set via an
appropriate option.

### Symbolic links

If the given path points to a symbolic link, by default the link will be
followed and we will hash the object the symlink resolves to. However, we will
also support, via an appropriate option, to hash the symlink content as-is
instead.

If the given path points to a directory, the treatment of contained symbolic
links will default to allowing only non-upwards symlinks. To mirror options
available in `just-mr` repository descriptions, we will also support options
to either ignore, or fully resolve symlinks in the generated Git trees.

### Other notes

The only mandatory argument is the path to the filesystem object to be hashed,
with all other options optional, as sensible defaults exist already in `just`.

Logging options will also be available for this subcommand.

While the main purpose of this subcommand is to add the hashed Git object to
CAS, and considering that hashing a directory is a non-trivial operation for
the command-line `git` tool, a pure computation of the hashes _without_
generating a CAS entry might still be of interest and available as an option.
This is, however, not useful in typical situations.

Auxiliary changes
-----------------

### `just-mr` to support `archive` subcommand

Build sources many times come in the form of archives, which allow efficient
storage and backup, and permit easy versioning of build dependencies. Also
build artifacts, for example resulting binaries (together with required headers
when building statically), are typically shipped as archives.

The `just-mr` tool will therefore implement a new subcommand `archive` which,
given a tree identifier, with the understanding that the tree is available
locally (either in local CAS or the Git cache), will produce an archive from
the content of that tree.

The archive content will be written to standard output, thus allowing the usual
piping and redirection of binary streams. Optionally, the subcommand will be
able to write to a file instead.

By default, for reproducibility reasons, the archiving format will be a tarball.
Options will be added to produce other archive types, as supported by `just-mr`.