Importing objects to CAS
========================

Motivation
----------

Roots in `just` builds are typically Git trees. Such artifacts should ideally
be available already in CAS to avoid any expensive fetches. It is not uncommon
for checkouts or other pre-fetched data to be present locally and to want to
quickly integrate them into a multi-repository build.

A typical approach is to pack the process that generates the root directory
and all its content into a script (or a sequence of commands) and place it as
the fetch command of a `git tree` repository inside the multi-repository
configuration file provided to `just-mr`. This however comes with two
inefficiencies: firstly, the user has to specify a priori the expected Git
tree identifier of the root, and secondly, if the tree is not already cached
then `just-mr` will have to (re)run the (usually expensive) fetch command and
hash the generated directory in order to store the root in its Git cache.

Implemented Proposal
--------------------

We propose a new `just add-to-cas` subcommand which takes in a filesystem
path and provides its Git hash to standard output. A corresponding
entry is also added to the local (file, executable, or tree) CAS. While
hashing directories into Git trees is the most important use-case, the
subcommand will allow the hashing of blobs as well.

Note that `just-mr` by default looks both into the local CAS and the Git cache
for any blobs or trees it might need to make available. In particular, build
roots with trees already in local CAS would only need to be additionally
imported into the Git cache, which is a cheap local operation.
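
To make concrete what "its Git hash" means for a single file, the following is a minimal sketch that assumes native mode reports the standard Git blob identifier. The function name `git_blob_id` is hypothetical and is not part of `just`; compatible mode is not covered here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the Git blob identifier (SHA-1, hex) of the given bytes."""
    # Git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Illustrative input only; any byte string can be hashed this way.
    print(git_blob_id(b"hello world\n"))
```

For regular files, the result should agree with what `git hash-object` prints for the same content.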

### CAS locations

`just` provides a default location for the local build root, in which the
local CAS is stored. However, a useful application of this new subcommand is
to populate a specific (and possibly existing) CAS. Therefore, an option to
specify the local build root will be available.

### Remote CAS

Similarly, we want to be able to populate a remote CAS as well, e.g., to
prepare roots for a future build. For this reason, the subcommand will allow
specifying a remote-execution endpoint, with the understanding that the
generated local CAS entry should be synced with the remote. Additionally,
we will support computing hashes in compatible mode, enabled via an
appropriate option.

### Symbolic links

If the given path points to a symbolic link, by default the
link content will be added to the CAS as a blob. If the option
`--follow-symlinks` is given, the argument specifying what to add
will be resolved and the object pointed to will be added to CAS.

If the given path points to a directory, non-upwards symbolic links
will be accepted and added to the tree object. Adding directories
to CAS is only supported in native mode.
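
The text does not define "non-upwards". As a sketch only, the check below assumes it means a relative link target whose lexically normalized form does not start with `..`, so that the target never refers to something above the directory containing the link. The helper name `is_non_upwards` is hypothetical, and the exact rule applied by `just` may differ.

```python
import posixpath


def is_non_upwards(target: str) -> bool:
    """Check a symlink target under the assumed definition of non-upwards."""
    # Assumption: absolute targets never count as non-upwards.
    if posixpath.isabs(target):
        return False
    # Normalize lexically (without touching the filesystem), then reject
    # anything that still has to climb above its starting directory.
    normalized = posixpath.normpath(target)
    return normalized != ".." and not normalized.startswith("../")
```

Under this assumption, a target such as `a/../b` is accepted because it normalizes to `b`, while `a/../../b` is rejected because it normalizes to `../b`.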

### Other notes

The only mandatory argument is the path to the filesystem object to be hashed;
everything else is optional, as sensible defaults already exist in `just`.
Logging options will also be available for this subcommand.

While the main purpose of this subcommand is to add the hashed Git object to
CAS, hashing a directory is a non-trivial operation for the command-line `git`
tool, so a pure computation of the hashes _without_ generating a CAS entry
might still be of interest and will be available as an option. This is,
however, not useful in typical situations.
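
To illustrate why hashing a directory takes more work than hashing a file, here is a minimal sketch of a pure Git tree hash computation that stores nothing. It assumes the standard Git tree object format; the names `git_object_id` and `git_tree_id` are hypothetical. Unlike plain `git`, this sketch also hashes empty directories (as empty trees), and it does not check whether symbolic links are non-upwards.

```python
import hashlib
import os
import stat


def git_object_id(kind: bytes, payload: bytes) -> bytes:
    """Raw SHA-1 of a Git object: header "<kind> <size>\\0" plus payload."""
    header = kind + b" " + str(len(payload)).encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).digest()


def git_tree_id(directory: str) -> bytes:
    """Recursively compute the raw Git tree identifier of a directory."""
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        info = os.lstat(path)  # lstat: do not follow symbolic links
        if stat.S_ISLNK(info.st_mode):
            # A symlink is stored as a blob holding the link target.
            mode = b"120000"
            oid = git_object_id(b"blob", os.fsencode(os.readlink(path)))
        elif stat.S_ISDIR(info.st_mode):
            mode, oid = b"40000", git_tree_id(path)
        elif stat.S_ISREG(info.st_mode):
            mode = b"100755" if info.st_mode & stat.S_IXUSR else b"100644"
            with open(path, "rb") as handle:
                oid = git_object_id(b"blob", handle.read())
        else:
            raise ValueError(f"unsupported file type: {path}")
        entries.append((os.fsencode(name), mode, oid))
    # Git orders entries bytewise by name, comparing directories as "name/".
    entries.sort(key=lambda e: e[0] + b"/" if e[1] == b"40000" else e[0])
    payload = b"".join(
        mode + b" " + name + b"\0" + oid for name, mode, oid in entries
    )
    return git_object_id(b"tree", payload)
```

Calling `git_tree_id(path).hex()` yields the hexadecimal identifier. Every file has to be read and every subdirectory hashed before the top-level identifier is known, which is why a dedicated subcommand is convenient.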

### Auxiliary change: `just install-cas --archive`

Build sources often come in the form of archives, which allow efficient
storage and backup, and permit easy versioning of build dependencies. Build
artifacts, for example resulting binaries (together with required headers
when building statically), are also typically shipped as archives.

Therefore, the `just install-cas` command has an option to dump
an artifact, which has to be a tree, as an archive instead of as a
directory (or pretty-printed top-level contents). Remote CAS options
and file location (with stdout as default) are honored.

Auxiliary changes still to be implemented
-----------------------------------------

### `just add-to-cas` to support symlink resolving inside directories

To mirror options available in `just-mr` repository descriptions,
we will also support options to either ignore, or fully resolve
symlinks in the generated Git trees.

### `just install-cas --archive` to support different archive types

By default, for reproducibility reasons, the archiving format will be a tarball.
Options will be added to produce other archive types, as supported by `just-mr`.
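
As an illustration of the reproducibility argument, the sketch below shows one way a tarball can be made to depend only on the names, modes, and contents in a directory. It is not a description of how `just install-cas --archive` produces its output; the function name `deterministic_tar` and the choice to zero timestamps and ownership are assumptions made for this example.

```python
import io
import os
import tarfile


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Drop metadata that varies between machines and runs."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def deterministic_tar(directory: str) -> bytes:
    """Archive a directory with a fixed entry order and fixed metadata."""
    buffer = io.BytesIO()
    with tarfile.open(
        fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT
    ) as archive:
        for root, dirs, files in os.walk(directory):
            dirs.sort()  # fixes the order in which subdirectories are visited
            for name in sorted(dirs + files):
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, directory)
                # recursive=False: each entry is added exactly once, by us.
                archive.add(
                    path, arcname=arcname, recursive=False, filter=_normalize
                )
    return buffer.getvalue()
```

With such a scheme, archiving the same tree twice gives byte-identical output even if file modification times changed in between. Compressed formats add further variables of their own, such as the compressor version and its settings.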