|
As we use chunking also for reducing storage, we have to consider
the overhead of block devices, which is in the order of kB per file.
So our target chunk size should be at least two orders of magnitude
above this, suggesting a minimal chunk size of 128kB. This target
size also has the advantage that the associated maximal chunk size
of 1MB is still well below the maximal message size of grpc,
allowing us to avoid the streaming API.
As we scale everything up by a factor of 16, we also have to
increase the number of bits in the involved masks by 4. We use this
opportunity to also extend the window size by using the two most
significant octets. Following the advice of the paper proposing
FastCDC to spread out the one-bits roughly equally, 0x4444 is a
suitable value for the two most significant octets.
We also change the suggested extension of the remote-execution API
accordingly. As the precise parameters for FastCDC, when announced
over the remote-execution API, are still under discussion upstream,
we simplify the name so that it does not mention the target size.
|
|
... as the typical chunk size is mainly determined by the masks
used internally. So, as long as we hard-code them, we should be
honest with ourselves and accept that the chunking parameters are
hard-coded as well.
|
|
... as this is the only thing the user cares about when trying
to investigate why that action failed.
|
|
... with only the non-null entries of the configuration. This
information is enough for the user to build this target, e.g.,
when searching for the cause of a build failure.
|
|
... by removing from an object the outer keys where the value is null.
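A minimal sketch of this pruning, assuming nlohmann::json objects
(the function name is illustrative):

  #include <nlohmann/json.hpp>

  auto PruneNullEntries(nlohmann::json const& obj) -> nlohmann::json {
      auto result = nlohmann::json::object();
      for (auto const& [key, value] : obj.items()) {
          if (not value.is_null()) {
              result[key] = value;  // keep only non-null outer entries
          }
      }
      return result;
  }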
|
|
As `just serve` is used like a daemon, it can be desirable to restrict
stderr, e.g., to only errors, while keeping a detailed log of the
activity in a file.
|
|
Messages on the command line can be more intrusive than, e.g.,
messages in a log file. In particular, for debugging it is often
useful to have very verbose logs. To keep the command-line
experience manageable also in such cases, support restricting the
command-line logging further. In this way, the user interacts with
concise command-line messages while verbose logs are still written
for later analysis.
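A minimal sketch of the idea, with illustrative types rather than
the actual logging interfaces: each sink applies its own severity
limit, so the command line can be restricted further than the log
file.

  enum class LogLevel { Error = 0, Warning, Info, Debug, Trace };

  struct LogSink {
      LogLevel limit;
      [[nodiscard]] auto Accepts(LogLevel level) const -> bool {
          // only messages up to this sink's own limit pass
          return static_cast<int>(level) <= static_cast<int>(limit);
      }
  };

  // stderr restricted to errors, while the log file stays verbose:
  constexpr auto kStderrSink = LogSink{LogLevel::Error};
  constexpr auto kFileSink = LogSink{LogLevel::Trace};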
|
|
... to simplify setups where configuration files are
provided as symbolic links to some central store.
|
|
The association map file for a resolved tree was supposed to
guarantee that the respective tree is kept alive in a Git repository
as part of a tagged commit; however, this tagging was missing.
This commit fixes the issue by tagging the tree (found in the Git
cache after resolution) before writing its association file.
|
|
Also improves and extends the Git operations tests accordingly.
|
|
Also adds an appropriate test for this method.
|
|
In certain cases, e.g., on the serve endpoint, an unresolved tree
might lie in a repository other than the Git cache; there we cannot
create any new entries, as doing so would violate our guarantee to
only write under our local build root.
Therefore, the resolve_symlinks_map now receives pointers to both
the source and target Git databases and ensures that:
1. any tree created on the fly is stored exclusively in the target
repository, and
2. any other entry required for those trees is made available in
the target repository by copying it from the source repository.
Note that in our use case the target repository is always our Git
cache; passing a pointer to that object database avoids the
overhead of otherwise opening the database very often.
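A sketch of the copy step, with an illustrative in-memory database
instead of the actual Git object-database interface:

  #include <map>
  #include <string>

  // Illustrative object-database handle; the actual Git ODB differs.
  class GitDatabase {
    public:
      [[nodiscard]] auto HasObject(std::string const& id) const -> bool {
          return objects_.contains(id);
      }
      [[nodiscard]] auto ReadObject(std::string const& id) const
          -> std::string {
          return objects_.at(id);
      }
      void WriteObject(std::string const& id, std::string raw) {
          objects_.emplace(id, std::move(raw));
      }

    private:
      std::map<std::string, std::string> objects_{};
  };

  // Make an entry required by a tree created on the fly available in
  // the target repository (in our use case, the Git cache), copying
  // it from the source; the source is never written to.
  void EnsureInTarget(GitDatabase const& source, GitDatabase* target,
                      std::string const& id) {
      if (not target->HasObject(id)) {
          target->WriteObject(id, source.ReadObject(id));
      }
  }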
|
|
This bug went under the radar because the returned pointer is never
explicitly used, only tested for being set. As such, the correctness
of just-mr was never actually affected by it.
This commit fixes the issue and also cleans up small
inconsistencies.
|
|
Also extends the tests accordingly.
|
|
If we set up the root for an archive repository as absent, we
should first check whether the serve endpoint can set it up for us,
and only then try to provide it by locally available means.
|
|
Often, outputs are only referenced as blobs but not downloaded to the
working directory of the test. This can make it hard to understand
errors, as the respective artifacts are not available for inspection.
This is even more important for tests with a provided serve endpoint,
as then even the error message of a failed serve build is only
referenced as a blob. Solve this by keeping the local build root of
the remote-execution service, using the fact that all objects
transferred between the serve endpoint and the client go through the
remote-execution endpoint.
|
|
Before this patch, if the option `--remote-execution-property KEY:VAL`
was repeated multiple times (also with different `KEY`s), only the
last occurrence was taken into account.
This patch implements the intended behavior.
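A minimal sketch of the intended accumulation, with illustrative
names rather than the actual option handling: every given KEY:VAL
pair is collected, instead of keeping only the last one.

  #include <map>
  #include <string>
  #include <vector>

  auto ParseProperties(std::vector<std::string> const& args)
      -> std::map<std::string, std::string> {
      std::map<std::string, std::string> properties{};
      for (auto const& arg : args) {
          auto pos = arg.find(':');
          if (pos != std::string::npos) {
              // accumulate across occurrences; different KEYs add up
              properties[arg.substr(0, pos)] = arg.substr(pos + 1);
          }
      }
      return properties;
  }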
|
|
This test, among others, verifies the archive functionality by
creating an archive with our library, extracting it with the
system command-line tools, and comparing the result. In order not to
depend on the host system having tools installed for all possible
compression algorithms, it tacitly skips the extraction test if the
respective tool cannot be found under /usr/bin. This, however,
assumes that /usr/bin is in PATH; ensure this by extending PATH
accordingly.
|
|
... and set a square bounding box.
|
|
... which are, in particular, artifacts involved in staging conflicts.
While there, also make disjoint union honor the expression log limit.
|
|
Mention, in particular, the involved artifacts as well as the direct
dependencies that brought them in. Here we are in a simple situation,
as all built-in rules that check for conflicts only use artifacts and
runfiles of their dependencies, but not the provided data.
Also, the built-in rules that check staging conflicts do not perform
configuration transitions; hence, as we show the configuration for
the built-in target itself, it is enough to show just the target
names of the dependencies containing the artifact.
|
|
This test creates a "file" repository with pragma "to_git". Move it
to a subdirectory to avoid including all the tools in that created
root.
|
|
To avoid too many intermediate results, we compute the union of
a list in a divide-and-conquer fashion. Of course, for a disjoint
union, the recursive calls on the half-length lists have to compute
disjoint unions as well, i.e., the template parameter kDisjoint has
to be passed on. Fix this.
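A sketch of the pattern, with illustrative names rather than the
actual expression code; the point is that the recursive calls have
to use the same kDisjoint argument as the outer call.

  #include <cstddef>
  #include <vector>

  // Assumed primitive combining two values; it checks for conflicts
  // if and only if kDisjoint is set.
  template <bool kDisjoint, typename T>
  auto Merge(T const& left, T const& right) -> T;

  // Divide-and-conquer union over list[from, to).
  template <bool kDisjoint, typename T>
  auto Union(std::vector<T> const& list,
             std::size_t from, std::size_t to) -> T {
      if (to - from == 1) {
          return list[from];
      }
      auto mid = from + (to - from) / 2;
      // kDisjoint is passed on to both halves; forgetting to do so
      // (computing plain unions of the halves) was the bug.
      return Merge<kDisjoint>(Union<kDisjoint>(list, from, mid),
                              Union<kDisjoint>(list, mid, to));
  }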
|
|
grpc is used in the toolchain defaults for proto service libraries.
Still, it is typically built on its own, with its own toolchain,
flags, etc. However, grpc has a public dependency on the rpc-status
proto library, which the user may well use on their own, building it
in their own way, which can yield conflicts. To solve this, we hide
the dependency on that proto library, as infrastructure libraries
should not make assumptions about user-serviceable libraries.
- First, we note that the dependency can be made a private one,
which already solves the conflict on header files (they will be
essentially the same, but might be defined in a different way).
- Next, we note that at link time the library basically only acts
as a default implementation; if the user provides their own
version of the rpc-status library, we should prefer that anyway.
As infrastructure is linked last, we get that default character
anyway; the only thing left to do is to rename the library so that
no staging conflict occurs.
|
|
For historic reasons (as quite some tests date back to before the
public name of the build tools was decided), the end-to-end tests
assume generic names for the tools. This used to be done by simply
staging the artifacts. As soon as we started to support dynamic
linking, we also had to include the runtime dependencies, as provided
by our install-with-deps rule. ae2e515ab84ea3ab08764685f84441c0741f8039
attempted to add those dependencies by replacing the staging with
a generic action doing a copy. This, however, made the "lib" dir
containing the dependencies an opaque tree
- defined by different actions, and, more importantly,
- containing only the run-time dependencies of one of the tools.
This causes staging conflicts between those two "lib" dirs (currently
hidden by a bug in the computation of the disjoint union); things
only worked because in the canonical configuration used for testing
both "lib" dirs are empty anyway.
The correct way of adding dependencies while renaming the tool is
still staging; fix this.
|
|
For splicing of large objects from external sources, additional checks are performed:
* The digest of the spliced result must be equal to the expected digest;
* The parts of a spliced tree must be in the storage.
Tested:
* Regular splicing of large objects;
* If the result is unexpected, splicing fails;
* If some parts of a tree are missing, splicing fails.
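A sketch of the two checks, with illustrative types and assumed
helpers rather than the actual LocalCAS interface:

  #include <string>
  #include <vector>

  struct Digest {
      std::string hash;
  };

  // Assumed helpers, for illustration only:
  auto ComputeDigest(std::string const& data) -> Digest;
  auto IsInStorage(Digest const& digest) -> bool;

  // The spliced result must have the expected digest, and every part
  // of a spliced tree must already be present in the storage.
  auto VerifySplice(Digest const& expected, std::string const& result,
                    std::vector<Digest> const& tree_parts) -> bool {
      if (ComputeDigest(result).hash != expected.hash) {
          return false;  // result differs from what was announced
      }
      for (auto const& part : tree_parts) {
          if (not IsInStorage(part)) {
              return false;  // part of the spliced tree is missing
          }
      }
      return true;
  }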
|
|
This is needed for LocalCAS's splice routines.
|
|
* Uplink parts of a large entry before the entry itself;
* Uplink large entries in LargeObjectCAS::GetEntryPath to not split things twice;
* Promote the spliced tree during uplinking of a large tree entry to properly promote the parts of the tree;
* Uplink large entries in LocalUplink{Blob, Tree} to support proper uplinking in the Action Cache and Target Cache.
Tested:
* Uplink large blobs and trees;
* Uplink a large object that depends on other large objects.
|
|
Implicitly reconstruct objects during regular uplinking of Blobs/Trees.
|
|
* Add LargeObjectCAS fields for files and trees to LocalCAS;
* Add logic for splitting objects located in the main storage.
Tested:
Splitting of large, small and empty objects.
|
|
Every large object is keyed by the hash of the result and contains hashes of the parts from which the result can be reconstructed.
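An illustrative layout of such an entry (hypothetical names, not the
actual on-disk format):

  #include <string>
  #include <vector>

  // Keyed by the digest of the full object, a large-object entry
  // stores the ordered digests of the chunks from which the object
  // can be spliced back together.
  struct LargeObjectEntry {
      std::string object_hash;               // key: hash of the result
      std::vector<std::string> part_hashes;  // parts, in splice order
  };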
|
|
With "remote-execution properties" a primary property, there is no need
anymore to repeat the property as part of the "just args".
|