Age Commit message Author
2024-04-12file chunker: increase chunk sizesKlaus Aehlig
As we also use chunking to reduce storage, we have to consider the overhead of block devices, which is on the order of kB per file. So our target chunk size should be at least 2 orders of magnitude above this. This suggests aiming for a chunk size of at least 128kB, a target size that also has the advantage that the maximal chunk size associated with it is 1MB, which is still well below the maximal transmission size of grpc, allowing us to avoid the streaming API. As we're scaling everything up by a factor of 16, we also have to increase the number of bits in the involved masks by 4. We use this to also extend the window size by using the 2 most significant octets. Following the advice of the paper proposing FastCDC to spread out the ones roughly equally suggests 0x4444 as a suitable value for the two most significant octets. We also change the suggested extension of the remote-execution API accordingly. As the precise parameters for FastCDC when announced over the remote-execution APIs are still under discussion upstream, we simplify the name to not mention the target size.
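The arithmetic in this message can be checked with a few lines of Python. This is only a sketch: the previous target size of 8 KiB and the factor of 8 between target and maximal chunk size are not stated in the message but inferred from "a factor of 16" and "128kB ... 1MB", and the 4 MiB figure is gRPC's commonly used default message limit, taken here as an assumption about the deployment.

```python
KIB = 1024

# Assumption: the previous target size, inferred from "a factor of 16".
old_target = 8 * KIB
scale = 16

new_target = old_target * scale            # 128 KiB
extra_mask_bits = scale.bit_length() - 1   # log2(16) = 4 additional mask bits

# Assumption: the maximal chunk size is 8x the target (128kB -> 1MB).
max_chunk = 8 * new_target

# Assumption: gRPC's commonly used default message limit of 4 MiB.
GRPC_DEFAULT_LIMIT = 4 * KIB * KIB

# The two most significant octets proposed in the message.
high_octets = 0x4444
one_positions = [i for i in range(16) if (high_octets >> i) & 1]
gaps = [b - a for a, b in zip(one_positions, one_positions[1:])]

print(new_target, extra_mask_bits, max_chunk, one_positions, gaps)
```

The value 0x4444 contributes exactly four set bits, one per nibble and four positions apart, which matches both the "increase by 4" and the "spread out roughly equally" requirements.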
2024-04-12file chunker: remove average chunk size from interfaceKlaus Aehlig
... as the typical chunk size is mainly determined by the masks used internally. So, as long as we hard code them, we should be honest to ourselves and accept that the chunking parameters are hard-coded as well.
2024-04-11Error reporting on action failure: give short target nameKlaus Aehlig
... as this is the only thing the user cares about when trying to investigate why that action failed.
2024-04-11configured target: support short representationKlaus Aehlig
... with only the non-null entries of the configuration. This information is enough for the user to build this target, e.g., when searching for the cause of a build failure.
2024-04-11json: support pruningKlaus Aehlig
... by removing from an object the outer keys where the value is null.
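A minimal sketch of such a pruning step, written in Python rather than in the project's own code. The function name `prune` is hypothetical; the sketch assumes that only the outermost keys are inspected and that non-object values are returned unchanged.

```python
def prune(value):
    """Drop top-level keys whose value is null; leave everything else alone."""
    if isinstance(value, dict):
        return {key: val for key, val in value.items() if val is not None}
    return value


config = {"ARCH": None, "OS": "linux", "nested": {"DEBUG": None}}
print(prune(config))
```

Nested nulls survive on purpose: only the outer keys are pruned, as the commit message describes.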
2024-04-10Support stderr log-limit restriction for serveKlaus Aehlig
As `just serve` is used like a daemon, it can be desirable to restrict stderr, e.g., to only errors, while keeping a detailed log of the activity in a file.
2024-04-10Add end-to-end test verifying restricted console loggingKlaus Aehlig
2024-04-10Add just-mr command-line option to restrict log limit on stderrKlaus Aehlig
2024-04-10Add command-line option to restrict log limit on stderrKlaus Aehlig
2024-04-10log_sink_cmdline: support restricted log limitKlaus Aehlig
Messages on the command line can be more disturbing than, e.g., in a log file. In particular, for debugging it is often useful to have very verbose logs. In order to keep the command-line experience manageable in these cases as well, support restricting the command-line logging further. In this way, while interacting with concise command-line messages, verbose logs are still written for later analysis.
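The idea of a per-sink restriction can be illustrated as follows. This is a sketch only: the class `Sink`, the level names and their numeric order are hypothetical and do not reproduce the actual log_sink_cmdline interface. The assumption is that a sink emits a message if its level is within the smaller of the global limit and the sink's own restriction.

```python
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical levels; smaller means more severe.
    ERROR = 0
    WARNING = 1
    INFO = 2
    DEBUG = 3


class Sink:
    def __init__(self, restrict_limit=None):
        self.restrict_limit = restrict_limit  # None: follow the global limit
        self.lines = []

    def emit(self, level, message, global_limit):
        limit = global_limit
        if self.restrict_limit is not None:
            limit = min(limit, self.restrict_limit)
        if level <= limit:
            self.lines.append(f"{level.name}: {message}")


console = Sink(restrict_limit=Level.WARNING)  # concise command line
log_file = Sink()                             # verbose log for later analysis
messages = [
    (Level.ERROR, "action failed"),
    (Level.INFO, "analysed target"),
    (Level.DEBUG, "cache hit"),
]
for level, message in messages:
    for sink in (console, log_file):
        sink.emit(level, message, global_limit=Level.DEBUG)
```

With a global limit of DEBUG, the file sink records all three messages while the restricted console sink keeps only the error.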
2024-04-10just serve: follow symlinks to the serve configKlaus Aehlig
... to simplify set ups where configuration files are provided as symbolic links to some central store.
2024-04-10Add test for resolve_symlinks_mapPaul Cristian Sarbu
2024-04-10just-mr: Ensure resolved trees are kept alive in Git cachePaul Cristian Sarbu
The association map file for a resolved tree was supposed to guarantee that the respective tree is kept alive in a Git repository as part of a tagged commit, but this was not ensured. This commit fixes this issue by tagging the tree (found in the Git cache after resolution) before writing its association file.
2024-04-10Add KeepTree to critical Git operationsPaul Cristian Sarbu
Also improves and extends the Git operations tests accordingly.
2024-04-10GitRepo: Add method to keep tree alive by taggingPaul Cristian Sarbu
Also adds an appropriate test for this method.
2024-04-10resolve_symlinks_map: Allow separate source and target repositoriesPaul Cristian Sarbu
In certain cases, e.g., on the serve endpoint, an unresolved tree might lie in a repository other than the Git cache, therefore we cannot create any new entries there, as it would violate our guarantee that we only write under our local build root. Therefore, the resolve_symlinks_map now receives pointers to both the source and target Git databases and ensures that: 1. any tree created on-the-fly is stored exclusively in the target repository, and 2. any other entry required for those trees is made available in the target repository by copying it from the source repository. Note that in our use case the target repository is always our Git cache and passing a pointer to that object database is done to avoid the overhead of otherwise opening the database very often.
2024-04-10content_git_map: Remove redundant opening of Git cachePaul Cristian Sarbu
2024-04-10import_to_git_map: Fix wrong pointer in setterPaul Cristian Sarbu
This bug went under the radar because the returned pointer is never explicitly used, just tested if set. As such, the correctness of just-mr was never actually affected by it. This commit fixes the issue and also cleans up small inconsistencies.
2024-04-10git_repo: Add blob writer methodPaul Cristian Sarbu
Also extends the tests accordingly.
2024-04-10content_git_map: Reorder logic for setting up absent rootsPaul Cristian Sarbu
If we set up the root for an archive repository as absent, we should first check if the serve endpoint can set it up for us, and only then try to provide it from locally available means.
2024-04-10test: Extend GitRepo methods checksPaul Cristian Sarbu
2024-04-10git_repo: Improve error message for CreateTreePaul Cristian Sarbu
2024-04-10tests with infrastructure: keep remote build rootKlaus Aehlig
Often outputs are only referenced as blobs but not downloaded to the working directory of the test. This can make it hard to understand errors, as the respective artifacts are not available for inspection. This is even more important in case of tests with a provided serve endpoint, as then even the error message of a failed serve build is only referenced as a blob. Solve this by keeping the local build root of the remote-execution service, using the fact that all objects transferred between the serve endpoint and the client go through the remote-execution endpoint.
2024-04-10bugfix: cli: remote-execution-property: allow for accumulating multiple pairs.Alberto Sartori
Before this patch, if the option `--remote-execution-property KEY:VAL` was repeated multiple times (also with different `KEY`s), only the last one was taken into account. This patch restores the intended behavior of accumulating all given pairs.
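The intended accumulating behavior can be illustrated with Python's `argparse`, which is not what the tool itself uses; the helper `parse_properties` is hypothetical. The sketch assumes that each value is split at the first colon and that a later pair with the same key overrides an earlier one.

```python
import argparse


def parse_properties(argv):
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence instead of keeping the last.
    parser.add_argument(
        "--remote-execution-property", action="append", metavar="KEY:VAL"
    )
    args = parser.parse_args(argv)

    properties = {}
    for pair in args.remote_execution_property or []:
        key, separator, value = pair.partition(":")
        if not separator:
            raise ValueError(f"expected KEY:VAL, got {pair!r}")
        properties[key] = value
    return properties


print(parse_properties([
    "--remote-execution-property", "image:build-env",
    "--remote-execution-property", "arch:x86_64",
]))
```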
2024-04-08Use properly included standard library types by defaultPaul Cristian Sarbu
2024-04-08Consistently guard all POSIX C includesPaul Cristian Sarbu
2024-04-08test: Add missing includes and fix depsPaul Cristian Sarbu
2024-04-08Test ["utils/archive", "archive_usage"]: fix assumptions on pathKlaus Aehlig
This test, among others, verifies the archive functionality by creating an archive with our library, extracting it with the system command-line tools, and comparing the result. In order to not depend on the host system having installed tools for all possible compression algorithms, it tacitly drops the extraction test if the respective tool could not be found under /usr/bin. This, however, assumes that /usr/bin is in PATH; ensure this by extending PATH accordingly.
2024-04-08doc: Remove logo frame for better visibilityOliver Reiche
... and set quadratic bounding box.
2024-04-08doc: Add Justbuild logoOliver Reiche
2024-04-05Add test verifying origin reporting in case of conflictsKlaus Aehlig
2024-04-05User-defined rules: annotate objects relevant to evaluation errors.Klaus Aehlig
2024-04-05Evaluator: Add infrastructure to annotate relevant objectsKlaus Aehlig
... which are, in particular, artifacts involved in staging conflicts. While there, also make disjoint union honor the expression log limit.
2024-04-05built-in rules: describe staging conflict in more detailKlaus Aehlig
Mentioning in particular the involved artifacts as well as the direct dependencies that brought them in. Here, we are in a simple situation as all built-in rules that check conflicts only use artifacts and runfiles of their dependencies, but not the provided data. Also, the built-in rules that check staging conflicts do not do configuration transitions, hence it is enough to show the target name of the dependencies containing the artifact if for the built-in target we show the configuration.
2024-04-05Test ["end-to-end/built-in-rules","export_counting"]: avoid unnecessary IO operationsKlaus Aehlig
This test creates a "file" repository with pragma "to_git". Move to a subdirectory to avoid including all the tools in that created root.
2024-04-05bug fix in expression, Union: propagate the disjointness propertyKlaus Aehlig
To avoid too many intermediate results, we compute the union of a list in a divide and conquer fashion. Of course, for a disjoint union, the recursive calls on the lists of half the length have to be disjoint as well, i.e., the template parameter kDisjoint has to be passed on. Fix this.
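The shape of the fix can be sketched in Python, with a function argument `disjoint` standing in for the template parameter kDisjoint. The function `union` is hypothetical, and the sketch makes the simplifying assumption that "disjoint" means no key occurs in more than one map.

```python
def union(maps, disjoint=False):
    """Divide-and-conquer union of a list of dicts."""
    if not maps:
        return {}
    if len(maps) == 1:
        return dict(maps[0])
    middle = len(maps) // 2
    # The flag has to be passed on to both halves; dropping it here would
    # let conflicts inside one half go unnoticed.
    left = union(maps[:middle], disjoint)
    right = union(maps[middle:], disjoint)
    if disjoint:
        overlap = left.keys() & right.keys()
        if overlap:
            raise ValueError(f"keys occur more than once: {sorted(overlap)}")
    return {**left, **right}


maps = [{"a": 1}, {"a": 2}, {"b": 3}, {"c": 4}]
print(union(maps))  # plain union: later maps win
```

In this input the conflicting key "a" lies entirely within the first half, so it is only detected if the recursive call is also made in disjoint mode.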
2024-04-05grpc: hide dependency on google_rpc_statusKlaus Aehlig
grpc is used in the toolchain defaults for proto service libraries. Still, it is typically built on its own, with its own toolchain, flags, etc. However, grpc has a public dependency on the rpc-status proto library, which the user may well use on their own, building it in their own way, which can yield conflicts. To solve this, we hide the dependency on that proto library, as infrastructure libs should not make assumptions on user-serviceable libraries. - First, we note that the dependency can be made a private one, which already solves the conflict on header files (which will essentially be the same, but might be defined in a different way). - Next, we note that the library at linking basically only acts as a default implementation; if the user provides their own version of the rpc-status library, we should prefer that anyway. As infrastructure is linked last, we have that default character anyway; the only thing to do is to rename the library so that no staging conflict occurs.
2024-04-05end-to-end tests: fix tool set upKlaus Aehlig
For historic reasons (as quite a few tests date back to before the public name of the build tools was decided), the end-to-end tests assume generic names for the tools. This used to be done by simply staging the artifacts. As soon as we started to support dynamic linking, we also had to allow the runtime dependencies, as provided by our install-with-deps rule. ae2e515ab84ea3ab08764685f84441c0741f8039 attempted to add those dependencies by replacing the staging by a generic action doing a copy. This, however, made the "lib" dir containing the dependencies an opaque tree - defined by different actions, and, more importantly, - containing only the run-time dependencies of one of the tools. This causes staging conflicts between those two lib dirs (currently hidden by a bug in the computation of the disjoint union) and things only worked because in the canonical configuration used for testing both "lib" dirs are empty anyway. The correct way of adding dependencies while renaming the tool is still staging; fix this.
2024-04-02LargeBlobs: Use LocalCAS methods to implement split-splice logic of CASUtils.Maksim Denisov
2024-04-02LargeBlobs: Splice large objects from external sources.Maksim Denisov
For splicing of large objects from external sources additional checks are performed: * The digest of the spliced result must be equal to the expected digest; * The parts of a spliced tree must be in the storage. Tested: * Regular splicing of large objects; * If the result is unexpected, splicing fails; * If some parts of a tree are missing, splicing fails.
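The two checks can be sketched as follows. Everything here is illustrative: the function `splice`, the use of SHA-256, the dictionary acting as storage, and the JSON encoding of a tree as a map from entry name to digest are assumptions made for the example, not the project's formats.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def splice(expected, chunk_digests, storage, is_tree=False):
    """Concatenate chunks and accept the result only if both checks pass."""
    data = b"".join(storage[d] for d in chunk_digests)
    if digest(data) != expected:
        raise ValueError("spliced result does not match the expected digest")
    if is_tree:
        # Hypothetical tree encoding: JSON object mapping names to digests.
        entries = json.loads(data)
        missing = [name for name, d in entries.items() if d not in storage]
        if missing:
            raise ValueError(f"tree parts missing from storage: {missing}")
    storage[expected] = data
    return data


storage = {}
chunks = [b"hello ", b"world"]
for chunk in chunks:
    storage[digest(chunk)] = chunk
blob = splice(digest(b"hello world"), [digest(c) for c in chunks], storage)
```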
2024-04-02ObjectCAS: Move the method for calculating file digests to the public space.Maksim Denisov
This is needed for LocalCAS's splice routines.
2024-04-02LargeBlobs: Uplink large objects.Maksim Denisov
* Uplink parts of the large entry before entry itself; * Uplink large entries in LargeObjectCAS::GetEntryPath to not split things two times; * Promote spliced tree during uplinking of a large tree entry to properly promote parts of the tree; * Uplink large entries in LocalUplink{Blob, Tree} to support proper uplinking in Action Cache and Target Cache; Tested: * Uplink large blobs and trees; * Uplink a large object that depends on other large objects.
2024-04-02LargeBlobs: Splice large objects implicitly.Maksim Denisov
Implicitly reconstruct objects during regular uplinking of Blobs/Trees.
2024-04-02LargeBlobs: Split large objects.Maksim Denisov
* Add LargeObjectCAS fields for files and trees to LocalCAS; * Add logic for splitting objects located in the main storage. Tested: Splitting of large, small and empty objects.
2024-04-02LargeObjectCAS: Implement auxiliary class for storing error information.Maksim Denisov
2024-04-02LargeObjectUtils: Randomize large files and directories for testing purposesMaksim Denisov
2024-04-02LargeObjectCAS: Store large objects.Maksim Denisov
Every large object is keyed by the hash of the result and contains hashes of the parts from which the result can be reconstructed.
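As an illustration of this layout, the following sketch keeps two dictionaries: one for ordinary content-addressed data and one for large-object entries. The names `split` and `reconstruct`, the use of SHA-256, and the fixed-size chunking (standing in for the content-defined file chunker) are assumptions made for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def split(data, cas, large_cas, chunk_size=4):
    """Store the parts in the CAS and record their hashes under the hash of the whole."""
    part_digests = []
    for start in range(0, len(data), chunk_size):
        part = data[start:start + chunk_size]
        cas[digest(part)] = part
        part_digests.append(digest(part))
    key = digest(data)
    large_cas[key] = part_digests
    return key


def reconstruct(key, cas, large_cas):
    """Rebuild the object from the parts listed in its large-object entry."""
    return b"".join(cas[d] for d in large_cas[key])


cas, large_cas = {}, {}
key = split(b"0123456789", cas, large_cas)
restored = reconstruct(key, cas, large_cas)
```

The entry itself holds no content, only hashes, so identical parts shared between different large objects are stored once.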
2024-04-02Move file chunker to storage.Maksim Denisov
2024-04-02just-serve-config(5): add an example of a configurationKlaus Aehlig
2024-04-02just-mrrc(5): update exampleKlaus Aehlig
With "remote-execution properties" a primary property, there is no need anymore to repeat the property as part of the "just args".