... instead of relying on those dependencies being pulled in
indirectly.
...instead of calling ProtocolTraits::IsCompatible
...and move it to the common stage.
...with ArtifactDigestFactory::HashDataAs
...with ArtifactDigestFactory::HashFileAs
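The two entries above replace ad-hoc hashing at the call sites. Below is a minimal sketch of the resulting call shape, with stand-in types; the exact justbuild signatures are assumptions here, not taken from this log.

    // Stand-in types; the real justbuild interfaces differ in detail.
    #include <cstdint>
    #include <filesystem>
    #include <optional>
    #include <string>

    enum class ObjectType : std::uint8_t { File, Executable, Tree };
    struct HashFunction {};
    struct ArtifactDigest {
        std::string hash;
        std::uint64_t size{};
    };

    struct ArtifactDigestFactory {
        // Assumed shape: hash in-memory data as the given object type.
        template <ObjectType kType>
        static auto HashDataAs(HashFunction const& /*hasher*/,
                               std::string const& data) -> ArtifactDigest {
            return ArtifactDigest{"<hash of data>", data.size()};
        }
        // Assumed shape: hash an on-disk file as the given object type;
        // may fail, e.g., if the file is not readable.
        template <ObjectType kType>
        static auto HashFileAs(HashFunction const& /*hasher*/,
                               std::filesystem::path const& /*file*/)
            -> std::optional<ArtifactDigest> {
            return std::nullopt;
        }
    };

A call site then obtains an ArtifactDigest directly, e.g. ArtifactDigestFactory::HashDataAs<ObjectType::File>(hasher, content), instead of hashing first and building a digest by hand.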
...with ArtifactDigest.
...with ArtifactDigest.
...with ArtifactDigest.
...with ArtifactDigest.
...and move this functionality to bazel_msg_factory_test, where it is actually used.
For local_cas.test, regular hashing is used, since blob_creator would be redundant there.
... while keeping our .clang-format file.
... so that linting information gets propagated properly.
...and create StorageConfig and Storage in place if needed.
... instead of static calls to GarbageCollector
...instead of std::filesystem::path.
StorageConfig is extended to return the paths of Storage's parts.
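A conceptual sketch of the direction of the storage refactoring entries above; every name and signature below is illustrative, not the actual justbuild interface.

    #include <cstddef>
    #include <filesystem>
    #include <string>
    #include <utility>

    class StorageConfig {
        std::filesystem::path build_root_;

      public:
        explicit StorageConfig(std::filesystem::path root)
            : build_root_{std::move(root)} {}
        // StorageConfig now answers where the storage's parts live, so
        // callers no longer assemble raw std::filesystem::path values:
        [[nodiscard]] auto GenerationCacheRoot(std::size_t generation) const
            -> std::filesystem::path {
            return build_root_ / ("generation-" + std::to_string(generation));
        }
    };

    class GarbageCollector {
        StorageConfig const& config_;  // instance state instead of statics

      public:
        explicit GarbageCollector(StorageConfig const& config)
            : config_{config} {}
        // Called through an instance instead of a static GarbageCollector
        // call, which makes the dependency on StorageConfig explicit.
        [[nodiscard]] auto TriggerGarbageCollection() const -> bool {
            return not config_.GenerationCacheRoot(0).empty();
        }
    };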
...to make changes easier to track during refactoring.
During compactification, invalid entries must be deleted.
During garbage collection, split and remove from the storage every entry that is larger than a threshold.
During garbage collection, remove from the storage every entry that has a corresponding large entry.
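Together with the two preceding entries, compactification amounts to one pass over the storage: drop invalid entries, drop entries that already have a large (spliced) counterpart, and split-then-drop entries above the size threshold. A rough sketch with assumed helper names:

    #include <cstdint>
    #include <filesystem>

    namespace fs = std::filesystem;

    // Assumed helpers, stubbed out for illustration:
    auto IsValidEntry(fs::path const&) -> bool { return true; }
    auto HasLargeEntry(fs::path const&) -> bool { return false; }
    auto SplitIntoLargeEntry(fs::path const&) -> bool { return false; }

    void Compactify(fs::path const& cas_root, std::uintmax_t threshold) {
        for (auto const& entry : fs::recursive_directory_iterator{cas_root}) {
            if (not entry.is_regular_file()) {
                continue;
            }
            auto const& path = entry.path();
            if (not IsValidEntry(path)        // invalid entries must go
                or HasLargeEntry(path)) {     // reconstructible from parts
                fs::remove(path);
            }
            else if (entry.file_size() > threshold
                     and SplitIntoLargeEntry(path)) {
                fs::remove(path);             // keep only the parts
            }
        }
    }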
...and trees.
...executable files during splitting.
As we use chunking also for reducing storage, we have to consider
the overhead of block devices, which is on the order of kB per file.
So our target chunk size should be at least two orders of magnitude
above this. This suggests aiming for a chunk size of at least
128 kB. This target size also has the advantage that the maximal
chunk size associated with it is 1 MB, which is still well below
the maximal transmission size of gRPC, allowing us to avoid
the streaming API.
As we're scaling everything up by a factor of 16, we also have
to increase the number of bits in the involved masks by 4. We use
this to also extend the window size, by placing those bits in the
2 most significant octets. Following the advice of the paper
proposing FastCDC to spread out the ones roughly equally, 0x4444
is a suitable value for the two most significant octets.
We also change the suggested extension of the remote-execution API
accordingly. As the precise parameters for FastCDC when announced
over the remote-execution APIs are still under discussion upstream,
we simplify the name to not mention the target size.
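The parameter arithmetic of this message, spelled out below; the gRPC message size limit (about 4 MB by default) is general knowledge, not stated in this log.

    #include <bit>
    #include <cstdint>

    constexpr std::uint64_t kKB = 1024;
    // Two orders of magnitude above the per-file block-device
    // overhead, which is on the order of kB:
    constexpr std::uint64_t kTargetChunkSize = 128 * kKB;
    // Maximal chunk size associated with this target size:
    constexpr std::uint64_t kMaxChunkSize = 8 * kTargetChunkSize;
    static_assert(kMaxChunkSize == 1024 * kKB);  // 1 MB, well below
                                                 // gRPC's ~4 MB limit

    // Scaling up by 16 = 2^4 adds 4 one-bits to the involved masks;
    // they are spread over the two most significant octets as 0x4444,
    // following the FastCDC paper's advice to distribute the ones
    // roughly equally:
    constexpr int kExtraMaskBits = 4;
    constexpr std::uint16_t kMaskHighOctets = 0x4444;
    static_assert(std::popcount(kMaskHighOctets) == kExtraMaskBits);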
For splicing of large objects from external sources, additional checks are performed:
* The digest of the spliced result must be equal to the expected digest;
* The parts of a spliced tree must be in the storage.
Tested:
* Regular splicing of large objects;
* If the result is unexpected, splicing fails;
* If some parts of a tree are missing, splicing fails.
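A hypothetical shape of these checks; all names below are invented for illustration, the real code is organized differently.

    #include <optional>
    #include <string>
    #include <vector>

    struct Digest {
        std::string hash;
        auto operator==(Digest const&) const -> bool = default;
    };

    // Stubs standing in for the actual CAS operations:
    auto SpliceParts(std::vector<Digest> const&) -> std::optional<Digest> {
        return Digest{};
    }
    auto ContainsInStorage(Digest const&) -> bool { return true; }

    // Returns the digest of the spliced object, or std::nullopt on failure.
    auto SpliceChecked(Digest const& expected,
                       std::vector<Digest> const& parts,
                       bool is_tree) -> std::optional<Digest> {
        if (is_tree) {
            // The parts of a spliced tree must be in the storage:
            for (auto const& part : parts) {
                if (not ContainsInStorage(part)) {
                    return std::nullopt;
                }
            }
        }
        auto result = SpliceParts(parts);
        // The digest of the spliced result must equal the expected digest:
        if (not result or not (*result == expected)) {
            return std::nullopt;
        }
        return result;
    }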
* Uplink parts of the large entry before the entry itself;
* Uplink large entries in LargeObjectCAS::GetEntryPath to avoid splitting things twice;
* Promote the spliced tree during uplinking of a large tree entry to properly promote the parts of the tree;
* Uplink large entries in LocalUplink{Blob, Tree} to support proper uplinking in the Action Cache and Target Cache;
Tested:
* Uplink large blobs and trees;
* Uplink a large object that depends on other large objects.
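The ordering constraint from the first bullet, sketched with invented names:

    #include <string>
    #include <vector>

    struct Digest {
        std::string hash;
    };

    // Stubs for the actual storage operations:
    auto PartsOf(Digest const&) -> std::vector<Digest> { return {}; }
    auto Uplink(Digest const&) -> bool { return true; }

    // Uplink the parts of a large entry before the entry itself, so a
    // promoted large entry never refers to parts that were left behind
    // in an older generation.
    auto UplinkLargeEntry(Digest const& entry) -> bool {
        for (auto const& part : PartsOf(entry)) {
            if (not Uplink(part)) {
                return false;
            }
        }
        return Uplink(entry);  // promote the entry only after its parts
    }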
Implicitly reconstruct objects during regular uplinking of Blobs/Trees.
* Add LargeObjectCAS fields for files and trees to LocalCAS;
* Add logic for splitting objects located in the main storage.
Tested:
Splitting of large, small and empty objects.
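Conceptually, splitting stores every chunk of the object in the CAS and records the resulting part list as the large entry. A sketch with an assumed chunker interface:

    #include <string>
    #include <vector>

    struct Digest {
        std::string hash;
    };

    // Stubs: content-defined chunking and the blob store.
    auto ChunkFastCDC(std::string const& data) -> std::vector<std::string> {
        return {data};  // a real chunker yields content-defined chunks
    }
    auto StoreBlob(std::string const& /*chunk*/) -> Digest {
        return Digest{"<hash of chunk>"};
    }

    // Split an object from the main storage into chunks; the returned
    // part list is what the LargeObjectCAS records for the object.
    auto Split(std::string const& data) -> std::vector<Digest> {
        std::vector<Digest> parts;
        for (auto const& chunk : ChunkFastCDC(data)) {
            parts.push_back(StoreBlob(chunk));
        }
        return parts;
    }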
This allows better separation and, in particular, repositories
needed only for tests no longer have to be provided for building
the tools. It also documents more clearly which dependencies are
only needed for testing.
... with two minor code-base changes compared to the previous
use of gsl-lite:
- dag.hpp: ActionNode::Ptr and ArtifactNode::Ptr are not
wrapped in gsl::not_null<> anymore, due to lack of support
for wrapping std::unique_ptr<>. More specifically, the
move constructor is missing, rendering it impossible to
use std::vector<>::emplace_back().
- utils/cpp/gsl.hpp: New header file added to implement the
macros ExpectsAudit() and EnsureAudit(), asserts running
only in debug builds, which were available in gsl-lite but
are missing in MS GSL.
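A minimal sketch of such debug-only audit macros; the actual utils/cpp/gsl.hpp may differ.

    #include <cassert>

    #ifdef NDEBUG
    // In release builds the audit checks compile to nothing:
    #define ExpectsAudit(x) static_cast<void>(0)
    #define EnsureAudit(x) static_cast<void>(0)
    #else
    // In debug builds they behave like assert-based pre/postconditions:
    #define ExpectsAudit(x) assert(x)  // precondition audit
    #define EnsureAudit(x) assert(x)   // postcondition audit
    #endif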
Tests have been updated accordingly.
The improved GC implementation uses refactored storage
classes instead of directly accessing "unknown" file paths.
The required storage class refactoring is quite substantial
and outlined in the following paragraphs.
The module `buildtool/file_system` was extended by:
- `ObjectCAS`: a plain CAS implementation for
reading/writing blobs and computing digests for a given
`ObjectType`. Depending on that type, files written to the
file system may have different properties (e.g., the x-bit
set) or the digest may be computed differently (e.g., tree
digests in non-compatible mode).
A new module `buildtool/storage` was introduced containing:
- `LocalCAS`: provides a common interface for the "logical
CAS", which internally combines three `ObjectCAS`s, one
for each `ObjectType` (file, executable, tree).
- `LocalAC`: implements the action cache, which needs the
`LocalCAS` for storing cache values.
- `TargetCache`: implements the high-level target cache,
which also needs the `LocalCAS` for storing cache values.
- `LocalStorage`: combines the storage classes `LocalCAS`,
`LocalAC`, and `TargetCache`. Those are initialized with
settings from `StorageConfig`, such as the build root base
path or number of generations for the garbage collector.
`LocalStorage` is templated with a Boolean parameter
`kDoGlobalUplink`, which indicates that, on every
read/write access, the garbage collector should be used
for uplinking across all generations (global).
- `GarbageCollector`: responsible for garbage collection and
the global uplinking across all generations. To do so, it
employs instances of `LocalStorage` with `kDoGlobalUplink`
set to false, in order to avoid endless recursion. The
actual (local) uplinking within two single generations is
performed by the corresponding storage class (e.g.,
`TargetCache` implements uplinking of target cache entries
between two target cache generations, etc.). Thereby, the
actual knowledge of how data should be uplinked resides in
the instance that is responsible for creating the data in
the first place.
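The overall shape of the described design, heavily condensed; the snippet below is a sketch, not the actual class definitions.

    #include <cstddef>

    template <bool kDoGlobalUplink> class LocalCAS {};
    template <bool kDoGlobalUplink> class LocalAC {};
    template <bool kDoGlobalUplink> class TargetCache {};

    // Combines the storage classes; if kDoGlobalUplink is set, every
    // read/write access uplinks across all generations.
    template <bool kDoGlobalUplink>
    class LocalStorage {
        LocalCAS<kDoGlobalUplink> cas_{};
        LocalAC<kDoGlobalUplink> ac_{};
        TargetCache<kDoGlobalUplink> tc_{};
    };

    class GarbageCollector {
        // Uses non-uplinking storage instances to avoid endless
        // recursion; the local uplinking between two single generations
        // is delegated to the storage class that created the data
        // (e.g., TargetCache for target cache entries).
        static auto Generation(std::size_t index) -> LocalStorage<false>&;
    };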