author | Sascha Roloff <sascha.roloff@huawei.com> | 2023-11-22 18:33:45 +0100 |
---|---|---|
committer | Sascha Roloff <sascha.roloff@huawei.com> | 2023-11-23 11:08:46 +0100 |
commit | b7b438d5bba82a391d581537d144c2af6ac233ac (patch) | |
tree | 4b8e4919ebb4fad31861d7095c959d8fdca29de2 /doc/future-designs | |
parent | 957b010bb06509ec2b7db2c79836791684e8bc04 (diff) | |
download | justbuild-b7b438d5bba82a391d581537d144c2af6ac233ac.tar.gz |
Move blob-splitting design document to implemented concepts
Diffstat (limited to 'doc/future-designs')
-rw-r--r-- | doc/future-designs/blob-splitting.md | 222 |
1 files changed, 0 insertions, 222 deletions
diff --git a/doc/future-designs/blob-splitting.md b/doc/future-designs/blob-splitting.md
deleted file mode 100644
index 7bfc53c4..00000000
--- a/doc/future-designs/blob-splitting.md
+++ /dev/null
@@ -1,222 +0,0 @@

Blob Splitting Protocol
=======================

Introduction
------------

Due to the generic nature of our build system, we do not impose any
restrictions on the type of artifacts that are generated by actions or
that are inputs to actions. Thus, artifacts might be large executables,
libraries, a compiled set of slides in PDF format, or even whole
file-system images. Such artifacts result in large blobs, which need to
be fetched from a remote CAS if they are to be inspected or used
locally. Depending on the network connection, this might imply a
significant waiting time until the complete artifact is downloaded, as
well as a lot of network traffic. This situation is especially painful
in case of short modification-inspection cycles: for each small
modification of the sources, the complete artifact needs to be
downloaded, even though perhaps only a small fraction of the compiled
binary artifact has changed.

Thus, we propose a blob-splitting strategy as a conservative extension
to the original [remote execution
API](https://github.com/bazelbuild/remote-apis/blob/main/build/bazel/remote/execution/v2/remote_execution.proto),
which allows splitting the binary data of a blob into chunks and
transmitting only the modified chunks instead of the whole blob data.
The goal is to reduce traffic between the local host and the remote
server and to avoid unnecessary waiting times for users frequently
working on large artifacts, at the cost of increased storage
consumption.
Remote execution API extension
------------------------------

We extend the existing remote execution API by introducing a new remote
procedure call (rpc) to the `ContentAddressableStorage` service that
splits a blob into chunks, along with the respective request and
response messages, and by adding a flag to the `ServerCapabilities`
message to indicate whether blob splitting is supported by the server.
The following code snippet depicts the proposed extensions to the
remote execution protocol:

    service ContentAddressableStorage {
        ...
        rpc SplitBlob(SplitBlobRequest) returns (SplitBlobResponse) {}
    }

    message SplitBlobRequest {
        string instance_name = 1;
        Digest blob_digest = 2;
    }

    message SplitBlobResponse {
        repeated Digest chunk_digests = 1;
    }

    message ServerCapabilities {
        ...
        bool blob_split_support = <free field number>;
    }

### Contract between client and server

The contract between the client and the server for this protocol
extension is as follows: if a client requests to split a blob at the
remote execution server, the server answers with a list of chunk
digests and promises that, first, all referenced blobs are available in
its CAS for download and, second, the concatenation of the returned
blobs in the given order results in the original blob the client
requested.

The client does not give any promise to the server; it is free not to
use the blob-splitting rpc. However, to realize the benefit of the
protocol extension (saving traffic), a meaningful client behavior is to
request the chunk digests of a blob from the server, fetch only those
chunks that are missing in its local CAS, and store the fetched chunks
as well as the reconstructed original blob in its local CAS. If the
client requests to split a blob, but the server does not support blob
splitting, or if an error occurred during the request, the client falls
back to fetching the entire blob.
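The client behavior described in this contract can be sketched in a few lines of Python. This is an illustrative sketch only, not part of justbuild: the `LocalCas` class and the `split_blob`/`fetch_blob` callables are hypothetical stand-ins for the `SplitBlob` rpc and the existing blob-download rpcs, and SHA-256 stands in for whatever digest function is configured.

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class LocalCas:
    """Minimal in-memory stand-in for the client's local CAS (illustrative only)."""
    def __init__(self):
        self._store = {}
    def get(self, digest):
        return self._store.get(digest)
    def put(self, digest, data):
        self._store[digest] = data

def reconstruct_blob(local_cas, split_blob, fetch_blob, blob_digest):
    """Sketch of the client side of the contract: request chunk digests,
    fetch only the locally missing chunks, reassemble, and store the result."""
    chunk_digests = split_blob(blob_digest)  # hypothetical SplitBlob wrapper
    if chunk_digests is None:
        # Server lacks split support or the rpc failed:
        # fall back to fetching the entire blob.
        blob = fetch_blob(blob_digest)
        local_cas.put(blob_digest, blob)
        return blob
    parts = []
    for digest in chunk_digests:
        data = local_cas.get(digest)  # check the local CAS first
        if data is None:
            data = fetch_blob(digest)  # only locally missing chunks are fetched
            local_cas.put(digest, data)
        parts.append(data)
    blob = b"".join(parts)  # contract: concatenation yields the original blob
    assert sha256(blob) == blob_digest
    local_cas.put(blob_digest, blob)  # store the reconstructed blob as well
    return blob
```

Note that on a failed split request the sketch falls back to fetching the entire blob, exactly as the contract requires of a well-behaved client.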
Blob split procedure
--------------------

### Server side

When the server receives a request to split a blob via the `SplitBlob`
rpc, it first checks whether the blob given by the `blob_digest` from
the `SplitBlobRequest` message exists in its CAS. If not, the status
code `google.rpc.Code.NOT_FOUND` is returned. Otherwise, it loads the
blob data and splits it into chunks according to the implemented
data-chunking algorithm. As explained later, there are different
implementation options for this algorithm, but they all must ensure one
property: the concatenation of the chunks results in the original blob.
After the blob is split, the server computes a digest for each chunk
according to the configured digest function and stores each chunk that
is not yet present in its CAS. If an error occurs while storing the
chunks in the CAS due to storage shortage, the status code
`google.rpc.Code.RESOURCE_EXHAUSTED` is returned. Otherwise, the chunk
digests are collected in the `chunk_digests` field of a
`SplitBlobResponse` message, in the same order as the chunks occur
within the binary blob data, and the message is sent back to the client
with the status code `google.rpc.Code.OK`.

> Note: Since the same data is basically stored twice for each blob, the
> storage consumption for the remote server is roughly doubled. Some
> savings occur when chunks are reused in more than one blob or even
> several times in the same blob.

### Client side

If a client wants to take advantage of blob splitting, it requests to
split a blob into chunks at the remote side by calling the `SplitBlob`
rpc of the `ContentAddressableStorage` service with a
`SplitBlobRequest` message containing the `blob_digest` of the
respective blob. If the status code returned by the `SplitBlob` rpc
contains an error, the split operation failed and the client needs to
fall back to fetching the entire blob.
Otherwise, blob splitting was successful, and the remote server returns
an ordered list of `chunk_digests`, guaranteeing that the chunks are
available in its CAS and that the concatenation of all chunks in the
given order results in the requested blob. The client checks which
chunks are available locally, fetches only the locally missing chunks
from the remote side via the `BatchReadBlobs` rpc or the `ByteStream`
API, and assembles the requested blob by concatenating the binary data
of its chunks. Finally, it stores the chunks as well as the resulting
blob in its local CAS.

> Note: Since the same data is basically stored twice for each blob, the
> local storage consumption is roughly doubled. Some savings occur when
> chunks are reused in more than one blob or even several times in the
> same blob.

Compatibility issues
--------------------

We aim to get this protocol extension into the official remote
execution API mentioned above. Once this is done and field numbers are
assigned, servers that have upgraded to the new API version inform
clients that have also upgraded whether they support blob splitting by
setting the `blob_split_support` field in the `ServerCapabilities`
message accordingly. A server only provides the aforementioned
guarantee to clients once it has announced support for blob splitting
using this field.

An old client can communicate with a server implementing the new API
without any modification. Nobody is forced to use the blob-splitting
functionality, and unknown fields are simply ignored at the client
side. A client implementing the new API can communicate with an old
server by evaluating the `blob_split_support` field, which will be set
to its default value `false` at the client side.

Until the protocol extension is officially accepted, the field number
for the `blob_split_support` field is not known.
In this case, early implementors can use a _trial request_ to determine
whether the remote server supports blob splitting. This allows
implementing a prototypical client and server employing blob splitting
without requiring the protocol extension to be officially accepted. It
also allows new clients to communicate with old servers: they still
behave correctly and do not need to be adapted. The client
implementation needs to be rebuilt once a field number has been
assigned.

Data chunking algorithm
-----------------------

As stated earlier, a blob split request from a client is answered by
the server with the promise that the concatenation of the returned
chunks results in the requested blob. While this property is guaranteed
by the server, it does not say anything about the quality of the split
result. It is desirable to generate chunks that are likely to be known
by the client, because then less data needs to be transferred.

A trivial implementation of the split operation would be to return a
_singleton list_ containing the original blob as its single chunk.
While this implementation is correct and fast, it is not very useful,
since it does not save any traffic.

A better approach is _fixed-block chunking_, where the data is split
into chunks of fixed and equal size. While this approach typically
results in more than one chunk and improves the probability of saving
some traffic, it suffers from the data-shifting problem: the insertion
of a single character at the beginning of a data stream shifts the
entire data while the chunk boundaries stay fixed. Thus, completely new
chunks, unknown to the client, are created even though the data
patterns are mostly similar.
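The data-shifting problem can be demonstrated in a few lines of Python (an illustrative sketch, not taken from justbuild):

```python
def fixed_block_chunks(data: bytes, block_size: int):
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

original = b"abcdefghijklmnop"
modified = b"X" + original  # insert a single byte at the front

chunks_before = fixed_block_chunks(original, 4)  # [b'abcd', b'efgh', b'ijkl', b'mnop']
chunks_after = fixed_block_chunks(modified, 4)   # [b'Xabc', b'defg', b'hijk', b'lmno', b'p']

# Not a single chunk is shared: the one-byte insertion shifted every boundary,
# so a client holding the old chunks would have to re-download everything.
assert set(chunks_before).isdisjoint(chunks_after)
```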
A more intelligent way of splitting blobs is to select locations within
the content of the data as chunk boundaries, such that splitting a
slightly modified blob results in mostly identical chunks, insensitive
to the data-shifting problem of fixed-block chunking. Since the
resulting chunk sizes typically differ, this approach is also called
_variable-block chunking_; it is explained in the following section.

### Variable-block chunking

In variable-block chunking, the borders of the chunks are content
defined and shift with the data pattern. This can be achieved by
computing hash values of data blocks of a specific size at every byte
position in the data stream and matching those hash values against a
predefined pattern. In case of a match, that byte position is declared
a chunk boundary. This approach avoids the data-shifting problem of
fixed-block chunking and ensures that unmodified data more than a block
size away from the changes will have the same boundaries. A possible
pattern to be matched might be all zeroes in the lower `n` bits, which
results in chunks with an average size of `2^n` bytes.

### Rolling hashes

A common technique to efficiently compute hash values of consecutive
bytes at every byte position in the data stream is to use a rolling
hash function. A rolling hash function allows cheaply computing the
hash value of a chunk of bytes of a specific size at byte position `i`
from the hash value of the data chunk of the same size at byte position
`i-1`. Different plain rolling hash functions are available for
implementation, such as a [polynomial rolling
hash](https://ieeexplore.ieee.org/document/5390135), the [Rabin
fingerprint](http://www.cs.cmu.edu/~15-749/READINGS/optional/rabin1981.pdf),
or a [cyclic polynomial rolling
hash](https://dl.acm.org/doi/abs/10.1145/256163.256168) (also called
Buzhash).
However, the [fast rolling Gear hash
algorithm](https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia)
(also called FastCDC) has been proven to be very compute efficient and
faster than the other rolling hash algorithms specifically for
content-based chunking of a stream of data, while achieving
deduplication ratios similar to those of the Rabin fingerprint.
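The ideas above can be sketched as a toy content-defined chunker using a Gear-style rolling hash (`h = (h << 1) + GEAR[byte]`). This is an illustrative Python sketch, not the justbuild implementation: the gear table derivation, the 12-bit mask, and the size bounds are arbitrary demonstration parameters.

```python
import hashlib

# 256-entry table of pseudo-random 64-bit values, one per byte value
# (derived deterministically here so the example is reproducible).
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big")
        for b in range(256)]

MASK = (1 << 12) - 1  # boundary when the lower 12 bits are zero: ~4 KiB average chunks

def gear_chunks(data: bytes, min_size: int = 64, max_size: int = 16384):
    """Content-defined chunking: declare a boundary wherever the lower bits of
    the rolling Gear hash are all zero, bounded by min/max chunk sizes."""
    chunks, start, h = [], 0, 0
    for i in range(len(data)):
        h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF  # Gear rolling update
        length = i - start + 1
        if (length >= min_size and (h & MASK) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0  # restart the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])  # remaining tail
    return chunks
```

Because each boundary decision depends only on a small window of recent bytes, an insertion near the beginning of a stream typically changes only the chunks around the insertion point; chunks further away keep their boundaries and digests, which is what makes the traffic savings of the blob-splitting protocol possible.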