Diffstat (limited to 'doc/future-designs')
-rw-r--r--   doc/future-designs/blob-splitting.md | 77
1 file changed, 37 insertions(+), 40 deletions(-)
diff --git a/doc/future-designs/blob-splitting.md b/doc/future-designs/blob-splitting.md
index 598996b8..7bfc53c4 100644
--- a/doc/future-designs/blob-splitting.md
+++ b/doc/future-designs/blob-splitting.md
@@ -45,13 +45,11 @@ protocol:

 message SplitBlobRequest {
   string instance_name = 1;
-  Digest digest = 2;
-  DigestFunction.Value digest_function = 3;
+  Digest blob_digest = 2;
 }

 message SplitBlobResponse {
-  repeated Digest digests = 1;
-  google.rpc.Status status = 2;
+  repeated Digest chunk_digests = 1;
 }

 message ServerCapabilities {
@@ -85,24 +83,22 @@ Blob split procedure
 ### Server side

 When the server receives a request to split a blob via the `SplitBlob`
-rpc, it first checks whether the blob given by the `digest` from the
-`SplitBlobRequest` message exists in its CAS and if not, an error is
-returned by sending back a `SplitBlobResponse` message with the
-`status.code` field set to `google.rpc.Code.NOT_FOUND`. Otherwise, it
-loads the blob data and splits it into chunks according to the
-implemented data chunking algorithm. As explained later, there are
-different implementation options for this algorithm, but they all must
-ensure one property: the concatenation of the chunks result in the
-original blob. After the blob is splitted, the server computes a digest
-for each chunk according to the `digest_function` specified in the
-`SplitBlobRequest` message and puts each chunk that is not yet stored in
-its CAS. If an error occurs during storing the chunks in the CAS, an
-error is returned by sending back a `SplitBlobResponse` message with the
-`status.code` field set to `google.rpc.Code.INTERNAL`. Otherwise, the
-chunk digests are collected in the `digests` field of a
-`SplitBlobResponse` message in the same order as the chunks within the
-blob data block and sent back to the client with `status.code` set to
-`google.rpc.Code.OK`.
+rpc, it first checks whether the blob given by the `blob_digest` from
+the `SplitBlobRequest` message exists in its CAS. If not, the status
+code `google.rpc.Code.NOT_FOUND` is returned. Otherwise, it loads the
+blob data and splits it into chunks according to the implemented data
+chunking algorithm. As explained later, there are different
+implementation options for this algorithm, but they all must ensure one
+property: the concatenation of the chunks results in the original blob.
+After the blob is split, the server computes a digest for each chunk
+according to the configured digest function and puts each chunk that is
+not yet stored in its CAS. If an error occurs while storing the chunks
+in the CAS due to storage shortage, the status code
+`google.rpc.Code.RESOURCE_EXHAUSTED` is returned. Otherwise, the chunk
+digests are collected in the `chunk_digests` field of a
+`SplitBlobResponse` message in the same order as the chunks occur within
+the binary blob data and the message is sent back to the client with the
+status code `google.rpc.Code.OK`.

 > Note: Since the same data is basically stored twice for each blob, the
 > storage consumption for the remote server is roughly doubled. Some
@@ -113,20 +109,19 @@ blob data block and sent back to the client with `status.code` set to

 If a client wants to take advantage from blob splitting, it requests to
 split a blob into chunks at the remote side by calling the `SplitBlob`
-rpc from the `ContentAddressableStorage` service given a valid
-`SplitBlobRequest` message containing the respective blob `digest` and
-`digest_function`. If the `status` field of the returned
-`SplitBlobResponse` message contains an error, the split operation
-failed and the client needs to fall back to fetch the entire blob.
-Otherwise, blob splitting was successful and the remote server returns
-an ordered list of chunk `digests` and guarantees that the chunks are
-available in its CAS and that the concatention of all chunks in the
-given order will result in the requested blob. The client checks which
-chunks are available locally, fetches only the locally missing chunks
-from the remote side via the `BatchReadBlobs` rpc or the `ByteStream`
-API, and reconstructs the requested blob by concatenating the binary
-data of its chunks. Finally, it stores the chunks as well as the
-resulting blob in its local CAS.
+rpc from the `ContentAddressableStorage` service given a
+`SplitBlobRequest` message containing the `blob_digest` of the
+respective blob. If the status code returned by the `SplitBlob` rpc
+contains an error, the split operation failed and the client needs to
+fall back to fetch the entire blob. Otherwise, blob splitting was
+successful and the remote server returns an ordered list of
+`chunk_digests` and guarantees that the chunks are available in its CAS
+and that the concatenation of all chunks in the given order will result
+in the requested blob. The client checks which chunks are available
+locally, fetches only the locally missing chunks from the remote side
+via the `BatchReadBlobs` rpc or the `ByteStream` API, and assembles the
+requested blob by concatenating the binary data of its chunks. Finally,
+it stores the chunks as well as the resulting blob in its local CAS.

 > Note: Since the same data is basically stored twice for each blob, the
 > local storage consumption is roughly doubled. Some savings occur when
@@ -218,8 +213,10 @@ implementation such as a [polynomial rolling
 hash](https://ieeexplore.ieee.org/document/5390135), the [Rabin
 fingerprint](http://www.cs.cmu.edu/~15-749/READINGS/optional/rabin1981.pdf),
 or a [cyclic polynomial rolling
-hash](https://dl.acm.org/doi/abs/10.1145/256163.256168). However, the
-[fast rolling Gear hash
+hash](https://dl.acm.org/doi/abs/10.1145/256163.256168) (also called
+Buzhash). However, the [fast rolling Gear hash
 algorithm](https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia)
-has been proven to be very compute efficient specifically for
-content-based chunking of a stream of data.
+(also called FastCDC) has been proven to be very compute efficient and
+faster than the other rolling hash algorithms specifically for
+content-based chunking of a stream of data while achieving
+deduplication ratios similar to those of the Rabin fingerprint.
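To make the server-side procedure described in the new text more concrete, the following is a minimal Python sketch, not the service's actual implementation: the CAS is modelled as a plain dict, SHA-256 hex strings stand in for the configured digest function and the REAPI `Digest` message, and the fixed-size chunker is only a placeholder for whatever chunking algorithm a server implements.

```python
import hashlib

def split_into_chunks(data, size=4 * 1024 * 1024):
    # Placeholder chunker: fixed-size blocks. Any algorithm is acceptable as
    # long as concatenating the chunks reproduces the original blob.
    return [data[i:i + size] for i in range(0, len(data), size)]

def handle_split_blob(blob_data, cas):
    """Split blob_data, store missing chunks in cas, return ordered digests."""
    chunk_digests = []
    for chunk in split_into_chunks(blob_data):
        # SHA-256 stands in for the server's configured digest function.
        digest = hashlib.sha256(chunk).hexdigest()
        # Store the chunk only if it is not yet present in the CAS.
        cas.setdefault(digest, chunk)
        chunk_digests.append(digest)
    # Returned in blob order, as required for the chunk_digests field.
    return chunk_digests
```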
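The client-side flow can be sketched in the same spirit. Again this is illustrative only: `fetch_chunk` is a hypothetical callback standing in for a `BatchReadBlobs` or `ByteStream` read from the remote CAS, the local CAS is a dict, and digests are SHA-256 hex strings.

```python
import hashlib

def assemble_blob(blob_digest, chunk_digests, local_cas, fetch_chunk):
    """Rebuild a blob from the ordered chunk digests returned by SplitBlob."""
    # Fetch only the chunks that are missing from the local CAS.
    for digest in chunk_digests:
        if digest not in local_cas:
            local_cas[digest] = fetch_chunk(digest)
    # Concatenating the chunks in the given order yields the requested blob.
    blob = b"".join(local_cas[d] for d in chunk_digests)
    if hashlib.sha256(blob).hexdigest() != blob_digest:
        raise ValueError("reassembled blob does not match the requested digest")
    # Store the resulting blob locally as well.
    local_cas[blob_digest] = blob
    return blob
```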
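Finally, as a rough idea of content-defined chunking with a rolling Gear hash, here is a simplified sketch (not the FastCDC algorithm itself; the mask and size parameters are arbitrary illustrative values): a chunk boundary is cut wherever the low bits of the rolling hash are zero, so boundaries move with the content rather than with absolute offsets.

```python
import random

# One random 64-bit value per byte value (the "gear" table).
random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]

def gear_chunks(data, mask=(1 << 13) - 1, min_size=2048, max_size=65536):
    """Content-defined chunking with a rolling Gear hash (illustrative)."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear update: shift the hash and add the table entry for this byte.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        # Cut when the low bits are zero (expected chunk size ~ mask + 1),
        # bounded by minimum and maximum chunk sizes.
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# The required property: concatenating the chunks reproduces the blob.
example = bytes(range(256)) * 1024
assert b"".join(gear_chunks(example)) == example
```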