Age | Commit message | Author |
|
|
|
|
|
The specification for this status code is as follows:

  One or more errors occurred in setting up the action requested,
  such as a missing input or command or no worker being available.
  The client may be able to fix the errors and retry.

We routinely ensure all inputs are available to the remote execution
before we start an action, so all prerequisites will be present on a
compliant server. However, they might not actually be present on a
server where the CAS only has eventual consistency, or where the
answer to FindMissingBlobs is incorrect (due to old cache entries
surviving a CAS purge). While we have no guarantee that a retry will
help, we still retry; at least in the case of an unavailable worker,
or of CAS entries not yet available due to eventual consistency, this
will help. Also, we log the full response, including the repeated Any
message, at debug level. In this way, we can find out what useful
information (if any) is sent by popular remote-execution services and
implement more specific mitigations in the future.
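
As a sketch of the resulting behavior: the rich error details of a
gRPC status are a serialized google.rpc.Status whose repeated
"details" field holds google.protobuf.Any messages. Logger and
LogLevel below stand in for the project's logging facility and are
assumptions, not the actual implementation:

    #include <google/rpc/status.pb.h>
    #include <grpcpp/grpcpp.h>

    // Logger/LogLevel are placeholders for the project's logging API.
    [[nodiscard]] auto IsRetriable(grpc::Status const& status) -> bool {
        if (status.error_code() != grpc::StatusCode::FAILED_PRECONDITION) {
            return false;
        }
        // Parse the rich error payload and log each Any at debug level.
        google::rpc::Status rpc_status{};
        if (rpc_status.ParseFromString(status.error_details())) {
            for (auto const& any : rpc_status.details()) {
                Logger::Log(LogLevel::Debug, any.DebugString());
            }
        }
        return true;  // e.g., a worker may become available, or CAS
                      // entries may appear once consistency catches up
    }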
|
|
In BatchUploadBlobs we accept short writes and, in case of no
progress, fall back to single-blob upload. Therefore, a failure to
upload blobs is not fatal and should not be reported at error level.
Decrease the log levels accordingly: a protocol-level failure to
upload is a performance-related event (as the retry needs additional
time), whereas catching an internal exception is something that
should not really happen, so there we warn the user.
|
|
... instead of relying on those dependencies being pulled in
indirectly.
|
|
The rpc Execution::Execute returns a stream of
google.longrunning.Operation messages. When the client reads the
stream, the server can report that the operation is still in progress
and that the client has to wait. Before this patch, we did not check
for this particular condition; as a result, an ongoing action was
interpreted as an execution failure.
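
A minimal sketch of the corrected handling, assuming the generated
REAPI stubs (error handling omitted): the in-progress case is
recognized via Operation::done() instead of being treated as failure.

    #include <google/longrunning/operations.pb.h>
    #include <grpcpp/grpcpp.h>
    #include "build/bazel/remote/execution/v2/remote_execution.grpc.pb.h"

    namespace bazel_re = build::bazel::remote::execution::v2;

    void ReadExecuteStream(bazel_re::Execution::Stub& stub,
                           bazel_re::ExecuteRequest const& request) {
        grpc::ClientContext context;
        auto reader = stub.Execute(&context, request);
        google::longrunning::Operation operation;
        while (reader->Read(&operation)) {
            if (not operation.done()) {
                continue;  // still in progress: keep waiting, no error
            }
            // done: operation.response() contains the ExecuteResponse
        }
        grpc::Status const status = reader->Finish();
    }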
|
|
|
|
|
|
|
|
|
|
...since we use recursion for trees a lot; we therefore skip this
check manually.
|
|
|
|
|
|
|
|
Enable performance-enum-size check.
|
|
Enable performance-no-automatic-move check.
|
|
...proposed by clang-tidy.
Enable bugprone-assignment-in-if-condition check.
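
Illustrative examples (not taken from the repository) of the patterns
these three checks flag:

    #include <cstdint>
    #include <string>

    enum class Flag { kOn, kOff };  // performance-enum-size: suggests a
    // small explicit underlying type, e.g. enum class Flag : std::uint8_t

    auto MakeName() -> std::string {
        std::string const name = "result";
        return name;  // performance-no-automatic-move: the const local
                      // cannot be implicitly moved out; drop the const
    }

    void Check(int x) {
        int y = 0;
        if ((y = x) != 0) {}  // ok: comparison makes the intent explicit
        if (y = x) {}         // bugprone-assignment-in-if-condition:
                              // assignment in an if condition is likely
                              // a typo for ==
    }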
|
|
|
|
Despite the fact that HashFunction is a small type, it still makes
sense to store it by reference to reflect the ownership; StorageConfig
becomes the main holder. Reference holders store HashFunction by
const ref and are not allowed to change it. However, they are free to
return HashFunction by value, since returning by reference would not
improve readability anyway.
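
A sketch of the resulting ownership pattern, with simplified
signatures; the real classes have more members, and HashFunction is
stubbed here:

    class HashFunction { /* the project's small hash type; stubbed */ };

    class StorageConfig final {  // main holder: owns by value
      public:
        explicit StorageConfig(HashFunction hash_function) noexcept
            : hash_function_{hash_function} {}
        [[nodiscard]] auto GetHashFunction() const noexcept -> HashFunction {
            return hash_function_;  // returning by value is fine
        }
      private:
        HashFunction hash_function_;
    };

    class SomeReferenceHolder final {  // stores by const ref
      public:
        explicit SomeReferenceHolder(HashFunction const& hash_function) noexcept
            : hash_function_{hash_function} {}
        [[nodiscard]] auto GetHashFunction() const noexcept -> HashFunction {
            return hash_function_;  // still returned by value
        }
      private:
        HashFunction const& hash_function_;  // not owned, cannot change it
    };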
|
|
Although this change does not improve performance in any way
(protobuf's mutable_*() methods allocate memory lazily), it is better
to let protobuf do this on its own.
|
|
...and remove split serialization/deserialization logic.
|
|
...and remove split serialization/deserialization implementations.
|
|
...and use the qualified name ByteStreamUtils::kChunkSize
|
|
...since they were used only in tests.
|
|
|
|
- add more noexcept requirements and enforce existing ones
- fix inconsistencies related to function arguments
- remove redundant static keywords
- silence excessive lint reporting in test cases
While there, make more getters return by const ref.
|
|
Invalid entries, currently all upwards symlinks (pending the
implementation of a better way of handling them), are now identified
and handled properly: in compatible mode on the client side, while
handling the remote response, and in native mode on the server side,
while populating the action result.
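
For illustration, a purely lexical check along these lines can
identify upwards symlinks; this is a hypothetical helper, not the
project's actual predicate:

    #include <filesystem>
    #include <string>

    // A target is "upwards" if it is absolute or lexically escapes the
    // directory containing the symlink.
    [[nodiscard]] auto IsUpwardsSymlink(std::string const& target) -> bool {
        std::filesystem::path const path{target};
        if (path.is_absolute()) {
            return true;
        }
        int depth = 0;
        for (auto const& component : path) {
            if (component == "..") {
                if (--depth < 0) {
                    return true;  // escapes the containing directory
                }
            }
            else if (component != ".") {
                ++depth;
            }
        }
        return false;
    }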
|
|
...during WithRetry, as we know that the error message of the
callable has already been appropriately logged.
|
|
|
|
As populating the containers from the remote response only takes
place once, no assumptions should be made that this cannot fail (for
example, if wrong or invalid entries were produced). Instead, return
error messages on failure to the callers, which can log accordingly.
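
A sketch of such an interface, using std::expected (C++23) and
placeholder types for illustration:

    #include <expected>
    #include <string>
    #include <vector>

    struct Entry {  // placeholder for the actual entry type
        std::string name;
        bool valid;
    };

    [[nodiscard]] auto PopulateContainers(std::vector<Entry> const& entries)
        -> std::expected<void, std::string> {
        for (auto const& entry : entries) {
            if (not entry.valid) {
                // report instead of asserting that this cannot happen
                return std::unexpected{
                    "invalid entry in remote response: " + entry.name};
            }
            // ... populate the output containers ...
        }
        return {};  // callers log the error message as they see fit
    }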
|
|
...to determine whether splitting-splicing functionality is supported.
|
|
|
|
|
|
|
|
...and move it to the common stage.
|
|
|
|
|
|
|
|
|
|
...with ArtifactDigestFactory::HashDataAs
|
|
...from ObjectInfo and ArtifactDigest
|
|
...to simplify further refactoring.
|
|
...bypassing ArtifactDigest functionality.
|
|
...with ArtifactDigest.
|
|
|
|
...with ArtifactDigest.
|
|
Remote execution of actions is handled via long-running operations.
Here we have to be careful with the involved status codes: there is
the status code of the operation, and the response contains a field
that also happens to be a status code. The protocol states:

  Errors discovered during creation of the `Operation` will be
  reported as gRPC Status errors, while errors that occurred while
  running the action will be reported in the `status` field of
  the `ExecuteResponse`.

So we have to distinguish between two kinds of DEADLINE_EXCEEDED.
- If reported by the rpc, it means we failed to obtain the status
  of the ongoing action in a reasonable amount of time; here we can
  do nothing but retry.
- If we obtain an answer and that answer has status DEADLINE_EXCEEDED,
  it means "The execution timed out."; hence we must not retry and
  must report the result properly to the user.
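
A sketch of the resulting classification; the names are illustrative,
and ExecuteResponse is the REAPI message obtained from the finished
operation:

    #include <grpcpp/grpcpp.h>
    #include "build/bazel/remote/execution/v2/remote_execution.pb.h"

    namespace bazel_re = build::bazel::remote::execution::v2;

    enum class RetryDecision { kRetry, kFatalTimeout, kOther };

    [[nodiscard]] auto Classify(grpc::Status const& rpc_status,
                                bazel_re::ExecuteResponse const& response)
        -> RetryDecision {
        if (rpc_status.error_code() == grpc::StatusCode::DEADLINE_EXCEEDED) {
            return RetryDecision::kRetry;  // polling timed out: retry
        }
        if (response.status().code() ==
            grpc::StatusCode::DEADLINE_EXCEEDED) {
            return RetryDecision::kFatalTimeout;  // action timed out:
        }                                         // report, do not retry
        return RetryDecision::kOther;
    }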
|
|
We already accept short writes in batch uploads, but when no
progress is made, we cannot simply retry, as this might lead to
an infinite loop. Instead, we give up on batching and upload each
blob one by one.
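
As a sketch of that logic (Blob, BatchUpload, and SingleUpload are
placeholders for the actual client code):

    #include <utility>
    #include <vector>

    struct Blob {};  // placeholder for the project's blob type
    // Placeholder: returns the blobs not yet uploaded (short write).
    auto BatchUpload(std::vector<Blob> const& blobs) -> std::vector<Blob>;
    auto SingleUpload(Blob const& blob) -> bool;  // placeholder

    [[nodiscard]] auto UploadAll(std::vector<Blob> blobs) -> bool {
        while (not blobs.empty()) {
            auto remaining = BatchUpload(blobs);
            if (remaining.size() == blobs.size()) {
                // no progress: avoid an infinite loop and give up on
                // batching, uploading each blob individually instead
                for (auto const& blob : blobs) {
                    if (not SingleUpload(blob)) { return false; }
                }
                return true;
            }
            blobs = std::move(remaining);  // retry the short remainder
        }
        return true;
    }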
|
|
The remote-execution protocol is a bit unclear about how to deal
with blob updates for which we got no response. While some clients
consider a blob update failed only if a failed response is received,
we are extra defensive here and also consider missing responses to
be failed blob updates. Issue a retry for the missing blobs.
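
A sketch of computing the digests to retry, assuming the REAPI
BatchUpdateBlobs messages; the helper name is illustrative:

    #include <set>
    #include <string>
    #include <vector>
    #include "build/bazel/remote/execution/v2/remote_execution.pb.h"

    namespace bazel_re = build::bazel::remote::execution::v2;

    [[nodiscard]] auto DigestsToRetry(
        std::vector<bazel_re::Digest> const& requested,
        bazel_re::BatchUpdateBlobsResponse const& response)
        -> std::vector<bazel_re::Digest> {
        std::set<std::string> succeeded{};
        for (auto const& r : response.responses()) {
            if (r.status().code() == 0 /* google.rpc.Code.OK */) {
                succeeded.insert(r.digest().hash());
            }
        }
        std::vector<bazel_re::Digest> missing{};
        for (auto const& digest : requested) {
            // digests without a successful response count as failed,
            // including those for which no response was received at all
            if (not succeeded.contains(digest.hash())) {
                missing.push_back(digest);
            }
        }
        return missing;
    }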
|
|
... that was accidentally replaced by first-wins semantics in
62d204ff4cc94c12c1635f189255710901682825, which fortunately did not
make it into any release.
|