fix(firestore): retry transient connection errors in streaming RPCs#14635
fix(firestore): retry transient connection errors in streaming RPCs#14635bhshkh wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements a retry mechanism with exponential backoff for the getAll method in the Firestore client, along with a new isRetryableError helper function to identify transient network and gRPC errors. Feedback identifies a compilation error where the err variable is used without declaration and suggests expanding the retry logic to include the codes.ResourceExhausted status code.
| var backoff gax.Backoff = gax.Backoff{ | ||
| Initial: 100 * time.Millisecond, | ||
| Max: 60000 * time.Millisecond, | ||
| Multiplier: 1.30, | ||
| } | ||
|
|
||
| // Read and remember all results from the stream. | ||
| var resps []*pb.BatchGetDocumentsResponse | ||
| for { | ||
| resp, err := streamClient.Recv() | ||
| if err == io.EOF { | ||
| break | ||
| } | ||
| if err != nil { | ||
| return nil, err | ||
| var streamClient interface { | ||
| Recv() (*pb.BatchGetDocumentsResponse, error) | ||
| } | ||
| resps = append(resps, resp) | ||
| } | ||
|
|
||
| // Results may arrive out of order. Put each at the right indices. | ||
| docs := make([]*DocumentSnapshot, len(docNames)) | ||
| for _, resp := range resps { | ||
| var ( | ||
| indices []int | ||
| doc *pb.Document | ||
| err error | ||
| ) | ||
| switch r := resp.Result.(type) { | ||
| case *pb.BatchGetDocumentsResponse_Found: | ||
| indices = docIndices[r.Found.Name] | ||
| doc = r.Found | ||
| case *pb.BatchGetDocumentsResponse_Missing: | ||
| indices = docIndices[r.Missing] | ||
| doc = nil | ||
| default: | ||
| return nil, errors.New("firestore: unknown BatchGetDocumentsResponse result type") | ||
| } | ||
| for _, index := range indices { | ||
| if docs[index] != nil { | ||
| return nil, fmt.Errorf("firestore: %q seen twice", docRefs[index].Path) | ||
| streamClient, err = c.c.BatchGetDocuments(batchGetDocsCtx, req) |
There was a problem hiding this comment.
The variable err is used in the assignment on line 338 and later checked on line 389, but it is never declared in this scope. In the previous version of the code, it was declared using := on line 327. Since streamClient is now declared as an interface type at line 335, you must explicitly declare err before the loop or within the loop before its first use.
var err error
var backoff gax.Backoff = gax.Backoff{
Initial: 100 * time.Millisecond,
Max: 60000 * time.Millisecond,
Multiplier: 1.30,
}
for {
var streamClient interface {
Recv() (*pb.BatchGetDocumentsResponse, error)
}
streamClient, err = c.c.BatchGetDocuments(batchGetDocsCtx, req)| if st, ok := status.FromError(err); ok { | ||
| return st.Code() == codes.Unavailable | ||
| } |
There was a problem hiding this comment.
Consider also retrying on codes.ResourceExhausted if the intent is to handle transient connection and capacity issues. According to general guidelines, retrying ResourceExhausted is acceptable, though often done without backoff if shifting to a different replica. Given the exponential backoff here, it would be a safe addition.
References
- When handling a ResourceExhausted error by forwarding the request to a different replica, it is acceptable to retry immediately without a delay.
No description provided.