[SPARK-57154][INFRA] Harden notify_test_workflow against fork run/check-run race#56212
Open
viirya wants to merge 1 commit into
Open
[SPARK-57154][INFRA] Harden notify_test_workflow against fork run/check-run race#56212viirya wants to merge 1 commit into
notify_test_workflow against fork run/check-run race#56212viirya wants to merge 1 commit into
Conversation
…heck-run race
### What changes were proposed in this pull request?
This PR makes `notify_test_workflow.yml` resilient to two timing races between
the upstream `pull_request_target` notify run and the fork's CI:
1. When listing the fork's workflow runs, instead of blindly taking the most
recent run (`workflow_runs[0]`) and throwing if its `head_sha` does not match
the PR head SHA, the script now retries (up to 3 times, 3s apart) looking for
the run whose `head_sha` matches the PR head SHA. The listing endpoint orders
by most recent, so the run for the just-pushed SHA may not be registered yet
and a stale run from a previous push could be returned.
2. When resolving the `Run / Check changes` check-run id (used only to render a
Check-run view link instead of the Actions view, see SPARK-37879), a missing
check-run no longer throws. The check-run materializes later than the workflow
run, especially when the matrix is queued, so this is now best-effort: if it
cannot be found, the `Build` check is still created pointing at the Actions
run URL.
Behavior is otherwise preserved: when no runs exist at all, the `action_required`
("workflow run detection failed") check is still created; when runs exist but
none match the PR head SHA, the script still throws so a fresh notify run handles
the newer commit.
### Why are the changes needed?
Previously these races caused the notify run to `throw`, leaving the PR with no
`Build` check at all. Because the scheduled `update_build_status.yml` only syncs
existing `Build` checks, a PR that hit this race had no status reported and no
way for the updater to recover until the next push. Creating the check (falling
back to the Actions URL when needed) lets the updater take over.
### Does this PR introduce _any_ user-facing change?
No. CI infrastructure only.
### How was this patch tested?
Static verification: the embedded `actions/github-script` body passes
`node --check`, and the workflow YAML parses. The behavior paths (matching run
found, runs present but unmatched SHA, no runs, check-run present, check-run
absent) were traced against the existing control flow.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)
Co-authored-by: Claude Code
dongjoon-hyun
approved these changes
May 29, 2026
Member
dongjoon-hyun
left a comment
There was a problem hiding this comment.
+1, LGTM. Thank you, @viirya .
If possible, could you make this contribution to the sub-projects, too, please?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This PR makes
notify_test_workflow.ymlresilient to two timing races betweenthe upstream
pull_request_targetnotify run and the fork's CI:When listing the fork's workflow runs, instead of blindly taking the most
recent run (
workflow_runs[0]) and throwing if itshead_shadoes not matchthe PR head SHA, the script now retries (up to 3 times, 3s apart) looking for
the run whose
head_shamatches the PR head SHA. The listing endpoint ordersby most recent, so the run for the just-pushed SHA may not be registered yet
and a stale run from a previous push could be returned.
When resolving the
Run / Check changescheck-run id (used only to render aCheck-run view link instead of the Actions view, see SPARK-37879), a missing
check-run no longer throws. The check-run materializes later than the workflow
run, especially when the matrix is queued, so this is now best-effort: if it
cannot be found, the
Buildcheck is still created pointing at the Actionsrun URL.
Behavior is otherwise preserved: when no runs exist at all, the
action_required("workflow run detection failed") check is still created; when runs exist but
none match the PR head SHA, the script still throws so a fresh notify run handles
the newer commit.
Why are the changes needed?
Previously these races caused the notify run to
throw, leaving the PR with noBuildcheck at all. Because the scheduledupdate_build_status.ymlonly syncsexisting
Buildchecks, a PR that hit this race had no status reported and noway for the updater to recover until the next push. Creating the check (falling
back to the Actions URL when needed) lets the updater take over.
Does this PR introduce any user-facing change?
No. CI infrastructure only.
How was this patch tested?
Static verification: the embedded
actions/github-scriptbody passesnode --check, and the workflow YAML parses. The behavior paths (matching runfound, runs present but unmatched SHA, no runs, check-run present, check-run
absent) were traced against the existing control flow.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)