[SPARK-57154][INFRA] Harden `notify_test_workflow` against fork run/check-run race by viirya · Pull Request #56212 · apache/spark

viirya · 2026-05-29T20:23:44Z

What changes were proposed in this pull request?

This PR makes `notify_test_workflow.yml` resilient to two timing races between
the upstream `pull_request_target` notify run and the fork's CI:

1. When listing the fork's workflow runs, instead of blindly taking the most
   recent run (`workflow_runs[0]`) and throwing if its `head_sha` does not match
   the PR head SHA, the script now retries (up to 3 times, 3s apart) looking for
   the run whose `head_sha` matches the PR head SHA. The listing endpoint orders
   by most recent, so the run for the just-pushed SHA may not be registered yet
   and a stale run from a previous push could be returned.
2. When resolving the `Run / Check changes` check-run id (used only to render a
   Check-run view link instead of the Actions view, see SPARK-37879), a missing
   check-run no longer throws. The check-run materializes later than the workflow
   run, especially when the matrix is queued, so this is now best-effort: if it
   cannot be found, the `Build` check is still created pointing at the Actions
   run URL.
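
For readers who want the shape of the retry in item 1, here is a minimal sketch, not the code in the workflow itself. The helper names (`findMatchingRun`, `findRunWithRetry`), the injected `listRuns` and `sleep` functions, and the reading of "up to 3 times" as three total attempts are all assumptions made for illustration; only `head_sha` and the 3s delay come from the description above.

```javascript
// Hypothetical helper: pick the run whose head_sha equals the PR head SHA,
// instead of trusting the first (most recent) entry in the listing.
function findMatchingRun(workflowRuns, headSha) {
  return workflowRuns.find((run) => run.head_sha === headSha) ?? null;
}

// Default delay helper; replaced by a no-op in tests.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical retry wrapper. `listRuns` is an injected async function that
// returns the fork's workflow runs (assumed newest first), so this sketch
// does not depend on any particular API client.
async function findRunWithRetry(listRuns, headSha, options = {}) {
  const { attempts = 3, delayMs = 3000, sleep = defaultSleep } = options;
  let runs = [];
  for (let attempt = 1; attempt <= attempts; attempt++) {
    runs = await listRuns();
    const run = findMatchingRun(runs, headSha);
    if (run) {
      return { run, runs };
    }
    // Wait only between attempts, not after the last one.
    if (attempt < attempts) {
      await sleep(delayMs);
    }
  }
  // No match: return the last listing so the caller can tell
  // "no runs at all" apart from "runs exist but none match".
  return { run: null, runs };
}
```

Returning the last listing alongside the (possibly null) match keeps the decision about what to do on failure in the caller, which is where the description places it.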

Behavior is otherwise preserved: when no runs exist at all, the `action_required`
("workflow run detection failed") check is still created; when runs exist but
none match the PR head SHA, the script still throws so a fresh notify run handles
the newer commit.
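
The three preserved outcomes can be summarized as a small decision function. This is a sketch under assumptions: `classifyRuns` and the `outcome` labels are hypothetical names, and it takes the final listing (after any retries) as input rather than reproducing the workflow's actual control flow.

```javascript
// Hypothetical classifier for the final run listing.
// - no runs at all        -> create the action_required check
// - runs, none matching   -> throw, so a fresh notify run handles the newer commit
// - a matching run exists -> create the Build check for that run
function classifyRuns(workflowRuns, headSha) {
  if (workflowRuns.length === 0) {
    return { outcome: "action_required", reason: "workflow run detection failed" };
  }
  const run = workflowRuns.find((r) => r.head_sha === headSha);
  if (!run) {
    throw new Error(`No workflow run matches PR head SHA ${headSha}`);
  }
  return { outcome: "build_check", run };
}
```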

Why are the changes needed?

Previously these races caused the notify run to `throw`, leaving the PR with no
`Build` check at all. Because the scheduled `update_build_status.yml` only syncs
existing `Build` checks, a PR that hit this race had no status reported and no
way for the updater to recover until the next push. Creating the check (falling
back to the Actions URL when needed) lets the updater take over.
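
A rough sketch of the best-effort check-run lookup and the URL fallback follows. The function names and the two URL shapes (`/runs/<check_run_id>` for the Check-run view and `/actions/runs/<run_id>` for the Actions view) are assumptions for illustration and may not match what the workflow builds.

```javascript
// Hypothetical best-effort lookup: return the id of the named check-run,
// or null when it has not materialized yet (no throw).
function findCheckRunId(checkRuns, name = "Run / Check changes") {
  const checkRun = checkRuns.find((c) => c.name === name);
  return checkRun ? checkRun.id : null;
}

// Hypothetical URL builder: prefer the Check-run view when an id is known,
// otherwise fall back to the Actions run URL so the Build check can still
// be created. Both URL shapes are assumptions, not taken from the PR.
function buildDetailsUrl(repoFullName, runId, checkRunId) {
  const base = `https://github.com/${repoFullName}`;
  if (checkRunId != null) {
    return `${base}/runs/${checkRunId}`;
  }
  return `${base}/actions/runs/${runId}`;
}
```

Either way a details URL is produced, so the `Build` check exists and the scheduled updater has something to sync.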

Does this PR introduce any user-facing change?

No. CI infrastructure only.

How was this patch tested?

Static verification: the embedded `actions/github-script` body passes
`node --check`, and the workflow YAML parses. The behavior paths (matching run
found, runs present but unmatched SHA, no runs, check-run present, check-run
absent) were traced against the existing control flow.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)

Co-authored-by: Claude Code

dongjoon-hyun

+1, LGTM. Thank you, @viirya.

If possible, could you make this contribution to the sub-projects, too, please?

dongjoon-hyun approved these changes May 29, 2026

viirya wants to merge 1 commit into apache:master from viirya:SPARK-57154-notify-race
