Skip to content

Fix bad usage of EventuallyWithT in TestAdmissionController e2e test#45896

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into
mainfrom
lenaic/fix_flaky_e2e_test_admission_controller
Feb 5, 2026
Merged

Fix bad usage of EventuallyWithT in TestAdmissionController e2e test#45896
gh-worker-dd-mergequeue-cf854d[bot] merged 1 commit into
mainfrom
lenaic/fix_flaky_e2e_test_admission_controller

Conversation

@L3n41c
Copy link
Copy Markdown
Member

@L3n41c L3n41c commented Feb 4, 2026

What does this PR do?

Fix a bad usage of require.EventuallyWithT(…) in the testAdmissionControllerPod e2e test.
require.EventuallyWithT(…) is normally used to retry a task until it succeeds or times out.
But the use of suite.Require() inside the function made the test fail immediately in case of assertion failure.
In order to benefit from the retry mechanism of EventuallyWithT, errors must be collected in the CollectT param instead.

Motivation

Test TestAdmissionControllerWithAutoDetectedLanguage is currently very flaky.

Describe how you validated your changes

Wait for the CI to execute e2e tests.

Additional Notes

@L3n41c L3n41c added this to the 7.77.0 milestone Feb 4, 2026
@L3n41c L3n41c requested a review from a team as a code owner February 4, 2026 11:15
@L3n41c L3n41c requested a review from a team as a code owner February 4, 2026 11:15
@L3n41c L3n41c added changelog/no-changelog No changelog entry needed qa/done QA done before merge and regressions are covered by tests labels Feb 4, 2026
@L3n41c
Copy link
Copy Markdown
Member Author

L3n41c commented Feb 4, 2026

@codex review

@github-actions github-actions Bot added team/container-platform The Container Platform Team short review PR is simple enough to be reviewed quickly labels Feb 4, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e13fe0a90c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1256 to 1260
require.NoError(c, err)
require.Len(c, appPod.Items, 1)

nodeName := appPod.Items[0].Spec.NodeName

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard pod indexing when CollectT assertions fail

Within EventuallyWithTf, require.Len(c, appPod.Items, 1) records a failure on the CollectT but does not stop execution, so when the list is still empty (a normal retry scenario) the subsequent appPod.Items[0] will panic and abort the retry loop. This reintroduces flakiness because transient zero-length results now fail the test immediately rather than retrying; consider using assert.Len with an early return or checking length before indexing.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

@L3n41c L3n41c Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re absolutely wrong!

require.Len(c, appPod.Items, 1) aborts the execution of the function and its remaining instructions are not executed. That’s precisely the difference between require.… and assert.…. This used to be broken inside EventuallyWithT with old versions of testify. But this old issue has now been solved by stretchr/testify#1481.
And the functions is properly retried.

@codex: I encourage you to test this:

package main

import (
        "testing"
        "time"

        "github.com/stretchr/testify/assert"
        "github.com/stretchr/testify/require"
)

func TestEventually(t *testing.T) {
        require.EventuallyWithT(t, func(c *assert.CollectT) {
                t.Log("Try before")
                require.Len(c, []string{"foo"}, 0)
                t.Log("Try after")
        }, 500 * time.Millisecond, 100 * time.Millisecond)
}
--- FAIL: TestEventually (0.50s)
    main_test.go:13: Try before
    main_test.go:13: Try before
    main_test.go:13: Try before
    main_test.go:13: Try before
    main_test.go:13: Try before
    main_test.go:12: 
                Error Trace:    /tmp/test_eventually/main_test.go:14
                                                        /home/lenaic/.local/go/src/runtime/asm_amd64.s:1693
                Error:          "[foo]" should have 0 item(s), but has 1
    main_test.go:12: 
                Error Trace:    /tmp/test_eventually/main_test.go:12
                Error:          Condition never satisfied
                Test:           TestEventually
FAIL
FAIL    test_eventually 0.504s
FAIL

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

Testing

  • ⚠️ Not run (not requested).

View task →

Comment on lines 1278 to +1281
deployment, err := suite.Env().KubernetesCluster.Client().AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
if !assert.NoError(c, err) {
c.Errorf("deployment with name %s in namespace %s not found", name, namespace)
return
}
require.NoErrorf(c, err, "deployment with name %s in namespace %s not found", name, namespace)

if deployment.Status.AvailableReplicas == 0 {
c.Errorf("deployment with name %s in namespace %s has 0 available replicas", name, namespace)
}
assert.NotZerof(c, deployment.Status.AvailableReplicas, "deployment with name %s in namespace %s has 0 available replicas", name, namespace)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid nil deref after CollectT NoErrorf

In this EventuallyWithTf, a failed Get (e.g., during rollout) will still leave deployment nil, but require.NoErrorf(c, err, …) only records a failure on the CollectT and does not short-circuit, so deployment.Status can panic and stop retries. This turns a transient “not found” into an immediate test failure; use an early return after an assert.NoError-style check or guard deployment before dereferencing.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

@L3n41c L3n41c Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require.NoErrorf(c, err, …) only records a failure on the CollectT and does not short-circuit, so deployment.Status can panic and stop retries.

You’re absolutely wrong!

require.… now properly short-circuits and skip the execution of the instructions after.
See my previous comment.
The behavior you describe was the one before stretchr/testify#1481 was merged.

@github-actions github-actions Bot added medium review PR review might take time and removed short review PR is simple enough to be reviewed quickly labels Feb 4, 2026
@agent-platform-auto-pr
Copy link
Copy Markdown
Contributor

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 2343a01
📊 Static Quality Gates Dashboard

Successful checks

Info

Quality gate Change Size (prev → curr → max)
docker_agent_amd64 +9.97 KiB (0.00% increase) 809.460 → 809.470 → 817.140
docker_agent_arm64 +11.47 KiB (0.00% increase) 813.239 → 813.250 → 824.020
docker_agent_jmx_amd64 +9.98 KiB (0.00% increase) 1000.371 → 1000.381 → 1008.020
docker_agent_jmx_arm64 +109.78 KiB (0.01% increase) 992.837 → 992.944 → 1003.620
docker_cluster_agent_amd64 +12.57 KiB (0.01% increase) 180.867 → 180.879 → 181.200
docker_cluster_agent_arm64 +12.59 KiB (0.01% increase) 196.732 → 196.744 → 198.490
25 successful checks with minimal change (< 2 KiB)
Quality gate Current Size
agent_deb_amd64 746.988 MiB
agent_deb_amd64_fips 707.009 MiB
agent_heroku_amd64 324.136 MiB
agent_msi 658.956 MiB
agent_rpm_amd64 746.972 MiB
agent_rpm_amd64_fips 706.993 MiB
agent_rpm_arm64 726.154 MiB
agent_rpm_arm64_fips 688.667 MiB
agent_suse_amd64 746.972 MiB
agent_suse_amd64_fips 706.993 MiB
agent_suse_arm64 726.154 MiB
agent_suse_arm64_fips 688.667 MiB
docker_cws_instrumentation_amd64 7.135 MiB
docker_cws_instrumentation_arm64 6.689 MiB
docker_dogstatsd_amd64 38.449 MiB
docker_dogstatsd_arm64 36.812 MiB
dogstatsd_deb_amd64 29.669 MiB
dogstatsd_deb_arm64 27.833 MiB
dogstatsd_rpm_amd64 29.669 MiB
dogstatsd_suse_amd64 29.669 MiB
iot_agent_deb_amd64 42.766 MiB
iot_agent_deb_arm64 39.884 MiB
iot_agent_deb_armhf 40.454 MiB
iot_agent_rpm_amd64 42.767 MiB
iot_agent_suse_amd64 42.767 MiB
On-wire sizes (compressed)
Quality gate Change Size (prev → curr → max)
agent_deb_amd64 -31.18 KiB (0.02% reduction) 182.675 → 182.645 → 184.810
agent_deb_amd64_fips neutral 174.147 MiB → 177.560
agent_heroku_amd64 neutral 87.164 MiB → 88.450
agent_msi -28.0 KiB (0.02% reduction) 142.484 → 142.457 → 143.300
agent_rpm_amd64 +44.82 KiB (0.02% increase) 185.695 → 185.738 → 188.160
agent_rpm_amd64_fips -22.14 KiB (0.01% reduction) 176.483 → 176.462 → 178.900
agent_rpm_arm64 +21.59 KiB (0.01% increase) 168.248 → 168.269 → 169.930
agent_rpm_arm64_fips +50.41 KiB (0.03% increase) 160.554 → 160.603 → 163.120
agent_suse_amd64 +44.82 KiB (0.02% increase) 185.695 → 185.738 → 188.160
agent_suse_amd64_fips -22.14 KiB (0.01% reduction) 176.483 → 176.462 → 178.900
agent_suse_arm64 +21.59 KiB (0.01% increase) 168.248 → 168.269 → 169.930
agent_suse_arm64_fips +50.41 KiB (0.03% increase) 160.554 → 160.603 → 163.120
docker_agent_amd64 -24.94 KiB (0.01% reduction) 274.765 → 274.741 → 277.400
docker_agent_arm64 neutral 262.304 MiB → 266.040
docker_agent_jmx_amd64 -20.78 KiB (0.01% reduction) 343.413 → 343.392 → 346.020
docker_agent_jmx_arm64 +12.65 KiB (0.00% increase) 326.931 → 326.943 → 330.660
docker_cluster_agent_amd64 -11.06 KiB (0.02% reduction) 63.887 → 63.876 → 64.510
docker_cluster_agent_arm64 +2.92 KiB (0.00% increase) 60.151 → 60.154 → 61.170
docker_cws_instrumentation_amd64 neutral 2.994 MiB → 3.330
docker_cws_instrumentation_arm64 neutral 2.726 MiB → 3.090
docker_dogstatsd_amd64 neutral 14.877 MiB → 15.820
docker_dogstatsd_arm64 neutral 14.219 MiB → 14.830
dogstatsd_deb_amd64 neutral 7.839 MiB → 8.790
dogstatsd_deb_arm64 neutral 6.726 MiB → 7.710
dogstatsd_rpm_amd64 neutral 7.851 MiB → 8.800
dogstatsd_suse_amd64 neutral 7.851 MiB → 8.800
iot_agent_deb_amd64 +2.16 KiB (0.02% increase) 11.215 → 11.217 → 12.040
iot_agent_deb_arm64 neutral 9.590 MiB → 10.450
iot_agent_deb_armhf neutral 9.785 MiB → 10.620
iot_agent_rpm_amd64 -2.53 KiB (0.02% reduction) 11.234 → 11.232 → 12.060
iot_agent_suse_amd64 -2.53 KiB (0.02% reduction) 11.234 → 11.232 → 12.060

Copy link
Copy Markdown
Contributor

@triviajon triviajon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@triviajon
Copy link
Copy Markdown
Contributor

Could make sense to also remove this test from flakes.yaml if you're confident that this will remove the flakiness from this test, but leaving it and just collecting some data on it also seems good

@L3n41c
Copy link
Copy Markdown
Member Author

L3n41c commented Feb 5, 2026

/merge

@gh-worker-devflow-routing-ef8351
Copy link
Copy Markdown

gh-worker-devflow-routing-ef8351 Bot commented Feb 5, 2026

View all feedbacks in Devflow UI.

2026-02-05 09:01:52 UTC ℹ️ Start processing command /merge


2026-02-05 09:01:56 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 1h (p90).


2026-02-05 09:20:55 UTC ℹ️ MergeQueue: This merge request was merged

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit cf610d8 into main Feb 5, 2026
329 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the lenaic/fix_flaky_e2e_test_admission_controller branch February 5, 2026 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/no-changelog No changelog entry needed dev/testing medium review PR review might take time qa/done QA done before merge and regressions are covered by tests team/container-platform The Container Platform Team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants