Skip to content

OCPBUGS-86044: Use generated names in oc adm policy SCC test#31221

Open
tchap wants to merge 2 commits into
openshift:mainfrom
tchap:OCPBUGS-86044/fix-scc-policy-test-race
Open

OCPBUGS-86044: Use generated names in oc adm policy SCC test#31221
tchap wants to merge 2 commits into
openshift:mainfrom
tchap:OCPBUGS-86044/fix-scc-policy-test-race

Conversation

@tchap
Copy link
Copy Markdown
Contributor

@tchap tchap commented May 27, 2026

The SCC section of this test operates on a cluster-scoped
ClusterRoleBinding with hardcoded subject names. Unlike namespaced
resources, cluster-scoped resources are not cleaned up between test
retries. When the first attempt fails partway through the removes,
leftover subjects cause the retry's adds to be no-ops at the etcd
level, preventing resourceVersion from advancing and disabling 409
conflict detection on subsequent writes.

Use unique generated names so each attempt is independent.

Summary by CodeRabbit

  • Tests
    • Enhanced CLI admin policy tests to use dynamically generated subject names; verify adding/removing privileged SCC for users, service accounts, and groups, including polling until bindings reflect changes and tolerating removed bindings during pruning checks.

Note: Test-only changes; no user-facing functionality altered.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 27, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@tchap: This pull request references Jira Issue OCPBUGS-86044, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Batch user and service account subjects into single oc invocations to eliminate cross-master watch cache races, and poll with Eventually to wait for cache propagation between adds and removes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8513f6ff-be30-4b22-b187-13ebe6871a6c

📥 Commits

Reviewing files that changed from the base of the PR and between 0f8e575 and 21a1c8f.

📒 Files selected for processing (1)
  • test/extended/cli/admin.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/extended/cli/admin.go

Walkthrough

Tests in test/extended/cli/admin.go now generate a user, serviceaccount, and group, add/remove the privileged SCC for those subjects while polling the privileged clusterrolebinding YAML to assert presence/absence, and run oc adm prune auth for the generated user and group.

Changes

SCC Policy CLI Test Updates

Layer / File(s) Summary
Add SCC and verify binding
test/extended/cli/admin.go
Generates fakeUser, fakeSA, and fakeGroup; adds privileged SCC to the user, -z fakeSA serviceaccount, and group; polls clusterrolebinding/system:openshift:scc:privileged -o yaml and asserts it contains those subjects.
Remove SCC and verify binding
test/extended/cli/admin.go
Removes privileged SCC from the generated user (including -z fakeSA) and group; re-fetches the privileged SCC binding YAML and asserts the removed identifiers are absent or accepts “not found” if the binding was pruned.
Prune auth for generated user
test/extended/cli/admin.go
Runs oc adm prune auth users/<generated> using the generated user name and verifies the privileged SCC clusterrolebinding reflects the change.
Prune auth for generated group
test/extended/cli/admin.go
Sets up privileged SCC on the generated group and runs oc adm prune auth group/<generated> using the generated group name for verification.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 13 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Microshift Test Compatibility ⚠️ Warning New "node-logs" test assumes master nodes with "node-role.kubernetes.io/master" label, incompatible with MicroShift's single-node architecture and without protection tags. Add [Skipped:MicroShift] to "node-logs" test name or guard with exutil.IsMicroShiftCluster() check, as MicroShift is single-node without dedicated master nodes.
Topology-Aware Scheduling Compatibility ⚠️ Warning PR introduces deployments with required anti-affinity (hostname key), broad arbiter tolerations, and node-count-derived replicas incompatible with SNO/TNF/TNA. Check ControlPlaneTopology; use preferred instead of required anti-affinity; exclude arbiter nodes; cap replicas to schedulable nodes; test on SNO/TNF/TNA/HyperShift.
✅ Passed checks (13 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically references the main change: using generated names in the oc adm policy SCC test, which matches the primary objective of making test attempts independent through unique identifiers.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Test title remains static. Generated identifiers (fakeUser, fakeSA, fakeGroup) are only used in test body operations/assertions, not the test name.
Test Structure And Quality ✅ Passed Test uses generated unique names for isolation, has proper timeouts (30s/1s) in Eventually calls for cache propagation, and cleanly removes/prunes resources. Consistent with codebase patterns.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The "policy" test in admin.go does not make multi-node assumptions; it tests cluster-wide RBAC and SCC management which work on SNO. No new SNO-incompatible test patterns detected.
Ote Binary Stdout Contract ✅ Passed The PR modifies only test case code within an It() block. No process-level stdout writes, klog configurations, or OTE contract violations were found in the file.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed The PR modifies the 'policy' test's SCC section to use generated names and polling, with no IPv4 hardcoded addresses, external registry access, or connectivity requirements introduced.
No-Weak-Crypto ✅ Passed PR contains only test-only changes to CLI admin tests with no cryptographic code, weak algorithms, or insecure secret comparisons.
Container-Privileges ✅ Passed PR modifies only test code (admin.go); no container/K8s manifests with privilege flags added. "Privileged" is an SCC name being tested, not a container privilege setting.
No-Sensitive-Data-In-Logs ✅ Passed Test uses dynamically generated fake names (fake-user-, fake-sa-, fake-group-*) only in command arguments and assertions; no logging functions expose sensitive data.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from deads2k and p0lyn0mial May 27, 2026 08:53
@openshift-ci-robot
Copy link
Copy Markdown

@tchap: This pull request references Jira Issue OCPBUGS-86044, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

Batch user and service account subjects into single oc invocations to eliminate cross-master watch cache races, and poll with Eventually to wait for cache propagation between adds and removes.

Summary by CodeRabbit

  • Tests
  • Updated CLI admin policy tests to use asynchronous polling for improved reliability when verifying cluster role binding changes.

Note: This release contains test improvements only. No user-facing functionality has changed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tchap
Once this PR has been reviewed and has the lgtm label, please assign stbenjam for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/cli/admin.go`:
- Around line 225-233: The polling callback passed to Eventually that runs
oc.Run("get").Args("clusterrolebinding/system:openshift:scc:privileged", "-o",
"yaml").Output() must treat a "not found" error as a terminal success: if the
get command returns an error and the error message indicates the
ClusterRoleBinding is missing (e.g. contains "NotFound" or "not found"), return
an empty string and nil so Eventually stops; otherwise propagate the error as
before. Update the anonymous func used in the Eventually call to detect the
missing-binding error and convert it into a successful (empty, nil) response.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 83551d20-0e8b-46b7-91e3-5f5fe21302d8

📥 Commits

Reviewing files that changed from the base of the PR and between 355916d and 02600b3.

📒 Files selected for processing (1)
  • test/extended/cli/admin.go

Comment thread test/extended/cli/admin.go Outdated
@openshift-ci openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 27, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Comment thread test/extended/cli/admin.go
o.Expect(err).To(o.HaveOccurred())
o.Expect(out).To(o.ContainSubstring("error: rolebinding custom found for role view, not other"))

o.Expect(oc.Run("adm", "policy", "add-scc-to-user").Args("privileged", "fake-user").Execute()).To(o.Succeed())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to keep the tests we have.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it does causes the race, I think, that's why it's being changed. The point of this change is to execute this change in a single oc command so that it's sent together.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are probably right to push back on this, because it's not clear why the test is failing very rarely. There is not enough data in the failing job regarding why this happens. It seems the data is written into etcd even though it should return 409, but I don't manage to find information detailed enough in CI jobs to be able to troubleshoot this 😐 So I ended up just trying to improve the test, which is not particularly clear.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the issue is that the test is not using randomly generated names and there are some leftovers from the previous test run, causing 409 not to happen because the change is actually already there, but I am gonna continue tomorrow. Thanks for pushing back.

Comment thread test/extended/cli/admin.go
The SCC section of this test operates on a cluster-scoped
ClusterRoleBinding with hardcoded subject names. Unlike namespaced
resources, cluster-scoped resources are not cleaned up between test
retries. When the first attempt fails partway through the removes,
leftover subjects cause the retry's adds to be no-ops at the etcd
level, preventing resourceVersion from advancing and disabling 409
conflict detection on subsequent writes.

Use unique generated names so each attempt is independent.
@tchap tchap force-pushed the OCPBUGS-86044/fix-scc-policy-test-race branch from 02600b3 to 0f8e575 Compare May 27, 2026 23:02
@tchap
Copy link
Copy Markdown
Contributor Author

tchap commented May 27, 2026

/retitle OCPBUGS-86044: Use generated names in oc adm policy SCC test

@openshift-ci openshift-ci Bot changed the title OCPBUGS-86044: Fix oc adm policy SCC test race on multi-master clusters OCPBUGS-86044: Use generated names in oc adm policy SCC test May 27, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/cli/admin.go`:
- Around line 219-221: The test issues multiple separate oc invocations to
mutate SCC subjects, leaving cache-propagation race windows; instead, batch
subject mutations into single oc admin policy commands by combining multiple
subject arguments in one call (e.g., replace separate
oc.Run("adm","policy","add-scc-to-user").Args("privileged", fakeUser) and
oc.Run("adm","policy","add-scc-to-user").Args("privileged", "-z", fakeSA) with a
single oc.Run("adm","policy","add-scc-to-user").Args("privileged", fakeUser,
"-z", fakeSA).Execute(), and do the same for
oc.Run("adm","policy","add-scc-to-group").Args(...) instances referenced around
the same section) so all related user/sa/group updates occur in one command and
remove the inter-command race window.
- Around line 222-241: Replace the immediate post-mutation CRB assertions with
retries using Eventually to avoid stale-cache flakes: after the adm policy
remove-scc-from-* calls (the
oc.Run(...).Args("clusterrolebinding/system:openshift:scc:privileged", "-o",
"yaml").Output() call and subsequent checks that reference fakeUser, fakeSA,
fakeGroup), wrap the get+assert logic in an Eventually that polls until either
the CRB YAML no longer contains the removed fakeUser/fakeSA/fakeGroup or the get
returns the "not found" message; do the same replacement for the other similar
block that spans the later checks (the second occurrence noted at lines
244-252).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4ccfbaa4-7746-403c-9a7b-5eae0e1daac3

📥 Commits

Reviewing files that changed from the base of the PR and between 02600b3 and 0f8e575.

📒 Files selected for processing (1)
  • test/extended/cli/admin.go

Comment thread test/extended/cli/admin.go Outdated
Comment thread test/extended/cli/admin.go Outdated
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Batch user and service account subjects into single oc invocations
and use Eventually to poll for cache propagation between adds and
removes as additional hardening against watch cache races.
@tchap
Copy link
Copy Markdown
Contributor Author

tchap commented May 28, 2026

@ardaguclu CodeRabbit now asked me to put back what I removed based on your review, so we need to decide 🙂 I added what CR requested as a separate commit. I mean, it makes sense. But let's discuss.

@openshift-ci-robot
Copy link
Copy Markdown

@tchap: This pull request references Jira Issue OCPBUGS-86044, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

The SCC section of this test operates on a cluster-scoped
ClusterRoleBinding with hardcoded subject names. Unlike namespaced
resources, cluster-scoped resources are not cleaned up between test
retries. When the first attempt fails partway through the removes,
leftover subjects cause the retry's adds to be no-ops at the etcd
level, preventing resourceVersion from advancing and disabling 409
conflict detection on subsequent writes.

Use unique generated names so each attempt is independent.

Summary by CodeRabbit

  • Tests
  • Enhanced CLI admin policy tests to use dynamically generated subject names; verify adding/removing privileged SCC for users, service accounts, and groups, including polling until bindings reflect changes and tolerating removed bindings during pruning checks.

Note: Test-only changes; no user-facing functionality altered.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 28, 2026

@tchap: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants