Update ocdebug to ssh by kasturinarra · Pull Request #31216 · openshift/origin

kasturinarra · 2026-05-26T17:34:14Z

Summary by CodeRabbit

Tests
- Kernel panic recovery test updated to perform post-crash verification via hypervisor access, skip when access is unavailable, and traverse intermediate hosts to reach target nodes. It now reads cluster state and configuration remotely and includes stronger checks that the datastore and recovery manager have restarted and recorded the expected recovery events.

openshift-merge-bot · 2026-05-26T17:34:17Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

coderabbitai · 2026-05-26T17:34:38Z

Walkthrough

The kernel-panic recovery test now uses hypervisor-mediated two-hop SSH for post-crash verification. It builds SSH known_hosts, queries survived-node pacemaker/etcd metadata via SSH, and performs SSH-based Eventually assertions on the target node's containers and pacemaker logs.

Changes

SSH-based post-crash verification

Layer / File(s)	Summary
Hypervisor SSH plumbing `test/extended/edge_topologies/tnf_recovery.go`	Parses and validates hypervisor SSH config (private key readable), builds local known_hosts, determines survived/target internal IPs, and prepares per-node remote known_hosts for two-hop SSH.
Survived-node retrieval and target-node verification `test/extended/edge_topologies/tnf_recovery.go`	Survived-node queries (pacemaker journal `bump-amount`, `force-new-cluster-bump-amount` from `/var/lib/etcd/config.yaml`, `.maxRaftIndex` from `/var/lib/etcd/revision.json`) are executed via hypervisor-mediated SSH; target-node checks (etcd podman running, `etcd-previous` container exists, pacemaker logs show `pod.yaml` recreation) are done via SSH inside `Eventually` assertions.
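
The two-hop arrangement summarized above can be pictured as an outer ssh to the hypervisor whose remote command is itself an ssh to the node. The following is only a rough sketch under stated assumptions: `twoHopSSHArgs`, `shellQuote`, and all parameter names are hypothetical, the addresses and paths are placeholders, and the real `core.ExecuteRemoteSSHCommand` helper may compose its command differently.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes so a POSIX shell treats it as one word.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// twoHopSSHArgs returns the argument list for the outer ssh process.
// The outer hop reaches the hypervisor; the last argument is an inner ssh
// command line that the hypervisor's shell runs to reach the node as the
// core user. Names and layout are hypothetical, not the helper's real API.
func twoHopSSHArgs(hvUser, hvIP, keyPath, localKH, nodeIP, remoteKH, command string) []string {
	inner := fmt.Sprintf("ssh -o UserKnownHostsFile=%s core@%s %s",
		shellQuote(remoteKH), nodeIP, shellQuote(command))
	return []string{
		"-i", keyPath,
		"-o", "UserKnownHostsFile=" + localKH,
		hvUser + "@" + hvIP,
		inner,
	}
}

func main() {
	args := twoHopSSHArgs("root", "192.0.2.10", "/path/to/key", "/tmp/local_kh",
		"192.0.2.21", "/tmp/remote_kh", "sudo podman ps -a --format '{{.Names}}'")
	// Print the arguments one per line to make the nesting visible.
	for _, a := range args {
		fmt.Println(a)
	}
}
```

Quoting the inner command once for the hypervisor's shell is the part that is easiest to get wrong, which is why the sketch isolates it in `shellQuote`.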

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (2 errors, 1 warning)

Check name	Status	Explanation	Resolution
Container-Privileges	❌ Error	PR introduces manifests with privileged: true and hostNetwork: true without proper security justification in dockergc-ds.yaml and 3 deployment manifests.	Add explanatory comments justifying privileged configs in examples/dockergc and pkg/monitortests/kubeapiserver/disruptioninclusterapiserver/manifests/, or replace with non-privileged alternatives.
No-Sensitive-Data-In-Logs	❌ Error	The PR logs sensitive data: private key file paths at ssh.go:146 (included in full SSH command string) and tnf_recovery.go:1021, plus internal infrastructure paths (known_hosts files) and node IPs.	Redact sensitive arguments from logged SSH commands by logging only the command being executed, not the full ssh argument list including -i flag with private key path.
Test Structure And Quality	⚠️ Warning	Test lacks proper timeouts on SSH operations (lines 496-591), missing assertion messages (lines 512, 528), and no cleanup of temporary known_hosts files.	Add CommandContext/SSH timeouts to ExecuteRemoteSSHCommand calls; add assertion messages to lines 512 and 528; defer cleanup of localKH, survivedRemoteKH, targetRemoteKH files.
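
The resolution suggested for the No-Sensitive-Data-In-Logs check (log the command, not the full ssh argument list) could look roughly like the sketch below. `redactSSHArgs` is a hypothetical helper that is not part of the PR, and it assumes the key is passed as a separate argument after `-i`.

```go
package main

import "fmt"

// redactSSHArgs returns a copy of args with the value following each "-i"
// flag replaced, so a private key path never reaches the log. The input
// slice is left untouched because the real command still needs the path.
func redactSSHArgs(args []string) []string {
	out := make([]string, len(args))
	copy(out, args)
	for i := 0; i < len(out)-1; i++ {
		if out[i] == "-i" {
			out[i+1] = "<redacted>"
			i++ // skip the value that was just replaced
		}
	}
	return out
}

func main() {
	// Placeholder values for illustration only.
	args := []string{"-i", "/home/ci/.ssh/id_rsa", "root@192.0.2.10", "uptime"}
	fmt.Println("ssh", redactSSHArgs(args))
}
```

A stricter variant would log only the remote command string and drop the host and known_hosts arguments as well, since the check also flags node IPs and infrastructure paths.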

✅ Passed checks (12 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title refers to switching from ocdebug (k8s API/debug execution) to SSH, which matches the core change in the test file where post-crash verification was moved from `exutil.DebugNodeRetryWithOptionsAndChroot` to direct hypervisor-mediated SSH.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All 11 Ginkgo test declarations use stable, static names with no dynamic content. Test bodies properly use fmt.Sprintf for dynamic values.
Microshift Test Compatibility	✅ Passed	The modified test and all other tests in the file are already protected by [apigroup:config.openshift.io] tag in the Describe block, which automatically skips them on MicroShift.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	BeforeEach() enforces `SkipIfNotTopology(oc, v1.DualReplicaTopologyMode)`, skipping all tests on SNO. The new kernel panic recovery test is within this protected describe block.
Topology-Aware Scheduling Compatibility	✅ Passed	PR modifies only test files (test/extended/edge_topologies/). The check applies to deployment manifests, operator code, or controllers—not test code. No scheduling constraints introduced.
Ote Binary Stdout Contract	✅ Passed	All Printf/Logf calls use g.GinkgoT() framework (intercepted by test runner), no klog usage, no module-level stdout writes, helper functions only called from test blocks.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Custom check applies to NEW tests; this PR modifies an existing test. No IPv4 assumptions or external connectivity in modified test code itself.
No-Weak-Crypto	✅ Passed	No weak cryptography (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or insecure secret comparisons found in the PR changes.

openshift-ci · 2026-05-26T17:35:00Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kasturinarra
Once this PR has been reviewed and has the lgtm label, please assign eggfoobar for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

test/extended/edge_topologies/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/edge_topologies/tnf_recovery.go`:
- Around line 427-438: The test currently only checks that hypervisor config
text exists; add an explicit fast-fail validation after
exutil.GetHypervisorConfig() that verifies required SSH fields
(sshCfg.HypervisorIP, sshCfg.SSHUser, sshCfg.PrivateKeyPath) are non-empty and
that the PrivateKeyPath is readable (e.g., os.Stat or attempt to open) before
constructing core.SSHConfig and calling core.PrepareLocalKnownHostsFile; on
validation failure call g.Skip or o.Expect/fatal with a clear message so the
test fails fast instead of later inside Eventually loops.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 081ff095-daf5-4eea-9d2f-b98cd1acf06f

📥 Commits

Reviewing files that changed from the base of the PR and between a84c511 and 65f9f4a.

📒 Files selected for processing (1)

test/extended/edge_topologies/tnf_recovery.go

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

test/extended/edge_topologies/tnf_recovery.go (1)

573-585: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Missing sudo for podman command.

Consistent with the previous comment about line 560, this podman ps command likely needs sudo when running over SSH as the core user to access system containers.

🐛 Proposed fix to add sudo

 		prevOutput, _, err := core.ExecuteRemoteSSHCommand(targetNodeIP,
-			"podman ps -a --format '{{.Names}}' | grep -m1 etcd-previous",
+			"sudo podman ps -a --format '{{.Names}}' | grep -m1 etcd-previous",
 			&hypervisorConfig, localKH, targetRemoteKH)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/edge_topologies/tnf_recovery.go` around lines 573 - 585, The
SSH command invoking podman in the Eventually block uses "podman ps -a ..."
without sudo; update the command string passed to core.ExecuteRemoteSSHCommand
(the call in the anonymous func used by o.Eventually) to prefix podman with sudo
(e.g., "sudo podman ps -a --format ... | grep -m1 etcd-previous") so the core
user can list system containers; keep the rest of the call and error handling
(prevOutput, targetNodeIP, hypervisorConfig, localKH, targetRemoteKH, and the
expectation message referencing targetNode.Name) unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/edge_topologies/tnf_recovery.go`:
- Around line 517-519: The SSH reads of /var/lib/etcd in the tnf_recovery test
are missing sudo and can fail with permission denied; update the commands passed
to core.ExecuteRemoteSSHCommand (the grep that sets bumpAmountStr and the jq
that reads revision.json) to prefix them with sudo (and keep the existing sudo
on journalctl) so the reads run as root when executed on RHCOS; ensure you
update both command strings that reference /var/lib/etcd/config.yaml and
/var/lib/etcd/revision.json where survivedNodeIP is used.
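
Once the remote reads run under sudo, the two values can be extracted from the returned text. The sketch below shows one way to parse them on the test side. It is not the PR's code: the PR uses remote grep and jq, the function names are hypothetical, and the sample file contents in `main` are invented for illustration.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// parseBumpAmount scans config text line by line for the
// force-new-cluster-bump-amount key and returns its integer value.
func parseBumpAmount(configYAML string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(configYAML))
	for sc.Scan() {
		key, val, ok := strings.Cut(strings.TrimSpace(sc.Text()), ":")
		if ok && strings.TrimSpace(key) == "force-new-cluster-bump-amount" {
			return strconv.ParseUint(strings.TrimSpace(val), 10, 64)
		}
	}
	return 0, fmt.Errorf("force-new-cluster-bump-amount not found")
}

// parseMaxRaftIndex reads the maxRaftIndex field from revision.json content.
// A missing field decodes as 0, so callers may want to treat 0 as suspect.
func parseMaxRaftIndex(revisionJSON string) (uint64, error) {
	var rev struct {
		MaxRaftIndex uint64 `json:"maxRaftIndex"`
	}
	if err := json.Unmarshal([]byte(revisionJSON), &rev); err != nil {
		return 0, err
	}
	return rev.MaxRaftIndex, nil
}

func main() {
	// Invented sample contents; real files carry more fields.
	bump, err := parseBumpAmount("name: node-a\nforce-new-cluster-bump-amount: 1000\n")
	fmt.Println(bump, err)
	idx, err := parseMaxRaftIndex(`{"maxRaftIndex": 42}`)
	fmt.Println(idx, err)
}
```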

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: ba461195-f6ea-478f-8ea5-06193e78151a

📥 Commits

Reviewing files that changed from the base of the PR and between 65f9f4a and af71706.

📒 Files selected for processing (1)

test/extended/edge_topologies/tnf_recovery.go

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/edge_topologies/tnf_recovery.go`:
- Around line 573-585: The grep used in the Eventually block can match
substrings causing flaky matches; update the command passed to
core.ExecuteRemoteSSHCommand to match the container name exactly (for example
use grep -x '^etcd-previous$' or an exact-name filter) so prevOutput will only
be "etcd-previous"; update the check around prevOutput and the error message in
that block (referencing core.ExecuteRemoteSSHCommand, targetNodeIP, prevOutput,
targetNode.Name) to rely on the exact-match output.
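
An alternative to tightening the grep is to do the exact-name comparison in Go on the full list of names. This is a sketch only: `hasContainer` is a hypothetical helper, and it assumes the remote command prints one container name per line.

```go
package main

import (
	"fmt"
	"strings"
)

// hasContainer reports whether namesOutput, with one container name per
// line, contains a line that equals name exactly. Substring matches such as
// "etcd-previous-old" do not count.
func hasContainer(namesOutput, name string) bool {
	for _, line := range strings.Split(namesOutput, "\n") {
		if strings.TrimSpace(line) == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasContainer("etcd\netcd-previous\n", "etcd-previous")) // true
	fmt.Println(hasContainer("etcd-previous-old\n", "etcd-previous"))   // false
}
```

Matching in Go keeps the remote command free of nested quoting, which matters when the command string travels through two ssh hops.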

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 310c0f00-29a6-4623-a28a-65ac1f84b85f

📥 Commits

Reviewing files that changed from the base of the PR and between af71706 and 9ce4f00.

📒 Files selected for processing (1)

test/extended/edge_topologies/tnf_recovery.go

openshift-merge-bot · 2026-05-26T18:30:47Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-metal-ovn-two-node-arbiter
/test e2e-metal-ovn-two-node-fencing
/test e2e-metal-ovn-two-node-fencing-recovery

kasturinarra · 2026-05-27T07:41:40Z

/test e2e-metal-ovn-two-node-fencing-recovery e2e-aws-ovn-fips

kasturinarra · 2026-05-27T08:02:57Z

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-ipv6-recovery

openshift-ci · 2026-05-27T08:03:12Z

@kasturinarra: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-ipv6-recovery

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/79bf7300-59a2-11f1-92ea-10dc971041d4-0

kasturinarra · 2026-05-27T09:56:23Z

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

openshift-ci · 2026-05-27T09:56:27Z

@kasturinarra: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/52711730-59b2-11f1-84dc-0186a3c34230-0

kasturinarra · 2026-05-27T11:21:22Z

/test e2e-metal-ovn-two-node-fencing-recovery

kasturinarra · 2026-05-27T16:45:07Z

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-ipv6-recovery

kasturinarra · 2026-05-27T16:45:21Z

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

openshift-ci · 2026-05-27T16:45:30Z

@kasturinarra: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-ipv6-recovery

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6bbea980-59eb-11f1-9890-9e3acead88fd-0

openshift-ci · 2026-05-27T16:45:44Z

@kasturinarra: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/744ff5e0-59eb-11f1-86c5-e96352f08f90-0

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/extended/edge_topologies/tnf_recovery.go (1)

427-429: ⚡ Quick win

Mark this spec as hypervisor-gated.

This block adds the same hard prerequisite as the other cold-boot specs here, but the g.It(...) title still lacks [Requires:HypervisorSSHConfig]. Add the marker so filtering and skip reporting stay accurate.

♻️ Suggested tweak

-g.It("should compute etcd revision bump and preserve backup container after kernel panic recovery", func() {
+g.It("should compute etcd revision bump and preserve backup container after kernel panic recovery [Requires:HypervisorSSHConfig]", func() {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/edge_topologies/tnf_recovery.go` around lines 427 - 429, The
test adds a hypervisor SSH prerequisite using exutil.HasHypervisorConfig() and
g.Skip, but the g.It(...) test title is missing the gating marker; update the
g.It test description to include "[Requires:HypervisorSSHConfig]" (so the test
title matches the runtime skip) — locate the g.It(...) block associated with
this exutil.HasHypervisorConfig()/g.Skip check and append the marker to its
description string.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/edge_topologies/tnf_recovery.go`:
- Around line 496-499: The SSH helpers (ExecuteRemoteSSHCommand and
ExecuteSSHCommand) currently spawn exec.Command("ssh", ...) with no per-attempt
deadline, which can hang the test; update these helpers to time-bound every
two-hop invocation by using exec.CommandContext with a context created via
context.WithTimeout for each call and/or append safe SSH options such as "-o
BatchMode=yes" and "-o ConnectTimeout=<seconds>" (and optionally "-o
ServerAliveInterval=<sec>") to the constructed ssh arguments so nested SSH calls
also time out; ensure every call site that builds a nested ssh command
(including ExecuteRemoteSSHCommand) uses the new context-aware invocation so a
hung node returns an error quickly and allows the Eventually retry loop to
proceed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7d3499b5-7b1a-466a-b7f6-fea0c9f1e262

📥 Commits

Reviewing files that changed from the base of the PR and between 9ce4f00 and 8374235.

📒 Files selected for processing (1)

test/extended/edge_topologies/tnf_recovery.go

coderabbitai · 2026-05-28T09:44:46Z

+			journalOutput, _, err := core.ExecuteRemoteSSHCommand(survivedNodeIP,
+				fmt.Sprintf("sudo journalctl -u pacemaker --since '%s' | grep 'bump-amount' | tail -1", crashTimestamp),
+				&hypervisorConfig, localKH, survivedRemoteKH)
 			if err != nil {


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== SSH helper timeout-related code =="
rg -n -C4 'func ExecuteSSHCommand|func ExecuteRemoteSSHCommand|ConnectTimeout|BatchMode|ServerAliveInterval|ServerAliveCountMax|CommandContext|WithTimeout|context\.' test/extended/edge_topologies/utils/core/ssh.go

echo
echo "== SSH-based recovery call sites =="
sed -n '493,594p' test/extended/edge_topologies/tnf_recovery.go | awk '{print NR+492 ":" $0}'

Repository: openshift/origin

Length of output: 6907

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== ssh.go: ExecuteSSHCommand (key options) =="
sed -n '120,240p' test/extended/edge_topologies/utils/core/ssh.go

echo
echo "== ssh.go: ExecuteRemoteSSHCommand (nestedSSHCommand) =="
sed -n '180,260p' test/extended/edge_topologies/utils/core/ssh.go

echo
echo "== ssh.go: timeout/cancel related usage =="
rg -n 'ConnectTimeout|BatchMode|ServerAliveInterval|ServerAliveCountMax|CommandContext|WithTimeout|context\.' test/extended/edge_topologies/utils/core/ssh.go

Repository: openshift/origin

Length of output: 8508

Bound each two-hop SSH attempt.

test/extended/edge_topologies/utils/core/ssh.go’s ExecuteRemoteSSHCommand just builds a nested ssh and calls ExecuteSSHCommand, which runs exec.Command("ssh", ...) with no ConnectTimeout/BatchMode/keepalive options and no CommandContext/deadline. A hung (half-booted) node can therefore block the Eventually callback past the assertion’s retry window instead of returning and retrying.

Make each SSH invocation time-bounded (e.g., switch to exec.CommandContext with a per-attempt timeout, and/or add ssh options like -o BatchMode=yes and -o ConnectTimeout=<...> in the helper).

Also applies to the core.ExecuteRemoteSSHCommand call sites at: 517-519, 533-535, 558-570, 573-585, 589-591.

The SSH calls are already inside Eventually loops with 5-20 minute timeouts and 5-second poll intervals. If an SSH call hangs, the Eventually will time out and fail the test. Adding CommandContext or ConnectTimeout to the shared SSH helpers would change behavior for all callers across the test suite (cold-boot tests, node replacement tests, etc.), which is a much larger blast radius than the kernel panic fix we're making. The finding is valid in principle, but it's not causing the current failures and shouldn't be mixed into this PR.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Resolve the CodeRabbit review comment on this PR
gh api \
  --method PUT \
  -H "Accept: application/vnd.github+json" \
  "repos/openshift/origin/pulls/31216/reviews" 2>/dev/null || true

# Find and resolve the specific review comment (ID from the URL: r3316838694)
gh api \
  --method PATCH \
  -H "Accept: application/vnd.github+json" \
  "repos/openshift/origin/pulls/comments/3316838694/replies" 2>/dev/null || true

# List review threads to find the right one to resolve
gh api \
  --method GET \
  -H "Accept: application/vnd.github.v3+json" \
  "repos/openshift/origin/pulls/31216/comments" \
  --jq '.[] | select(.user.login == "coderabbitai[bot]") | {id: .id, body: .body[0:80]}'

Length of output: 1595

🏁 Script executed:

#!/bin/bash
# Use GraphQL to find the review thread and resolve it
# First, get the pull request node ID and review threads
gh api graphql -f query='
{
  repository(owner: "openshift", name: "origin") {
    pullRequest(number: 31216) {
      id
      reviewThreads(first: 50) {
        nodes {
          id
          isResolved
          comments(first: 1) {
            nodes {
              databaseId
              author { login }
            }
          }
        }
      }
    }
  }
}' --jq '.data.repository.pullRequest.reviewThreads.nodes[] | select(.comments.nodes[0].databaseId == 3316838694) | {threadId: .id, isResolved: .isResolved}'

Length of output: 747

🏁 Script executed:

#!/bin/bash
# Resolve the review thread
gh api graphql -f query='
mutation {
  resolveReviewThread(input: {threadId: "PRRT_kwDOAVZyrM6FWyxq"}) {
    thread {
      id
      isResolved
    }
  }
}'

Length of output: 705

@kasturinarra, understood — the reasoning is solid and I agree this belongs in a separate PR. I wasn't able to auto-resolve the thread due to integration permissions; you can resolve it manually.

Would you like me to open a follow-up GitHub issue to track adding per-attempt ConnectTimeout/CommandContext to the shared SSH helpers (ExecuteSSHCommand / ExecuteRemoteSSHCommand) so it doesn't get lost?

🧠 Learnings used

Learnt from: Chandan9112 Repo: openshift/origin PR: 31182 File: test/extended/node/node_e2e/node.go:0-0 Timestamp: 2026-05-19T12:28:55.867Z Learning: When reviewing Go code that uses OpenShift configv1 (`github.com/openshift/api/config/v1`), note that `ImageSpec.RegistrySources` is declared as a non-pointer value (`RegistrySources RegistrySources`). It is always safely zero-initialized by Go, so it can never be `nil`. Therefore, do not add nil-guards for `imageSpec.RegistrySources` before accessing its fields (e.g., `AllowedRegistries`, `ContainerRuntimeSearchRegistries`).

Learnt from: Chandan9112 Repo: openshift/origin PR: 31182 File: test/extended/node/node_e2e/node.go:0-0 Timestamp: 2026-05-19T12:28:55.867Z Learning: When using the OpenShift `configv1` API (`github.com/openshift/api/config/v1`), treat `ImageSpec.RegistrySources` as a non-nil Go struct value (`RegistrySources`, not `*RegistrySources`). Because it can never be nil (it’s always zero-initialized), don’t add nil-guards before accessing its fields (e.g., `AllowedRegistries`, `ContainerRuntimeSearchRegistries`). You may still need to handle zero-value contents, but a nil check on `RegistrySources` itself is unnecessary.

openshift-merge-bot · 2026-05-28T09:59:34Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-metal-ovn-two-node-arbiter
/test e2e-metal-ovn-two-node-fencing
/test e2e-metal-ovn-two-node-fencing-recovery

kasturinarra · 2026-05-28T10:06:50Z

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-ipv6-recovery

openshift-ci · 2026-05-28T10:06:54Z

@kasturinarra: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-ipv6-recovery

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f27451f0-5a7c-11f1-95c8-48a54c1d4628-0

kasturinarra · 2026-05-28T10:07:25Z

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

openshift-ci · 2026-05-28T10:07:29Z

@kasturinarra: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/07adf990-5a7d-11f1-9a85-aa70fad8c496-0

kasturinarra · 2026-05-28T11:54:56Z

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-ipv6-recovery

openshift-ci · 2026-05-28T11:55:08Z

@kasturinarra: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-ipv6-recovery

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0c388cf0-5a8c-11f1-82ae-b08103e38e90-0

kasturinarra · 2026-05-28T11:55:10Z

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

openshift-ci · 2026-05-28T11:55:16Z

@kasturinarra: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/14bf2af0-5a8c-11f1-870f-4c4c78eb719a-0

kasturinarra · 2026-05-28T18:49:57Z

/test pull-ci-openshift-origin-release-4.22-e2e-metal-ovn-two-node-fencing-recovery

kasturinarra · 2026-05-29T04:59:13Z

/test pull-ci-openshift-origin-release-4.22-e2e-metal-ovn-two-node-fencing-recovery

kasturinarra · 2026-05-29T08:20:31Z

/retest

openshift-ci · 2026-05-29T11:45:32Z

@kasturinarra: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-metal-ovn-two-node-fencing-recovery	`8374235`	link	false	`/test e2e-metal-ovn-two-node-fencing-recovery`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

kasturinarra · 2026-05-29T11:53:31Z

/payload-job periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

openshift-ci · 2026-05-29T11:53:35Z

@kasturinarra: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

periodic-ci-openshift-release-main-nightly-5.0-e2e-metal-ovn-two-node-fencing-recovery

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/04059ae0-5b55-11f1-9f3d-9e6c7cdbfdb9-0

openshift-ci Bot requested review from fonta-rh and suleymanakbas91 May 26, 2026 17:34

coderabbitai Bot requested changes May 26, 2026

kasturinarra force-pushed the fix_failure branch from 65f9f4a to af71706 May 26, 2026 17:55

coderabbitai Bot requested changes May 26, 2026

kasturinarra force-pushed the fix_failure branch from af71706 to 9ce4f00 May 26, 2026 18:07

coderabbitai Bot requested changes May 26, 2026

coderabbitai Bot approved these changes May 26, 2026

openshift-ci Bot added the ready-for-human-review label (Indicates a PR has been reviewed by automated tools and is ready for human review) May 26, 2026

Commit 8374235: Update ocdebug to ssh

kasturinarra force-pushed the fix_failure branch from 9ce4f00 to 8374235 May 28, 2026 09:37

coderabbitai Bot requested changes May 28, 2026
