fix(sources):skip redundant file fingerprinting for already-watched files #275
vparfonov wants to merge 1 commit into
|
📝 Walkthrough
FileServer's periodic glob/discovery loop now builds a reverse lookup from watched file paths to their stored fingerprints and read positions. During path iteration, it checks whether the current file size is not smaller than the stored read position; if so, it marks the watcher findable and skips expensive re-fingerprinting. Existing fingerprinting and watcher logic remains as fallback.
Changes: FileServer fingerprinting optimization
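The reverse lookup described in the walkthrough could look roughly like the sketch below. It is not the code from this PR: `Fingerprint`, `Watcher`, `build_path_index`, and `try_fast_path` are hypothetical, trimmed-down stand-ins, and the current file size is passed in as a plain number so the example stays self-contained.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the file source's fingerprint type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Fingerprint(pub u64);

/// Hypothetical watcher holding only the fields the fast path needs.
#[derive(Debug)]
pub struct Watcher {
    pub path: PathBuf,
    pub file_position: u64,
    pub findable: bool,
}

/// Build the reverse lookup: watched path -> (fingerprint, read position).
pub fn build_path_index(
    fp_map: &HashMap<Fingerprint, Watcher>,
) -> HashMap<PathBuf, (Fingerprint, u64)> {
    fp_map
        .iter()
        .map(|(fp, watcher)| (watcher.path.clone(), (*fp, watcher.file_position)))
        .collect()
}

/// Returns true when the path is already watched and has not shrunk below
/// the stored read position, in which case the watcher is marked findable
/// and the caller can skip re-fingerprinting. Returns false otherwise, so
/// the caller falls back to full fingerprinting.
pub fn try_fast_path(
    fp_map: &mut HashMap<Fingerprint, Watcher>,
    index: &HashMap<PathBuf, (Fingerprint, u64)>,
    path: &Path,
    current_size: u64,
) -> bool {
    match index.get(path) {
        Some(&(fp, position)) if current_size >= position => match fp_map.get_mut(&fp) {
            Some(watcher) => {
                watcher.findable = true;
                true
            }
            None => false,
        },
        // Unknown path, or the file is smaller than the read position
        // (possible truncation): take the slow path.
        _ => false,
    }
}
```

Under these assumptions the index is rebuilt once per glob cycle, so each path costs one hash lookup plus one size check instead of the open/seek/read sequence a fingerprint requires.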
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
|
/hold |
|
@coderabbitai review |
✅ Action performed: Review finished.
|
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/file-source/src/file_server.rs`:
- Around line 199-203: The code currently treats metadata errors as if the file
is not dominated, which incorrectly triggers the fast-path and marks the watcher
findable; instead, change the logic around fs::metadata(&path).await so that you
only compute and use dominated when metadata() returns Ok — e.g., match on
fs::metadata(&path).await and set let dominated = metadata.len() <
*file_position only in the Ok branch, and in the Err branch fall through into
the existing fingerprinting path (do not set or use dominated nor mark the
watcher findable). Update the block that references dominated (the fast-path
check) to only run when metadata succeeded.
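The shape of the fix requested above can be sketched as follows. This is an illustration under assumptions, not the patch itself: `Decision` and `decide` are hypothetical names, and the synchronous `std::fs::metadata` stands in for the async `fs::metadata(&path).await` call mentioned in the comment so that the example has no external dependencies.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical outcome of the pre-fingerprint check.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// File is already watched and has not shrunk: skip fingerprinting.
    SkipFingerprint,
    /// Fall through to the existing fingerprinting path.
    Fingerprint,
}

/// Decide whether the fast path may be taken for an already-watched file.
/// The fast path is only reachable from the `Ok` branch, so a metadata
/// error can never mark a watcher findable.
pub fn decide(path: &Path, file_position: u64) -> Decision {
    match fs::metadata(path) {
        // Size is at least the read position: no truncation detected.
        Ok(metadata) if metadata.len() >= file_position => Decision::SkipFingerprint,
        // File shrank below the read position: treat as truncated.
        Ok(_) => Decision::Fingerprint,
        // Metadata failed (file removed, permissions, ...): do not guess.
        Err(_) => Decision::Fingerprint,
    }
}
```

Keeping the error case as its own match arm makes the fallback explicit, rather than relying on a default value that happens to select the fast path.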
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Enterprise
Run ID: 62dea80f-3c1b-4150-8d06-2353d3e0ad3c
📒 Files selected for processing (1)
lib/file-source/src/file_server.rs
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jcantrill, vparfonov
|
…iles

On each glob cycle, FileServer fingerprinted every file returned by the paths provider, even files already being actively watched. Each fingerprint involves syscalls (open, seek, read magic bytes, seek, read first line, EOF check). On clusters with 500+ pods this caused thousands of unnecessary read syscalls per minute, saturating disk I/O and disrupting etcd on control plane nodes.

Add a path-based reverse lookup before fingerprinting. If a file path is already tracked in fp_map and hasn't been truncated (file size >= read position), skip fingerprinting entirely. Truncated files still fall through to full fingerprinting to preserve correct behavior.

Measured impact (500 files, 35s trace):
- open: 1,503 → 5 (99.7% reduction)
- lseek: 3,000 → 0 (100% reduction)
- read: 4,500 → 2,500 (44% reduction, remaining are data reads)
- total: 12,033 → 2,555 (78.8% reduction)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
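The reduction percentages in the commit message follow from the before/after syscall counts. A small helper (hypothetical, not part of the PR) shows the arithmetic:

```rust
/// Percentage reduction from `before` to `after`.
/// Returns 0.0 when `before` is zero to avoid dividing by zero.
pub fn reduction_percent(before: u64, after: u64) -> f64 {
    if before == 0 {
        return 0.0;
    }
    (before as f64 - after as f64) / before as f64 * 100.0
}
```

For example, `reduction_percent(1503, 5)` rounds to 99.7 and `reduction_percent(12033, 2555)` rounds to 78.8, matching the reported figures.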
|
/test cluster-logging-operator-e2e |
|
@vparfonov: The following test failed, say
|
On each glob cycle, FileServer fingerprinted every file returned by the paths provider, even files already being actively watched. Each fingerprint involves syscalls (open, seek, read, etc.). On clusters with 500+ pods this caused thousands of unnecessary read syscalls per minute, saturating disk I/O and disrupting etcd on control plane nodes.
Add a path-based reverse lookup before fingerprinting. If a file path is already tracked in fp_map and hasn't been truncated (file size >= read position), skip fingerprinting entirely. Truncated files still fall through to full fingerprinting to preserve correct behavior.
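Measured impact (500 files, 35s trace):
- open: 1,503 → 5 (99.7% reduction)
- lseek: 3,000 → 0 (100% reduction)
- read: 4,500 → 2,500 (44% reduction, remaining are data reads)
- total: 12,033 → 2,555 (78.8% reduction)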
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>