fix(sources):skip redundant file fingerprinting for already-watched files by vparfonov · Pull Request #275 · ViaQ/vector

vparfonov · 2026-06-09T13:37:52Z

On each glob cycle, FileServer fingerprinted every file returned by the paths provider, even files it was already actively watching. Each fingerprint involves several syscalls (open, seek, read, etc.). On clusters with 500+ pods this caused thousands of unnecessary read syscalls per minute, saturating disk I/O and disrupting etcd on control plane nodes.

Add a path-based reverse lookup before fingerprinting. If a file path is already tracked in `fp_map` and the file hasn't been truncated (file size >= read position), skip fingerprinting entirely. Truncated files still fall through to full fingerprinting to preserve correct behavior.
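A minimal sketch of the skip decision described above, assuming the watched set can be modeled as a map from path to the watcher's current read position. `classify` and `Action` are hypothetical names used only for illustration, not Vector's API, and the real code also carries the fingerprint alongside the position.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// What the discovery loop should do with one path from the paths provider.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Already watched and not truncated: only mark the watcher findable.
    MarkFindable,
    /// Untracked or truncated: run the full fingerprinting logic.
    Fingerprint,
}

/// Hypothetical helper. `watched` maps a watched path to its read position;
/// `current_len` is the file size from a cheap stat-style call.
pub fn classify(watched: &HashMap<PathBuf, u64>, path: &Path, current_len: u64) -> Action {
    match watched.get(path) {
        // Size >= read position: treated as the same, unchanged-or-growing file.
        Some(&position) if current_len >= position => Action::MarkFindable,
        // Truncated (size < read position) or not tracked at all.
        _ => Action::Fingerprint,
    }
}
```

The fast path replaces the open/seek/read sequence with a single size lookup, which is where the syscall savings come from; the fallback arm keeps truncation handling unchanged.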

Measured impact (500 files, 35s trace):

- open: 1,503 → 5 (99.7% reduction)
- lseek: 3,000 → 0 (100% reduction)
- read: 4,500 → 2,500 (44% reduction, remaining are data reads)
- total: 12,033 → 2,555 (78.8% reduction)
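The reduction percentages can be rechecked with a small helper (`reduction_pct` is a hypothetical name). The itemized open/lseek/read rows do not add up to the totals, so the totals presumably include other syscalls as well.

```rust
/// Percentage reduction from `before` to `after`, rounded to one decimal place.
pub fn reduction_pct(before: u64, after: u64) -> f64 {
    assert!(before > 0, "baseline count must be non-zero");
    // saturating_sub reports 0% for an increase instead of underflowing.
    let saved = before.saturating_sub(after) as f64;
    let pct = saved / before as f64 * 100.0;
    (pct * 10.0).round() / 10.0
}
```

For example, `reduction_pct(4_500, 2_500)` gives 44.4, which the table rounds to 44%.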

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Summary by CodeRabbit

Refactor
- Optimized file server's discovery process to reduce processing overhead and improve performance when monitoring file changes.

coderabbitai · 2026-06-09T13:38:03Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Walkthrough

FileServer's periodic glob/discovery loop now builds a reverse lookup from watched file paths to their stored fingerprints and read positions. During path iteration, it checks whether the current file size is not smaller than the stored read position; if so, it marks the watcher findable and skips expensive re-fingerprinting. Existing fingerprinting and watcher logic remains as fallback.
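The walkthrough's reverse lookup can be sketched as follows. `Fingerprint`, `Watcher`, and `discovery_pass` are hypothetical stand-ins for the real types in `lib/file-source/src/file_server.rs`, and `len_of` replaces the real metadata call so the example runs without touching the filesystem; a `None` size is treated conservatively as "fingerprint anyway".

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-ins for the real fingerprint and watcher types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Fingerprint(pub u64);

#[derive(Debug)]
pub struct Watcher {
    pub path: PathBuf,
    pub file_position: u64,
    pub findable: bool,
}

/// One discovery pass. Returns the paths that still need full fingerprinting.
pub fn discovery_pass(
    fp_map: &mut HashMap<Fingerprint, Watcher>,
    paths: &[PathBuf],
    len_of: impl Fn(&PathBuf) -> Option<u64>,
) -> Vec<PathBuf> {
    // Reverse lookup: path -> (fingerprint, read position), rebuilt each cycle.
    let by_path: HashMap<PathBuf, (Fingerprint, u64)> = fp_map
        .iter()
        .map(|(fp, w)| (w.path.clone(), (*fp, w.file_position)))
        .collect();

    let mut needs_fingerprint = Vec::new();
    for path in paths {
        if let Some((fp, position)) = by_path.get(path) {
            // Fast path only when the size is known and not below the read position.
            if let Some(len) = len_of(path) {
                if len >= *position {
                    if let Some(watcher) = fp_map.get_mut(fp) {
                        watcher.findable = true;
                    }
                    continue;
                }
            }
        }
        // Untracked, truncated, or size unknown: fall back to fingerprinting.
        needs_fingerprint.push(path.clone());
    }
    needs_fingerprint
}
```

Building the map once per cycle costs O(watched files) in memory but turns each per-path check into a hash lookup instead of a scan of `fp_map`.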

Changes

FileServer fingerprinting optimization

Layer / File(s)	Summary
Fast-path fingerprinting with lookup cache (`lib/file-source/src/file_server.rs`)	A reverse lookup map from watched paths to fingerprints and read positions is constructed (lines 188–193) and used to bypass fingerprinting when the file size is not smaller than the recorded read position (lines 195–215); the existing fingerprinting logic is retained as a fallback.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A fast path hops through the fingerprint field,
No re-scanning if size gains reveal,
The lookup cache keeps watch with care,
While fallback logic waits still there—
Optimization makes the loop more real! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely describes the main optimization: skipping redundant fingerprinting for already-watched files in the FileServer's periodic discovery loop.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


vparfonov · 2026-06-09T13:38:03Z

/hold

jcantrill · 2026-06-09T14:19:45Z

@coderabbitai review

coderabbitai · 2026-06-09T14:19:52Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/file-source/src/file_server.rs`:
- Around lines 199-203: The code currently treats metadata errors as if the file is not dominated, which incorrectly triggers the fast path and marks the watcher findable. Instead, change the logic around `fs::metadata(&path).await` so that `dominated` is computed and used only when `metadata()` returns `Ok`: for example, match on `fs::metadata(&path).await`, set `let dominated = metadata.len() < *file_position` only in the `Ok` branch, and in the `Err` branch fall through to the existing fingerprinting path (neither setting nor using `dominated`, and not marking the watcher findable). Update the block that references `dominated` (the fast-path check) so it runs only when metadata succeeded.
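A sketch of the suggested fix, using synchronous `std::fs::metadata` in place of the async `fs::metadata(&path).await` in the PR; `can_skip_fingerprint` is a hypothetical helper name. The fast path is taken only in the `Ok` branch, and any error falls through to fingerprinting.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: true only if metadata was read successfully and the
/// file is not truncated (size >= read position). Any metadata error, such as
/// a deleted or rotated-away file, returns false so the caller falls through
/// to the existing fingerprinting path.
pub fn can_skip_fingerprint(path: &Path, file_position: u64) -> bool {
    match fs::metadata(path) {
        Ok(metadata) => {
            // Computed only in the Ok branch, as the review suggests.
            let dominated = metadata.len() < file_position;
            !dominated
        }
        // Do not fast-path (and so do not mark the watcher findable) on error.
        Err(_) => false,
    }
}
```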


📥 Commits

Reviewing files that changed from the base of the PR and between a329118 and 7d5aa86.

📒 Files selected for processing (1)

lib/file-source/src/file_server.rs

jcantrill

/approve

openshift-ci · 2026-06-09T14:24:03Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcantrill, vparfonov


Needs approval from an approver in each of these files:

~~OWNERS~~ [jcantrill]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…iles

On each glob cycle, FileServer fingerprinted every file returned by the paths provider, even files already being actively watched. Each fingerprint involves syscalls (open, seek, read magic bytes, seek, read first line, EOF check). On clusters with 500+ pods this caused thousands of unnecessary read syscalls per minute, saturating disk I/O and disrupting etcd on control plane nodes.

Add a path-based reverse lookup before fingerprinting. If a file path is already tracked in fp_map and hasn't been truncated (file size >= read position), skip fingerprinting entirely. Truncated files still fall through to full fingerprinting to preserve correct behavior.

Measured impact (500 files, 35s trace):

- open: 1,503 → 5 (99.7% reduction)
- lseek: 3,000 → 0 (100% reduction)
- read: 4,500 → 2,500 (44% reduction, remaining are data reads)
- total: 12,033 → 2,555 (78.8% reduction)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

vparfonov · 2026-06-09T21:54:47Z

/test cluster-logging-operator-e2e

openshift-ci · 2026-06-09T23:56:21Z

@vparfonov: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/cluster-logging-operator-e2e	`cd88077`	link	true	`/test cluster-logging-operator-e2e`


openshift-ci Bot added the do-not-merge/hold label Jun 9, 2026

openshift-ci Bot requested review from cahartma and syedriko June 9, 2026 13:38

jcantrill added the v0.54 label Jun 9, 2026

coderabbitai Bot reviewed Jun 9, 2026


Comment thread lib/file-source/src/file_server.rs Outdated

jcantrill reviewed Jun 9, 2026


openshift-ci Bot added the approved label Jun 9, 2026

vparfonov force-pushed the log9436 branch from 7d5aa86 to cd88077 Compare June 9, 2026 15:04


vparfonov wants to merge 1 commit into ViaQ:v0.54.0-rh from vparfonov:log9436
