Skip to content

test: make push_down_filter_regression dynamic filter content deterministic (#22621)#22643

Open
diegoQuinas wants to merge 1 commit into
apache:mainfrom
diegoQuinas:fix/flaky-push-down-filter-regression
Open

test: make push_down_filter_regression dynamic filter content deterministic (#22621)#22643
diegoQuinas wants to merge 1 commit into
apache:mainfrom
diegoQuinas:fix/flaky-push-down-filter-regression

Conversation

@diegoQuinas
Copy link
Copy Markdown

Which issue does this PR close?

Rationale for this change

push_down_filter_regression.slt (added in #22150) asserts the exact
DynamicFilter content rendered by EXPLAIN ANALYZE on the agg_dyn_*
fixtures. That content is not deterministic: the filter threshold tightens
as each AggregateExec(mode=Partial) publishes its running min/max, and the
EXPLAIN ANALYZE snapshot can be taken while the filter is still converging.

For agg_dyn_single, file_0 holds the global min (1) and file_1 a larger
partial min (3). If the snapshot lands after file_1 publishes 3 but before
file_0 publishes 1, the filter reads a < 3 instead of the final a < 1
exactly the intermittent CI failure reported in #22621. The fixture's comment
incorrectly claimed the filter content was deterministic and only the pruning
counts raced.

What changes are included in this PR?

Make the filter content independent of publish order by giving every file the
same per-file min/max
, so any snapshot equals the fully converged filter:

  • agg_dyn_single — both files (1), (8) → each file min=1, max=8.
  • agg_dyn_two_col — each file min(a)=1, max(b)=9.
  • agg_dyn_mixed — each file min(a)=1, max(a)=8, max(b)=12.

agg_dyn_two_col and agg_dyn_mixed were not in the reported failure but shared
the same latent race (differing per-file extremes), so they are fixed too.
agg_dyn_nulls is left untouched — its filter is always true and never races.

The expected plan text is unchanged; only the input data and the misleading
comments are modified. The alternative of forcing a single partition was
rejected: dynamic aggregate filters are only emitted in Partial+Final mode
(target_partitions >= 2), so a single partition would emit no filter at all.

Are these changes tested?

Yes — the modified push_down_filter_regression.slt itself is the test. It
passes, and because the asserted filter content no longer depends on partition
scheduling, it is stable across runs (verified by running it repeatedly locally).

Are there any user-facing changes?

No. Test-only change.

…nistic

The agg_dyn_* fixtures asserted the exact DynamicFilter content captured by
EXPLAIN ANALYZE, but that content converges as parallel Partial aggregates
publish their bounds. A snapshot can observe an intermediate value, making the
test flaky (apache#22621).

Give every file the same per-file min/max so any snapshot equals the converged
filter, regardless of publish order. Applied to agg_dyn_single (the reported
case) and to agg_dyn_two_col and agg_dyn_mixed, which shared the same latent
race. Expected plan text is unchanged; only the input data and comments differ.

Closes apache#22621
@github-actions github-actions Bot added the sqllogictest SQL Logic Tests (.slt) label May 30, 2026
@neilconway
Copy link
Copy Markdown
Contributor

cc @xiedeyantu @Dandandan @kosiew Apparently this was introduced by #22150

@xiedeyantu
Copy link
Copy Markdown
Member

cc @xiedeyantu @Dandandan @kosiew Apparently this was introduced by #22150

@neilconway Thank you point it out, I will re-examine the original PR.

@xiedeyantu
Copy link
Copy Markdown
Member

@neilconway @diegoQuinas I reviewed my previous PR #22150 and couldn't find anything causing the problem. The current issue mentions instability due to concurrency; could the optimizations in the PR have caused changes in execution time that exposed this issue? If we can confirm that the filter is unstable, then the root cause might not be in my previous PR (because that test wasn't newly added in this PR).

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you -- this make sense to me @diegoQuinas

# --- single-column fixture ([5, 1, 3, 8]) split across 2 files ---
# --- single-column fixture ([1, 8, 1, 8]) split across 2 files ---
#
# Every file shares the same per-file min (1) and max (8). This makes the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

# Use `analyze_level = summary` + `analyze_categories = 'none'` so metrics
# render empty; we only care that the `predicate=DynamicFilter [ ... ]` text
# matches. Pruning metrics here are subject to a parallel-execution race
# matches. The pruning *counts* are still subject to a parallel-execution race
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with this approach is that it is changing the meaning of the test

However, it looks like we have had issues with this test before that @mbutrovich saw

So given that I agree the actual dynamic filter content is subject to a race condition this is best I can imagine us doing at this point

@alamb alamb enabled auto-merge June 1, 2026 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: push_down_filter_regression.slt dynamic filter content is non-deterministic

4 participants