polygonize: fix dask 8-connectivity float divergence at large rtol (#2677)#3041
Merged
Conversation
#2677) Replace the per-pair 8-conn diagonal close test, which honoured every diagonal unconditionally and over-merged regions numpy keeps separate, with a pass that replays numpy's exact CCL predicate over the union of boundary cells. The conditional SW/SE diagonal uses the real orthogonal neighbour value when it is a boundary cell and a recorded in-chunk match flag otherwise, so a chunked 8-conn float raster reproduces the unchunked polygon partition. Add deterministic parity tests.
brendancol
commented
Jun 8, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: polygonize 8-conn dask divergence fix (#3041)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
_boundary_ccl_unions(polygonize.py:417-425):field[coord]keeps the last value written for a coordinate, andw_flag/s_flagcome from whichever owner the iteration hits first. Both are safe because a coordinate maps to a single physical pixel, so every owner sharing it carries the same value and the same numpy W/S match flags. That invariant is load-bearing but only implied by the code. A one-line comment stating it would help the next reader.
Nits (optional improvements)
TestEightConnFuzzParityruns 40 random rasters per rtol across all chunkings, deterministic via the seed, and prints the array and chunks on failure (good). Consider also asserting the numpy reference polygon count is stable so a numpy-side change is caught separately.
What looks good
- The diagnosis is correct: numpy's 8-conn diagonal is conditional on the orthogonal neighbour, and the old pairwise test honoured it unconditionally. Replaying the predicate over the boundary-cell union is the right fix rather than patching the pairwise heuristic.
- The dual-source orthogonal-match decision (real neighbour value when it is a boundary cell, recorded in-chunk flag otherwise) covers both the cross-chunk and interior-guard cases. I checked the nx==1 padding path and the W/S indexing holds.
- 4-conn and integer paths are untouched; the pairwise test bails early for 8-conn float, so there is no double-decision.
- Tests pin the issue repro across every chunking plus a deterministic fuzz, and confirm the 4-conn and integer 8-conn paths are unaffected.
Checklist
- Algorithm matches numpy CCL semantics (verified against _calculate_regions)
- dask+numpy and dask+cupy merge paths consistent with numpy
- NaN handling unchanged (float mask path untouched)
- Edge cases covered (every chunking incl. 1x1, single row/col)
- Dask chunk boundaries handled correctly
- No premature materialization or copies in the new code
- Benchmark already exists (benchmarks/benchmarks/polygonize.py)
- No README change needed (bugfix, no API change)
- Docstrings present and accurate
brendancol
commented
Jun 8, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review (after 780bdc6)
Both findings from the first pass are addressed:
- Suggestion (single-pixel invariant): added a comment in
_boundary_ccl_unionsstating that a coordinate is one physical pixel, so overwritingfieldand reading the match flags from any one owner is safe. - Nit (numpy reference stability):
TestEightConnFuzzParitynow accumulates the numpy polygon count over the fixed-seed sweep and asserts it against pinned per-rtol totals, so a numpy-side CCL change is caught directly instead of silently moving the dask parity reference.
No logic changed; the follow-up is a comment plus a test assertion. Full file test_polygonize_issue_2677.py passes (41 passed) and flake8 is clean. No new findings.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2677.
Problem
With
connectivity=8and a largertol,polygonizeon a dask-backed float raster gave a different polygon partition (count and DN values) than the same raster as a single numpy chunk. It reproduced even on chunk shapes that do not split the diverging pixels, which points at the diagonal-pairing decision in the cross-chunk merge rather than the chunk-boundary close test.numpy's connected-component labelling treats the 8-connected diagonal as conditional: a pixel checks its SW diagonal only when its W neighbour did not match, and its SE diagonal only when its S neighbour did not match. The dask cross-chunk merge honoured every diagonal pair unconditionally, so at a large
rtolit unioned two regions numpy keeps separate whenever the intervening orthogonal neighbour already matched.Change
Backends
This touches the dask+numpy and dask+cupy merge paths; cupy float routes through the same numpy CCL and cross-chunk merge. The numpy and cupy single-chunk paths are unchanged.
Test plan
test_polygonize_issue_2677.py: the issue repro matches the unchunked numpy partition for every chunking; a deterministic fuzz over random float rasters acrossrtoland chunkings checks 8-conn parity; 4-conn and integer 8-conn paths stay put.