Perf: Use a KDTree on CPU instead of full EDT for 1.5x to 16x faster metric computation by jberg5 · Pull Request #8910 · Project-MONAI/MONAI

jberg5 · 2026-06-10T00:03:28Z

Fixes # 8909.
#8909

Description

Pretty straightforward: we don't need to compute the distance between every conceivable voxel, just the edges, so we can use a KDTree on CPU and compute HausdorffDistanceMetric and SurfaceDistanceMetric significantly faster.

GPU implementations of KDTrees exist, but I did a little benchmarking and found that they are slower than the full EDT since the distance computations are embarrassingly parallel and well-suited to the hardware (gpu goes brrr), so I left that path unchanged.

Measured speedups (on my M3 mac, and Intel Cascade Lake) range from 1.5x for small inputs to 16x for larger volumes and noisier data.

I think existing test coverage is good enough that we don't need more here - all pass for me, and I've spot checked a few problems to ensure identical output metrics.

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
In-line docstrings updated.
Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: J Berg <j.berg2349@gmail.com>

coderabbitai · 2026-06-10T00:03:41Z

📝 Walkthrough

Walkthrough

Optimizes Euclidean surface distance computation in get_surface_distance by switching CPU paths to sparse KDTree queries instead of dense distance transforms. For non-CUDA inputs, the function now builds a KDTree from ground-truth edge voxel coordinates, queries it for prediction edge distances (with optional spacing scaling), and returns early. CUDA inputs remain handled by the existing distance-transform path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title clearly describes the main change: replacing full EDT with KDTree on CPU for performance gains. Specific and measurable (1.5x to 16x speedup).
Description check	✅ Passed	Description follows template, references issue `#8909`, explains rationale, documents speedups, and confirms tests passed. All key checkboxes completed accurately.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

monai/metrics/utils.py (1)
264-286: ⚡ Quick win

Complete the public docstring contract.

get_surface_distance is public, but the docstring still omits Returns and Raises, so callers have to infer the output type/shape and failure modes from the implementation.

As per coding guidelines, "Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@monai/metrics/utils.py` around lines 264 - 286, Update the public docstring
for get_surface_distance to include a Returns section stating it returns a 1-D
numpy.ndarray (dtype float) of distances where each element is the distance from
a surface/edge voxel in seg_pred to its nearest surface voxel in seg_gt (length
equals number of edge voxels found in seg_pred), and a Raises section
documenting the error conditions: raise ValueError for invalid distance_metric
values (not "euclidean", "chessboard", or "taxicab"), raise ValueError if
spacing is a sequence whose length does not match the image dimensionality, and
raise TypeError if seg_pred/seg_gt are not array-like inputs; also note that if
seg_pred or seg_gt contain no foreground/edges the function may produce nan/inf
distances (it does not implicitly convert that into an exception).
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@monai/metrics/utils.py`:
- Around line 308-315: Add a unit test that exercises the spacing-scaled KDTree
branch in monai/metrics/utils.py: create simple ground-truth and predicted
binary segmentations (seg_gt, seg_pred) with anisotropic spacing (e.g.,
spacing=(1.0, 2.5, 0.5)), call the surface-distance routine that uses KDTree
(the code path that scales gt_coords and pred_coords and constructs KDTree), and
assert the returned distances match the distances computed by manually scaling
coordinates and querying sklearn.neighbors.KDTree (or by comparing against the
same call with spacing applied to inputs), and also check output dtype via
convert_to_dst_type (e.g., lib.float32) and shape; this will lock in the
spacing-scaling behavior covered by KDTree and convert_to_dst_type.

---

Nitpick comments:
In `@monai/metrics/utils.py`:
- Around line 264-286: Update the public docstring for get_surface_distance to
include a Returns section stating it returns a 1-D numpy.ndarray (dtype float)
of distances where each element is the distance from a surface/edge voxel in
seg_pred to its nearest surface voxel in seg_gt (length equals number of edge
voxels found in seg_pred), and a Raises section documenting the error
conditions: raise ValueError for invalid distance_metric values (not
"euclidean", "chessboard", or "taxicab"), raise ValueError if spacing is a
sequence whose length does not match the image dimensionality, and raise
TypeError if seg_pred/seg_gt are not array-like inputs; also note that if
seg_pred or seg_gt contain no foreground/edges the function may produce nan/inf
distances (it does not implicitly convert that into an exception).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5cb2d418-a870-4b9d-b2c3-e8a659328cd1

📥 Commits

Reviewing files that changed from the base of the PR and between eccefc5 and 7dd6f73.

📒 Files selected for processing (1)

monai/metrics/utils.py

Signed-off-by: J Berg <j.berg2349@gmail.com>

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/metrics/test_surface_distance.py`:
- Around line 213-222: Add a Google-style docstring to the test method
test_torch_input_preserves_type_device_and_matches_dense_edt describing the test
purpose: that it verifies get_surface_distance accepts torch.Tensor inputs,
preserves dtype and device, computes Euclidean surface distances using the
provided spacing, and matches the reference dense distance_transform_edt result;
include brief sections for Args (describe seg_pred, seg_gt, spacing), Expected
behavior (dtype/device preserved, values match reference), and any important
notes about tolerance used for comparison.
- Around line 204-211: Add a Google-style docstring to the test method
test_cpu_kdtree_euclidean_distances_match_dense_edt describing the test purpose,
inputs, and expected outcome: state that it compares KD-tree CPU implementation
of surface distances (distance_metric="euclidean") against
scipy.ndimage.distance_transform_edt with given spacing, mention the parameters
_name and spacing, and note the assertions made (value equality within
tolerances, dtype float32, and matching shape). Place the docstring directly
under the def line for test_cpu_kdtree_euclidean_distances_match_dense_edt.
- Around line 190-199: The function _edge_masks lacks a Google-style docstring;
add one describing the seed parameter (int, default 0) and the return value
(tuple of two numpy boolean arrays: edges_pred and edges_gt representing edge
masks for prediction and ground truth). Place the docstring immediately under
the def _edge_masks(...) line, follow Google style with Args and Returns
sections, and mention the deterministic randomness from seed and that the
function returns boolean edge mask arrays created by get_mask_edges.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9669e508-773e-4ba2-88cf-7ac298f35348

📥 Commits

Reviewing files that changed from the base of the PR and between 7dd6f73 and cf640e8.

📒 Files selected for processing (1)

tests/metrics/test_surface_distance.py

coderabbitai · 2026-06-10T00:26:00Z

+def _edge_masks(seed=0):
+    # two offset spheres plus a few scattered false positives in the prediction, so the
+    # surfaces are non-trivially apart and an outlier expands the cropped bounding box.
+    gt = create_spherical_seg_3d(radius=20, centre=(30, 30, 30))
+    pred = create_spherical_seg_3d(radius=20, centre=(32, 31, 30))
+    rng = np.random.RandomState(seed)
+    for _ in range(5):
+        pred[tuple(rng.randint(0, s) for s in pred.shape)] = 1
+    edges_pred, edges_gt = get_mask_edges(pred, gt)
+    return np.asarray(edges_pred, dtype=bool), np.asarray(edges_gt, dtype=bool)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add Google-style docstring.

Missing docstring violates coding guidelines. Document the seed parameter and the return value (tuple of boolean edge mask arrays).

📝 Example docstring

def _edge_masks(seed=0): + """Generate test edge masks from offset spheres with random false positives. + + Args: + seed: Random seed for reproducible false positive placement. + + Returns: + Tuple of (edges_pred, edges_gt) as boolean numpy arrays. + """ # two offset spheres plus a few scattered false positives in the prediction, so the

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/metrics/test_surface_distance.py` around lines 190 - 199, The function _edge_masks lacks a Google-style docstring; add one describing the seed parameter (int, default 0) and the return value (tuple of two numpy boolean arrays: edges_pred and edges_gt representing edge masks for prediction and ground truth). Place the docstring immediately under the def _edge_masks(...) line, follow Google style with Args and Returns sections, and mention the deterministic randomness from seed and that the function returns boolean edge mask arrays created by get_mask_edges.

Source: Coding guidelines

coderabbitai · 2026-06-10T00:26:00Z

+    def test_cpu_kdtree_euclidean_distances_match_dense_edt(self, _name, spacing):
+        edges_pred, edges_gt = _edge_masks()
+        result = np.asarray(get_surface_distance(edges_pred, edges_gt, distance_metric="euclidean", spacing=spacing))
+        reference = distance_transform_edt(~edges_gt, sampling=spacing)[edges_pred]
+        # same multiset of distances (downstream metrics only use max/percentile/mean)
+        np.testing.assert_allclose(np.sort(result), np.sort(reference), rtol=1e-5, atol=1e-5)
+        self.assertEqual(result.dtype, np.float32)
+        self.assertEqual(result.shape, reference.shape)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add Google-style docstring.

Test method lacks required docstring. Document the test purpose and parameters.

📝 Example docstring

`@parameterized.expand`(KDTREE_SPACINGS) def test_cpu_kdtree_euclidean_distances_match_dense_edt(self, _name, spacing): + """Verify CPU KDTree euclidean distances match dense EDT reference. + + Args: + _name: Test case name (from parameterized). + spacing: Voxel spacing for distance computation. + """ edges_pred, edges_gt = _edge_masks()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def test_cpu_kdtree_euclidean_distances_match_dense_edt(self, _name, spacing):

edges_pred, edges_gt = _edge_masks()

result = np.asarray(get_surface_distance(edges_pred, edges_gt, distance_metric="euclidean", spacing=spacing))

reference = distance_transform_edt(~edges_gt, sampling=spacing)[edges_pred]

# same multiset of distances (downstream metrics only use max/percentile/mean)

np.testing.assert_allclose(np.sort(result), np.sort(reference), rtol=1e-5, atol=1e-5)

self.assertEqual(result.dtype, np.float32)

self.assertEqual(result.shape, reference.shape)

def test_cpu_kdtree_euclidean_distances_match_dense_edt(self, _name, spacing):

"""Verify CPU KDTree euclidean distances match dense EDT reference.

Args:

_name: Test case name (from parameterized).

spacing: Voxel spacing for distance computation.

"""

edges_pred, edges_gt = _edge_masks()

result = np.asarray(get_surface_distance(edges_pred, edges_gt, distance_metric="euclidean", spacing=spacing))

reference = distance_transform_edt(~edges_gt, sampling=spacing)[edges_pred]

# same multiset of distances (downstream metrics only use max/percentile/mean)

np.testing.assert_allclose(np.sort(result), np.sort(reference), rtol=1e-5, atol=1e-5)

self.assertEqual(result.dtype, np.float32)

self.assertEqual(result.shape, reference.shape)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/metrics/test_surface_distance.py` around lines 204 - 211, Add a Google-style docstring to the test method test_cpu_kdtree_euclidean_distances_match_dense_edt describing the test purpose, inputs, and expected outcome: state that it compares KD-tree CPU implementation of surface distances (distance_metric="euclidean") against scipy.ndimage.distance_transform_edt with given spacing, mention the parameters _name and spacing, and note the assertions made (value equality within tolerances, dtype float32, and matching shape). Place the docstring directly under the def line for test_cpu_kdtree_euclidean_distances_match_dense_edt.

Source: Coding guidelines

coderabbitai · 2026-06-10T00:26:00Z

+    def test_torch_input_preserves_type_device_and_matches_dense_edt(self):
+        edges_pred, edges_gt = _edge_masks()
+        spacing = (1.0, 2.5, 0.5)
+        seg_pred, seg_gt = torch.as_tensor(edges_pred), torch.as_tensor(edges_gt)
+        result = get_surface_distance(seg_pred, seg_gt, distance_metric="euclidean", spacing=spacing)
+        self.assertIsInstance(result, torch.Tensor)
+        self.assertEqual(result.dtype, torch.float32)
+        self.assertEqual(result.device, seg_pred.device)
+        reference = distance_transform_edt(~edges_gt, sampling=spacing)[edges_pred]
+        np.testing.assert_allclose(np.sort(result.cpu().numpy()), np.sort(reference), rtol=1e-5, atol=1e-5)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add Google-style docstring.

Test method lacks required docstring. Document the test purpose.

📝 Example docstring

def test_torch_input_preserves_type_device_and_matches_dense_edt(self): + """Verify torch inputs preserve type/device/dtype and match EDT reference.""" edges_pred, edges_gt = _edge_masks()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/metrics/test_surface_distance.py` around lines 213 - 222, Add a Google-style docstring to the test method test_torch_input_preserves_type_device_and_matches_dense_edt describing the test purpose: that it verifies get_surface_distance accepts torch.Tensor inputs, preserves dtype and device, computes Euclidean surface distances using the provided spacing, and matches the reference dense distance_transform_edt result; include brief sections for Args (describe seg_pred, seg_gt, spacing), Expected behavior (dtype/device preserved, values match reference), and any important notes about tolerance used for comparison.

Source: Coding guidelines

jberg5 · 2026-06-10T00:35:21Z

Can any humans confirm that tests need docstrings? I don't see this as a pattern generally. Happy to add them if that's the guidance but feels a little odd.

jberg5 added 2 commits June 9, 2026 16:47

Use a KDTree on CPU instead of full EDT for performance improvement

db716f4

Signed-off-by: J Berg <j.berg2349@gmail.com>

fix docstring

7dd6f73

Signed-off-by: J Berg <j.berg2349@gmail.com>

jberg5 requested review from KumoLiu, Nic-Ma and ericspod as code owners June 10, 2026 00:03

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread monai/metrics/utils.py

add some tests

cf640e8

Signed-off-by: J Berg <j.berg2349@gmail.com>

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perf: Use a KDTree on CPU instead of full EDT for 1.5x to 16x faster metric computation#8910

Perf: Use a KDTree on CPU instead of full EDT for 1.5x to 16x faster metric computation#8910
jberg5 wants to merge 3 commits into
Project-MONAI:devfrom
jberg5:8909-kdtree

jberg5 commented Jun 10, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading

Walkthrough

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 10, 2026

Uh oh!

coderabbitai Bot Jun 10, 2026

Uh oh!

coderabbitai Bot Jun 10, 2026

Uh oh!

jberg5 commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jberg5 commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Types of changes

Uh oh!

coderabbitai Bot commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

jberg5 commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jberg5 commented Jun 10, 2026 •

edited

Loading

coderabbitai Bot commented Jun 10, 2026 •

edited

Loading