Perf: Use a KDTree on CPU instead of full EDT for 1.5x to 16x faster metric computation#8910
Perf: Use a KDTree on CPU instead of full EDT for 1.5x to 16x faster metric computation#8910jberg5 wants to merge 3 commits into
Conversation
Signed-off-by: J Berg <j.berg2349@gmail.com>
Signed-off-by: J Berg <j.berg2349@gmail.com>
📝 WalkthroughWalkthroughOptimizes Euclidean surface distance computation in Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
monai/metrics/utils.py (1)
264-286: ⚡ Quick winComplete the public docstring contract.
get_surface_distanceis public, but the docstring still omitsReturnsandRaises, so callers have to infer the output type/shape and failure modes from the implementation.As per coding guidelines, "Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@monai/metrics/utils.py` around lines 264 - 286, Update the public docstring for get_surface_distance to include a Returns section stating it returns a 1-D numpy.ndarray (dtype float) of distances where each element is the distance from a surface/edge voxel in seg_pred to its nearest surface voxel in seg_gt (length equals number of edge voxels found in seg_pred), and a Raises section documenting the error conditions: raise ValueError for invalid distance_metric values (not "euclidean", "chessboard", or "taxicab"), raise ValueError if spacing is a sequence whose length does not match the image dimensionality, and raise TypeError if seg_pred/seg_gt are not array-like inputs; also note that if seg_pred or seg_gt contain no foreground/edges the function may produce nan/inf distances (it does not implicitly convert that into an exception).Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@monai/metrics/utils.py`:
- Around line 308-315: Add a unit test that exercises the spacing-scaled KDTree
branch in monai/metrics/utils.py: create simple ground-truth and predicted
binary segmentations (seg_gt, seg_pred) with anisotropic spacing (e.g.,
spacing=(1.0, 2.5, 0.5)), call the surface-distance routine that uses KDTree
(the code path that scales gt_coords and pred_coords and constructs KDTree), and
assert the returned distances match the distances computed by manually scaling
coordinates and querying sklearn.neighbors.KDTree (or by comparing against the
same call with spacing applied to inputs), and also check output dtype via
convert_to_dst_type (e.g., lib.float32) and shape; this will lock in the
spacing-scaling behavior covered by KDTree and convert_to_dst_type.
---
Nitpick comments:
In `@monai/metrics/utils.py`:
- Around line 264-286: Update the public docstring for get_surface_distance to
include a Returns section stating it returns a 1-D numpy.ndarray (dtype float)
of distances where each element is the distance from a surface/edge voxel in
seg_pred to its nearest surface voxel in seg_gt (length equals number of edge
voxels found in seg_pred), and a Raises section documenting the error
conditions: raise ValueError for invalid distance_metric values (not
"euclidean", "chessboard", or "taxicab"), raise ValueError if spacing is a
sequence whose length does not match the image dimensionality, and raise
TypeError if seg_pred/seg_gt are not array-like inputs; also note that if
seg_pred or seg_gt contain no foreground/edges the function may produce nan/inf
distances (it does not implicitly convert that into an exception).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5cb2d418-a870-4b9d-b2c3-e8a659328cd1
📒 Files selected for processing (1)
monai/metrics/utils.py
Signed-off-by: J Berg <j.berg2349@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/metrics/test_surface_distance.py`:
- Around line 213-222: Add a Google-style docstring to the test method
test_torch_input_preserves_type_device_and_matches_dense_edt describing the test
purpose: that it verifies get_surface_distance accepts torch.Tensor inputs,
preserves dtype and device, computes Euclidean surface distances using the
provided spacing, and matches the reference dense distance_transform_edt result;
include brief sections for Args (describe seg_pred, seg_gt, spacing), Expected
behavior (dtype/device preserved, values match reference), and any important
notes about tolerance used for comparison.
- Around line 204-211: Add a Google-style docstring to the test method
test_cpu_kdtree_euclidean_distances_match_dense_edt describing the test purpose,
inputs, and expected outcome: state that it compares KD-tree CPU implementation
of surface distances (distance_metric="euclidean") against
scipy.ndimage.distance_transform_edt with given spacing, mention the parameters
_name and spacing, and note the assertions made (value equality within
tolerances, dtype float32, and matching shape). Place the docstring directly
under the def line for test_cpu_kdtree_euclidean_distances_match_dense_edt.
- Around line 190-199: The function _edge_masks lacks a Google-style docstring;
add one describing the seed parameter (int, default 0) and the return value
(tuple of two numpy boolean arrays: edges_pred and edges_gt representing edge
masks for prediction and ground truth). Place the docstring immediately under
the def _edge_masks(...) line, follow Google style with Args and Returns
sections, and mention the deterministic randomness from seed and that the
function returns boolean edge mask arrays created by get_mask_edges.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9669e508-773e-4ba2-88cf-7ac298f35348
📒 Files selected for processing (1)
tests/metrics/test_surface_distance.py
| def _edge_masks(seed=0): | ||
| # two offset spheres plus a few scattered false positives in the prediction, so the | ||
| # surfaces are non-trivially apart and an outlier expands the cropped bounding box. | ||
| gt = create_spherical_seg_3d(radius=20, centre=(30, 30, 30)) | ||
| pred = create_spherical_seg_3d(radius=20, centre=(32, 31, 30)) | ||
| rng = np.random.RandomState(seed) | ||
| for _ in range(5): | ||
| pred[tuple(rng.randint(0, s) for s in pred.shape)] = 1 | ||
| edges_pred, edges_gt = get_mask_edges(pred, gt) | ||
| return np.asarray(edges_pred, dtype=bool), np.asarray(edges_gt, dtype=bool) |
There was a problem hiding this comment.
Add Google-style docstring.
Missing docstring violates coding guidelines. Document the seed parameter and the return value (tuple of boolean edge mask arrays).
📝 Example docstring
def _edge_masks(seed=0):
+ """Generate test edge masks from offset spheres with random false positives.
+
+ Args:
+ seed: Random seed for reproducible false positive placement.
+
+ Returns:
+ Tuple of (edges_pred, edges_gt) as boolean numpy arrays.
+ """
# two offset spheres plus a few scattered false positives in the prediction, so the🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/metrics/test_surface_distance.py` around lines 190 - 199, The function
_edge_masks lacks a Google-style docstring; add one describing the seed
parameter (int, default 0) and the return value (tuple of two numpy boolean
arrays: edges_pred and edges_gt representing edge masks for prediction and
ground truth). Place the docstring immediately under the def _edge_masks(...)
line, follow Google style with Args and Returns sections, and mention the
deterministic randomness from seed and that the function returns boolean edge
mask arrays created by get_mask_edges.
Source: Coding guidelines
| def test_cpu_kdtree_euclidean_distances_match_dense_edt(self, _name, spacing): | ||
| edges_pred, edges_gt = _edge_masks() | ||
| result = np.asarray(get_surface_distance(edges_pred, edges_gt, distance_metric="euclidean", spacing=spacing)) | ||
| reference = distance_transform_edt(~edges_gt, sampling=spacing)[edges_pred] | ||
| # same multiset of distances (downstream metrics only use max/percentile/mean) | ||
| np.testing.assert_allclose(np.sort(result), np.sort(reference), rtol=1e-5, atol=1e-5) | ||
| self.assertEqual(result.dtype, np.float32) | ||
| self.assertEqual(result.shape, reference.shape) |
There was a problem hiding this comment.
Add Google-style docstring.
Test method lacks required docstring. Document the test purpose and parameters.
📝 Example docstring
`@parameterized.expand`(KDTREE_SPACINGS)
def test_cpu_kdtree_euclidean_distances_match_dense_edt(self, _name, spacing):
+ """Verify CPU KDTree euclidean distances match dense EDT reference.
+
+ Args:
+ _name: Test case name (from parameterized).
+ spacing: Voxel spacing for distance computation.
+ """
edges_pred, edges_gt = _edge_masks()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_cpu_kdtree_euclidean_distances_match_dense_edt(self, _name, spacing): | |
| edges_pred, edges_gt = _edge_masks() | |
| result = np.asarray(get_surface_distance(edges_pred, edges_gt, distance_metric="euclidean", spacing=spacing)) | |
| reference = distance_transform_edt(~edges_gt, sampling=spacing)[edges_pred] | |
| # same multiset of distances (downstream metrics only use max/percentile/mean) | |
| np.testing.assert_allclose(np.sort(result), np.sort(reference), rtol=1e-5, atol=1e-5) | |
| self.assertEqual(result.dtype, np.float32) | |
| self.assertEqual(result.shape, reference.shape) | |
| def test_cpu_kdtree_euclidean_distances_match_dense_edt(self, _name, spacing): | |
| """Verify CPU KDTree euclidean distances match dense EDT reference. | |
| Args: | |
| _name: Test case name (from parameterized). | |
| spacing: Voxel spacing for distance computation. | |
| """ | |
| edges_pred, edges_gt = _edge_masks() | |
| result = np.asarray(get_surface_distance(edges_pred, edges_gt, distance_metric="euclidean", spacing=spacing)) | |
| reference = distance_transform_edt(~edges_gt, sampling=spacing)[edges_pred] | |
| # same multiset of distances (downstream metrics only use max/percentile/mean) | |
| np.testing.assert_allclose(np.sort(result), np.sort(reference), rtol=1e-5, atol=1e-5) | |
| self.assertEqual(result.dtype, np.float32) | |
| self.assertEqual(result.shape, reference.shape) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/metrics/test_surface_distance.py` around lines 204 - 211, Add a
Google-style docstring to the test method
test_cpu_kdtree_euclidean_distances_match_dense_edt describing the test purpose,
inputs, and expected outcome: state that it compares KD-tree CPU implementation
of surface distances (distance_metric="euclidean") against
scipy.ndimage.distance_transform_edt with given spacing, mention the parameters
_name and spacing, and note the assertions made (value equality within
tolerances, dtype float32, and matching shape). Place the docstring directly
under the def line for test_cpu_kdtree_euclidean_distances_match_dense_edt.
Source: Coding guidelines
| def test_torch_input_preserves_type_device_and_matches_dense_edt(self): | ||
| edges_pred, edges_gt = _edge_masks() | ||
| spacing = (1.0, 2.5, 0.5) | ||
| seg_pred, seg_gt = torch.as_tensor(edges_pred), torch.as_tensor(edges_gt) | ||
| result = get_surface_distance(seg_pred, seg_gt, distance_metric="euclidean", spacing=spacing) | ||
| self.assertIsInstance(result, torch.Tensor) | ||
| self.assertEqual(result.dtype, torch.float32) | ||
| self.assertEqual(result.device, seg_pred.device) | ||
| reference = distance_transform_edt(~edges_gt, sampling=spacing)[edges_pred] | ||
| np.testing.assert_allclose(np.sort(result.cpu().numpy()), np.sort(reference), rtol=1e-5, atol=1e-5) |
There was a problem hiding this comment.
Add Google-style docstring.
Test method lacks required docstring. Document the test purpose.
📝 Example docstring
def test_torch_input_preserves_type_device_and_matches_dense_edt(self):
+ """Verify torch inputs preserve type/device/dtype and match EDT reference."""
edges_pred, edges_gt = _edge_masks()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/metrics/test_surface_distance.py` around lines 213 - 222, Add a
Google-style docstring to the test method
test_torch_input_preserves_type_device_and_matches_dense_edt describing the test
purpose: that it verifies get_surface_distance accepts torch.Tensor inputs,
preserves dtype and device, computes Euclidean surface distances using the
provided spacing, and matches the reference dense distance_transform_edt result;
include brief sections for Args (describe seg_pred, seg_gt, spacing), Expected
behavior (dtype/device preserved, values match reference), and any important
notes about tolerance used for comparison.
Source: Coding guidelines
|
Can any humans confirm that tests need docstrings? I don't see this as a pattern generally. Happy to add them if that's the guidance but feels a little odd. |
Fixes # 8909.
#8909
Description
Pretty straightforward: we don't need to compute the distance between every conceivable voxel, just the edges, so we can use a KDTree on CPU and compute
HausdorffDistanceMetricandSurfaceDistanceMetricsignificantly faster.GPU implementations of KDTrees exist, but I did a little benchmarking and found that they are slower than the full EDT since the distance computations are embarrassingly parallel and well-suited to the hardware (gpu goes brrr), so I left that path unchanged.
Measured speedups (on my M3 mac, and Intel Cascade Lake) range from 1.5x for small inputs to 16x for larger volumes and noisier data.
I think existing test coverage is good enough that we don't need more here - all pass for me, and I've spot checked a few problems to ensure identical output metrics.
Types of changes
./runtests.sh -f -u --net --coverage../runtests.sh --quick --unittests --disttests.make htmlcommand in thedocs/folder.