perf: Speed up s8q NHWC max pooling by morgolock · Pull Request #1292 · ARM-software/ComputeLibrary

morgolock · 2026-06-02T14:33:39Z

Refactor the A64, SVE and SME QASYMM8_SIGNED differing-qinfo NHWC MAX pooling kernels to reduce the cost of the requantized path.

The old code did:

- add output offset
- explicit clamp to [-128, 127] with smax / smin
- pack down with several uzp1 shuffles

The new code does:

- add output offset
- saturating narrow directly with sqxtn / sqxtn2 from s32 -> s16 -> s8

Why this helps:

- sqxtn already performs the signed saturation, so the explicit clamp is redundant.
- It also lets us remove a chunk of shuffle/packing work.
- So the requantized epilogue gets shorter: fewer instructions, less register traffic, less packing overhead.
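The redundancy argument above can be checked with a small scalar model. This is only a sketch under stated assumptions: it models a single lane in portable C++ rather than the actual A64/SVE/SME assembly, it treats the uzp1 packing as plain truncation of an already-clamped value, and all function names (`clamp_then_pack`, `saturating_narrow`, `epilogues_agree`) are hypothetical and do not appear in the kernels.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical scalar model of one lane; not code from the kernels.
// The input v stands for the s32 value after the output offset was added.

// Old epilogue: clamp to [-128, 127] (the smax / smin pair), then keep the
// low byte. The uzp1 shuffles are modelled here as a plain truncation.
inline int8_t clamp_then_pack(int32_t v)
{
    const int32_t clamped = std::min<int32_t>(std::max<int32_t>(v, -128), 127);
    return static_cast<int8_t>(clamped);
}

// Models one lane of a signed saturating narrow from s32 to s16.
inline int16_t sat_narrow_s32_to_s16(int32_t v)
{
    const int32_t s = std::min<int32_t>(std::max<int32_t>(v, INT16_MIN), INT16_MAX);
    return static_cast<int16_t>(s);
}

// Models one lane of a signed saturating narrow from s16 to s8.
inline int8_t sat_narrow_s16_to_s8(int16_t v)
{
    const int16_t s = std::min<int16_t>(std::max<int16_t>(v, INT8_MIN), INT8_MAX);
    return static_cast<int8_t>(s);
}

// New epilogue: two saturating narrows, s32 -> s16 -> s8, with no clamp.
inline int8_t saturating_narrow(int32_t v)
{
    return sat_narrow_s16_to_s8(sat_narrow_s32_to_s16(v));
}

// Returns true if both epilogues give the same s8 result for every value
// in the inclusive range [lo, hi].
inline bool epilogues_agree(int32_t lo, int32_t hi)
{
    for (int64_t i = lo; i <= hi; ++i)
    {
        const int32_t v = static_cast<int32_t>(i);
        if (clamp_then_pack(v) != saturating_narrow(v))
        {
            return false;
        }
    }
    return true;
}
```

Calling `epilogues_agree(-70000, 70000)` covers values on both sides of the s16 and s8 limits. In this model the two paths agree because saturating to s16 first never moves a value across the [-128, 127] boundary, so the second narrow produces the same result as the explicit clamp.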

Change-Id: I47a378aeda61de86f2d6784393ccb4a08984706c

gunes-arm

Nice work Pablo. I only have doc-related suggestions.

(1)
I suggest we remove the following section from the commit description as it's highly input configuration dependent:

For the tested diff-qinfo cases this improves steady-state latency by:
- A64: 4.7% to 5.3%
- SVE2: 0.8% to 3.9%
- SME2: 3.9% to 8.6%

(2) MR title and commit title should be the same. I'd say the following is fine; note the capital start:
perf: Speed up s8q NHWC max pooling

Refactor the A64, SVE and SME QASYMM8_SIGNED differing-qinfo NHWC MAX pooling kernels to reduce the cost of the requantized path.

The old code did:
- add output offset
- explicit clamp to [-128, 127] with smax / smin
- pack down with several uzp1 shuffles

The new code does:
- add output offset
- saturating narrow directly with sqxtn / sqxtn2 from s32 -> s16 -> s8

Why this helps:
- sqxtn already performs the signed saturation, so the explicit clamp is redundant.
- It also lets us remove a chunk of shuffle/packing work.
- So the requantized epilogue gets shorter: fewer instructions, less register traffic, less packing overhead.

Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
Change-Id: I47a378aeda61de86f2d6784393ccb4a08984706c

DavidMansell · 2026-06-03T09:10:34Z

Looks good to me.

morgolock · 2026-06-03T09:14:38Z

> Nice work Pablo. I only have doc-related suggestions.
>
> (1) I suggest we remove the following section from the commit description as it's highly input configuration dependent:
>
> For the tested diff-qinfo cases this improves steady-state latency by:
> - A64: 4.7% to 5.3%
> - SVE2: 0.8% to 3.9%
> - SME2: 3.9% to 8.6%
>
> (2) MR title and commit title should be the same. I'd say the following is fine; note the capital start:
> perf: Speed up s8q NHWC max pooling

Thanks. All addressed in latest patchset.

gunes-arm · 2026-06-03T09:30:27Z

Can you also remove

A64: 4.7% to 5.3%
SVE2: 0.8% to 3.9%
SME2: 3.9% to 8.6%

from the description as well.

morgolock · 2026-06-03T09:57:21Z

> Can you also remove
>
> A64: 4.7% to 5.3%
> SVE2: 0.8% to 3.9%
> SME2: 3.9% to 8.6%
>
> from the description as well.

Done.

morgolock requested review from DavidMansell and gunes-arm June 2, 2026 14:37

gunes-arm requested changes Jun 2, 2026

morgolock force-pushed the pr/pool_quant_qinfo_optim branch from 9272e82 to f78b12c June 3, 2026 09:13

morgolock changed the title ~~perf: speed up s8q NHWC max pooling~~ perf: Speed up s8q NHWC max pooling Jun 3, 2026

gunes-arm approved these changes Jun 3, 2026

morgolock merged commit 68d63f3 into main Jun 3, 2026
2 checks passed

morgolock deleted the pr/pool_quant_qinfo_optim branch June 3, 2026 10:34
