[mypyc] Specialize s[i] == 'x' to a codepoint int compare#21579

Merged
p-sawicki merged 3 commits into python:master from VaggelisD:str-index-compare-specialize
Jun 4, 2026

@VaggelisD VaggelisD commented Jun 2, 2026

7th PR of #21418

Lowers s[i] == 'x' (and the symmetric == / != forms) to a bounds-checked codepoint read plus an int compare, instead of CPyStr_GetItem + CPyStr_EqualLiteral, which may allocate a 1-character PyUnicode per iteration. No annotations are required for this optimization.

On microbenchmarks (1-compare-per-iter hot loop, ~2.5M-codepoint SQL-like string) the comparison is ~3.6x faster.
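
For illustration, the kind of loop this targets might look like the sketch below. The function names are hypothetical and are not taken from the benchmark. The code behaves the same under CPython and mypyc; only the C that mypyc generates for the comparison would differ.

```python
def count_commas(s: str) -> int:
    """Count ',' characters using an explicit index loop."""
    n = 0
    for i in range(len(s)):
        # Targeted shape: a str index compared against a 1-character literal.
        if s[i] == ",":
            n += 1
    return n


def count_non_spaces(s: str) -> int:
    """Count characters that are not a space."""
    n = 0
    for i in range(len(s)):
        # Symmetric form: literal on the left, != instead of ==.
        if " " != s[i]:
            n += 1
    return n
```

Both functions perform one compare per iteration, which matches the workload shape described above.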


Some follow-up optimizations that might be worthwhile, which I can work on:

  • In operator, e.g. s[i] in ('a', 'b', 'c') --> Fuse into one check with N int comparisons
  • Comparison operators, e.g. s[i] < 'x' --> Need to expand the op set
  • s[i] == s[j] --> Need to drop the literal-only guard
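
As a rough sketch of the shapes listed above (function names are hypothetical), the first three functions show source patterns that this PR does not specialize. The last one spells out, in plain Python, what "one check with N int comparisons" could mean semantically; it is an assumption about the intended result, not generated code.

```python
def in_abc(s: str, i: int) -> bool:
    # Membership test against a tuple of 1-character literals.
    return s[i] in ("a", "b", "c")


def before_x(s: str, i: int) -> bool:
    # Ordering comparison against a 1-character literal.
    return s[i] < "x"


def same_char(s: str, i: int, j: int) -> bool:
    # Index compared against another index rather than a literal.
    return s[i] == s[j]


def in_abc_codepoints(s: str, i: int) -> bool:
    # Semantic equivalent of in_abc: read the codepoint once,
    # then do three int comparisons (ord('a') == 97, and so on).
    c = ord(s[i])
    return c == 97 or c == 98 or c == 99
```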

Recognizes the AST shape `IndexExpr(str) == StrLiteral` (and the symmetric
`StrLiteral == IndexExpr(str)`, plus the `!=` variants) and lowers it to
an int compare of codepoints reusing the existing CPyStr_GetItemUnsafeAsInt
primitive.

Today the pattern lowers to CPyStr_GetItem + CPyStr_EqualLiteral, which
allocates or looks up a 1-character PyUnicode object per iteration and
goes through a generic string-equality call. After specialization it
becomes an inlined PyUnicode_READ plus an int compare -- about 4x faster
on bench_str_compare with a 3-compares-per-iteration workload, and closer
to ~9x with the more typical 1-compare-per-iteration shape.

No annotations required; benefits any code that compares a string index
against a 1-character literal. Multi-character / empty literals fall
through to the generic path (which still correctly returns False).
Bounds checking is preserved -- the helper raises IndexError for
out-of-range indices, same as the unspecialized path.

Stack: builds on the `ord(s[i])` primitive (python#20578) and the librt.strings
codepoint helpers (python#21462, python#21504, python#21509, python#21521, python#21522, python#21553).
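
The observable behavior described in the commit message can be modeled in pure Python, as in the minimal sketch below. The function name is hypothetical, and the handling of negative indices is an assumption that simply mirrors ordinary Python indexing. It is not the IR that mypyc emits.

```python
def index_eq_literal(s: str, i: int, literal: str) -> bool:
    """Pure-Python model of `s[i] == literal` for a str `s`."""
    if len(literal) != 1:
        # Multi-character or empty literal: generic path.
        # s[i] is always one character, so this is False (or IndexError).
        return s[i] == literal
    n = len(s)
    j = i + n if i < 0 else i  # assumption: same negative-index rule as s[i]
    if j < 0 or j >= n:
        # Bounds checking is preserved.
        raise IndexError("string index out of range")
    # Specialized path: compare codepoints as ints, no 1-character str needed.
    return ord(s[j]) == ord(literal)
```

In this model the result always equals the plain `s[i] == literal` expression; only the route taken to compute it differs.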
Comment thread mypyc/test-data/run-strings.test Outdated
Comment thread mypyc/irbuild/expression.py Outdated
Comment on lines +994 to +995
if isinstance(rhs, IndexExpr) and not isinstance(lhs, IndexExpr):
    lhs, rhs = rhs, lhs
Collaborator

I think the errors in the run tests are caused by mypy issue #21586: it seems rhs is typed as IndexExpr after the swap, so assigning lhs to it raises a type error.

You might need to use a temp variable as a workaround, since this form seems to work correctly:

tmp = lhs
lhs, rhs = rhs, tmp
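
In context, the swap with a temp variable might look like the sketch below. The classes are hypothetical stand-ins for the mypy node types, and `normalize` is an illustrative helper rather than the function in mypyc/irbuild/expression.py.

```python
from __future__ import annotations

from dataclasses import dataclass


class Expression:
    """Hypothetical stand-in for the expression node base class."""


@dataclass
class IndexExpr(Expression):
    base: str
    index: int


@dataclass
class StrExpr(Expression):
    value: str


def normalize(lhs: Expression, rhs: Expression) -> tuple[Expression, Expression]:
    """Move the IndexExpr operand to the left when only the right side is one."""
    if isinstance(rhs, IndexExpr) and not isinstance(lhs, IndexExpr):
        # Copy lhs into a temp first, as suggested in the review comment.
        tmp = lhs
        lhs, rhs = rhs, tmp
    return lhs, rhs
```

After normalization, the rest of the matching code only has to handle the `IndexExpr == literal` order.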

Contributor Author

Interesting, didn't have time to check it out yesterday but that'd definitely confuse me

@VaggelisD VaggelisD force-pushed the str-index-compare-specialize branch 2 times, most recently from e116c95 to 60abe82 June 4, 2026 07:02
VaggelisD added 2 commits June 4, 2026 10:02
- Use a temp variable for the swap normalization. Tuple-unpack form
  (lhs, rhs = rhs, lhs) interacted badly with mypy's narrowing in
  mypyc-compiled mypy, producing a runtime IndexExpr-vs-StrExpr cast
  failure (mypy#21586). Workaround per @p-sawicki on PR python#21579.

- Drop test_any_dispatch_uses_generic_path. The 'Any' dispatch still
  calls the mypyc-compiled eq_comma, which has the specialization, so
  this test was not exercising the unspecialized path as claimed. The
  IR golden pins the specialized lowering, and eq_two_chars / eq_empty
  cover the fall-through behavior.

Rename testStrIndexEqLiteral -> testStrIndexEqLiteral_64bit so it skips
on 32-bit. The golden output captures the int-unbox-to-i64 path emitted
by translate_getitem_with_bounds_check, which differs on 32-bit (extra
'extend signed i: builtins.int to i64' op shifts register numbering).
testOrdOfStrIndex_64bit uses the same primitives and follows the same
convention.

The fall-through golden (testStrIndexEqLiteralNoSpecialize) keeps no
suffix; its IR uses CPyStr_GetItem directly with no unboxing.
Collaborator

@p-sawicki p-sawicki left a comment

looks good, thanks!

@p-sawicki p-sawicki merged commit 84a20bd into python:master Jun 4, 2026
18 checks passed