[mypyc] Specialize `s[i] == 'x'` to a codepoint int compare by VaggelisD · Pull Request #21579 · python/mypy

VaggelisD · 2026-06-02T16:41:21Z

7th PR of #21418

Lowers s[i] == 'x' (and the symmetric == / != forms) to a bounds-checked codepoint read plus an int compare, instead of CPyStr_GetItem + CPyStr_EqualLiteral, which may allocate a 1-character PyUnicode per iteration. No annotations are required for this optimization.

On microbenchmarks (1-compare-per-iter hot loop, ~2.5M-codepoint SQL-like string) the comparison is ~3.6x faster.
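For illustration, the kind of loop that benefits looks roughly like the sketch below. This is a hypothetical example, not the actual benchmark code; the name `count_commas` and the input string are made up here. It only shows the "one `s[i] == 'x'` compare per iteration" shape.

```python
def count_commas(s: str) -> int:
    """Count ',' characters with one indexed compare per iteration."""
    count = 0
    for i in range(len(s)):
        # This is the pattern the PR targets: a str index compared
        # against a 1-character literal.
        if s[i] == ",":
            count += 1
    return count


# Hypothetical SQL-like input, only for demonstration.
query = "SELECT a, b, c FROM t WHERE x IN (1, 2, 3)"
result = count_commas(query)
```

The function behaves the same whether it runs interpreted or compiled; only the generated code for the comparison changes.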

Some follow-up optimizations that might be worthwhile and that I can work on:

- `in` operator, e.g. s[i] in ('a', 'b', 'c') --> fuse into one check with N int comparisons
- Comparison operators, e.g. s[i] < 'x' --> need to expand the op set
- s[i] == s[j] --> need to drop the literal-only guard
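As a rough picture of the first follow-up idea, the sketch below models "one codepoint read, then N int comparisons" in plain Python. It is not part of this PR and not mypyc's implementation; `index_in_literals` is a hypothetical name. Multi-character and empty literals are dropped up front because a single character can never equal them.

```python
from typing import Tuple


def index_in_literals(s: str, i: int, literals: Tuple[str, ...]) -> bool:
    """Model of `s[i] in literals` as one read plus int comparisons."""
    # Only 1-character literals can ever match s[i].
    codepoints = tuple(ord(c) for c in literals if len(c) == 1)
    cp = ord(s[i])  # single codepoint read; raises IndexError if out of range
    return any(cp == lit for lit in codepoints)
```

A compiler could compute the codepoints at compile time, since the literals are known statically; here they are recomputed on each call only to keep the sketch short.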

Recognizes the AST shape `IndexExpr(str) == StrLiteral` (and the symmetric `StrLiteral == IndexExpr(str)`, plus the `!=` variants) and lowers it to an int compare of codepoints reusing the existing CPyStr_GetItemUnsafeAsInt primitive.

Today the pattern lowers to CPyStr_GetItem + CPyStr_EqualLiteral, which allocates or looks up a 1-character PyUnicode object per iteration and goes through a generic string-equality call. After specialization it becomes an inlined PyUnicode_READ plus an int compare -- about 4x faster on bench_str_compare with a 3-compares-per-iteration workload, and closer to ~9x with the more typical 1-compare-per-iteration shape.

No annotations required; benefits any code that compares a string index against a 1-character literal. Multi-character / empty literals fall through to the generic path (which still correctly returns False). Bounds checking is preserved -- the helper raises IndexError for out-of-range indices, same as the unspecialized path.

Stack: builds on the `ord(s[i])` primitive (python#20578) and the librt.strings codepoint helpers (python#21462, python#21504, python#21509, python#21521, python#21522, python#21553).
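The observable behavior described in this commit message can be modeled in plain Python as below. This is only a sketch of the semantics, not the generated C or the mypyc IR; `index_eq_literal` is a hypothetical name, and the negative-index handling simply mirrors ordinary Python indexing, which is an assumption about what the bounds-checked helper does.

```python
def index_eq_literal(s: str, i: int, literal: str) -> bool:
    """Pure-Python model of `s[i] == literal` after specialization."""
    if len(literal) != 1:
        # Multi-character or empty literal: generic path. A single
        # character never equals such a literal, so this is False for
        # any valid index (and still raises IndexError otherwise).
        return s[i] == literal
    n = len(s)
    j = i + n if i < 0 else i  # assumed: same wrap-around as Python indexing
    if not 0 <= j < n:
        raise IndexError("string index out of range")
    # Specialized path: compare codepoints as ints.
    return ord(s[j]) == ord(literal)
```

The `!=` variants would be the negation of the same check.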

p-sawicki · 2026-06-03T17:31:44Z

+    if isinstance(rhs, IndexExpr) and not isinstance(lhs, IndexExpr):
+        lhs, rhs = rhs, lhs


i think the errors in the run tests are because of a mypy issue #21586 as it seems rhs is typed as IndexExpr after the swap and assigning lhs to it raises a type error.

you might need to use a temp variable as a work-around as this way it seems to work correctly.

tmp = lhs
lhs, rhs = rhs, tmp

Interesting, didn't have time to check it out yesterday but that'd definitely confuse me

@p-sawicki

- Use a temp variable for the swap normalization. Tuple-unpack form (lhs, rhs = rhs, lhs) interacted badly with mypy's narrowing in mypyc-compiled mypy, producing a runtime IndexExpr-vs-StrExpr cast failure (mypy#21586). Workaround per @p-sawicki on PR python#21579.
- Drop test_any_dispatch_uses_generic_path. The 'Any' dispatch still calls the mypyc-compiled eq_comma, which has the specialization, so this test was not exercising the unspecialized path as claimed. The IR golden pins the specialized lowering, and eq_two_chars / eq_empty cover the fall-through behavior.
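A minimal, self-contained sketch of the swap normalization with the temp-variable workaround follows. The `IndexExpr` and `StrExpr` classes here are hypothetical stand-ins defined only for this example, not the real mypy node classes, and `normalize_operands` is a made-up name.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class IndexExpr:  # hypothetical stand-in for the real AST node
    base: str
    index: str


@dataclass
class StrExpr:  # hypothetical stand-in for the real AST node
    value: str


Expression = Union[IndexExpr, StrExpr]


def normalize_operands(lhs: Expression, rhs: Expression) -> Tuple[Expression, Expression]:
    """Put the IndexExpr on the left so later code handles one shape."""
    if isinstance(rhs, IndexExpr) and not isinstance(lhs, IndexExpr):
        # Swap through a temp variable instead of `lhs, rhs = rhs, lhs`,
        # following the workaround suggested in the review.
        tmp = lhs
        lhs, rhs = rhs, tmp
    return lhs, rhs
```

With these stand-ins, `normalize_operands(StrExpr(","), IndexExpr("s", "i"))` returns the index expression first and the literal second, while operands already in that order are returned unchanged.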

Rename testStrIndexEqLiteral -> testStrIndexEqLiteral_64bit so it skips on 32-bit. The golden output captures the int-unbox-to-i64 path emitted by translate_getitem_with_bounds_check, which differs on 32-bit (extra 'extend signed i: builtins.int to i64' op shifts register numbering). testOrdOfStrIndex_64bit uses the same primitives and follows the same convention. The fall-through golden (testStrIndexEqLiteralNoSpecialize) keeps no suffix; its IR uses CPyStr_GetItem directly with no unboxing.

p-sawicki

looks good, thanks!

p-sawicki reviewed Jun 3, 2026

Comment thread mypyc/test-data/run-strings.test Outdated

p-sawicki reviewed Jun 3, 2026

VaggelisD force-pushed the str-index-compare-specialize branch 2 times, most recently from e116c95 to 60abe82, on June 4, 2026 07:02

VaggelisD added 2 commits June 4, 2026 10:02

p-sawicki approved these changes Jun 4, 2026

p-sawicki merged commit 84a20bd into python:master Jun 4, 2026
18 checks passed

p-sawicki merged 3 commits into python:master from VaggelisD:str-index-compare-specialize