Infer decimal-int-string/non-decimal-int-string for regex capturing groups matching (non-)digits#5793
Conversation
staabm
left a comment
There was a problem hiding this comment.
we are missing support for
if (preg_match('/^-?[0-9]+$/', $x)) {
assertType('decimal-int-string', $x);
}
|
I've addressed the review. Here's a summary of what I did. What the review asked forstaabm pointed out that subject narrowing was missing: if (preg_match('/^-?[0-9]+$/', $x)) {
assertType('decimal-int-string', $x); // was inferred as non-empty-string
}This is narrowing of the subject ( Changes (
|
staabm
left a comment
There was a problem hiding this comment.
we don't need support for \A and \Z. just ^ and $ is enough
|
Done. I addressed staabm's review ("we don't need support for \A and \Z. just ^ and $ is enough"). Changes
Verification
Committed as |
6f2549f to
e41e798
Compare
…ng groups matching (non-)digits
- `RegexGroupParser` now emits `AccessoryDecimalIntegerStringType` for capturing groups that match only digits (e.g. `\d+`, `[0-9]+`, `[3-9]`), replacing the previous `numeric-string` inference, consistent with the recent `ctype_digit()` narrowing.
- A single leading minus sign is tracked positionally so `-?[0-9]+` / `-[0-9]+` stay decimal integers, while a non-leading `-` (`\d+-\d+`) or a `+` (`[+-]?\d+`) correctly disqualifies the decimal-integer inference.
- Negated character classes that exclude every digit (`[^0-9]`, `[^\d]`) now yield `non-decimal-int-string`, since they can only match non-digit characters. Negated classes that do not exclude all digits (`[^1-4]`) stay unrefined.
- `RegexGroupWalkResult` replaces `isNumeric` with `isDecimalInteger`, adds `isNonDecimalInteger` and a leading-sign flag, all combined across alternations.
- Updated existing preg shape tests where digit captures now infer `decimal-int-string`.
- Subject-type narrowing of the matched string itself (`preg_match('/^\d+$/', $x)` narrowing `$x`) is intentionally left out; it requires full-anchor detection and is tracked separately.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Removes the negatedClassExcludesAllDigits() helper (and the now-unused $inNegativeClass parameter / non-decimal walk-result plumbing) per review. Regex capturing groups that can only match non-digits are no longer narrowed to non-decimal-int-string; decimal-int-string inference for digit groups is unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
e41e798 to
4fdc652
Compare
staabm
left a comment
There was a problem hiding this comment.
I need to investigate this more
|
I've investigated the current review and the branch state. Here is an honest assessment. The review is not actionable as a code changeThe current review from staabm has the body "I need to investigate this more" with no specific change request. This is the reviewer signalling they're still investigating — not a concrete instruction to implement. Note that staabm had already APPROVED this PR at 2026-06-03T09:22:41Z; this follow-up comment reads as the reviewer wanting more time before merge, not a request for me to modify code. I deliberately did not fabricate changes, since speculative edits would interfere with the reviewer's own investigation. What I verified about the current branchThe branch (
One substantive fact the reviewer may be investigatingThe branch currently does capturing-group inference only. The subject-narrowing case that staabm explicitly requested in an earlier CHANGES_REQUESTED review: if (preg_match('/^-?[0-9]+$/', $x)) {
assertType('decimal-int-string', $x); // ← reviewer's requested behavior
}…is not in the current branch. I confirmed live:
Per the PR comments, subject narrowing was implemented (commits Why I'm not just re-adding itThe reviewer's last word is "I need to investigate this more," not "re-add subject narrowing." Whether they want it back (and in what form — e.g. the My recommendation: wait for staabm's investigation to conclude. If the conclusion is "subject narrowing should be restored," I can re-apply the |
Summary
PHPStan's regex array-shape matcher inferred
numeric-stringfor capturing groups that match digit sequences (\d+,[0-9]+, …). Following the recentctype_digit()work (#5311), these are better described asdecimal-int-string, and groups that can only match non-digits asnon-decimal-int-string. This PR teaches the regex AST walker to produce both accessory types.Changes
src/Type/Regex/RegexGroupParser.phpcreateGroupType()now emitsAccessoryDecimalIntegerStringType(instead ofAccessoryNumericStringType) for digit-only groups, andAccessoryDecimalIntegerStringType(true)(non-decimal-int-string) for groups guaranteed to contain no digit.-?[0-9]+and-[0-9]+remain decimal integers while a non-leading-(\d+-\d+) or a+([+-]?\d+) disqualifies the inference.negatedClassExcludesAllDigits()to detect negated classes ([^0-9],[^\d]) that exclude every digit; only those becomenon-decimal-int-string. A newinNegativeClassflag prevents the excluded digit tokens from disturbing the flags.src/Type/Regex/RegexGroupWalkResult.phpisNumericwithisDecimalInteger, addedisNonDecimalIntegerand aseenDecimalIntegerSignflag.preg_match/preg_match_all/preg_replace_callbackshape test expectations where digit captures now inferdecimal-int-string(preg_match_shapes*.php,preg_match_all_shapes.php,bug-11293,bug-11311,bug-14177).Root cause
The regex walker only modeled "matches digits" as
numeric-string.numeric-stringis broader than what a pure-digit regex matches (it also covers floats, exponents,+signs), so it under-described these groups. The walker already only set the numeric flag for pure-digit content, so the refinement todecimal-int-stringis a strict improvement on the same code path. The inverse — a regex that can only match non-digit characters — was previously left as a plain (non-empty-)string; it is nownon-decimal-int-string.Test
tests/PHPStan/Analyser/nsrt/bug-14750.phpcovers the reported group cases (-?[0-9]+,[3-9]+,\d+,[3-9],[^0-9]) plus edge cases:\d*(''|decimal-int-string), required leading minus, non-leading minus (\d+-\d+),+-sign rejection, quantified negated digit class, a negated class that doesn't exclude all digits ([^1-4]), and a digit/non-digit alternation. Confirmed failing before the fix and passing after.^...$anchor detection plus inverse narrowing in the falsey branch and is left as a follow-up.Fixes phpstan/phpstan#14750