Reduce TypeForm recognition slowdown #21585
@davidfstr Your original PR was never reverted in master, only in the release branch. Hopefully you can still cherry-pick the 4 relevant commits onto current master.
Specifically:
* Median is reported, in addition to the existing mean+stdev, which is
significantly more resistant to skew by outliers.
* --metric {wall,cpu} (default wall): Enables profiling using CPU time
rather than wall-clock time. CPU profiling has roughly half the coefficient
of variation of wall-clock profiling at equal run count.
* --workers1: Forces MYPY_NUM_WORKERS=1 (rather than the default 4) to
cut CPU scheduling variance. Strongly recommended when using --metric cpu.
* --warmup-runs N (default 1): Configurable number of leading cold runs to discard.
Previously this was always 1. Discarding more warm-up runs reduces the outliers
that skew the reported mean.
* A new "Paired deltas vs <first commit>" section is added to the report,
showing per-round paired differences against the first commit, which
cancel round-level common-mode noise and reduce variance.
Reported as median +/-95% CI.
Also:
* --cache-binaries (default false): Caches each commit's compiled clone
to avoid a ~5 min recompile whenever the same commit is compared multiple times.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
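The paired-delta report described above can be sketched as follows. This is not `perf_compare.py`'s code: the helper names `paired_deltas` and `median_with_ci` are hypothetical, timings are assumed to be in milliseconds, and because the commit message does not say how the 95% CI is computed, a percentile bootstrap is assumed here.

```python
import random
import statistics


def paired_deltas(baseline_ms, candidate_ms, warmup_runs=1):
    """Per-round differences (candidate - baseline) after discarding warm-up rounds.

    Index i of both lists must come from the same interleaved round, so noise
    that affects the whole round (thermal state, background load) cancels out.
    """
    if len(baseline_ms) != len(candidate_ms):
        raise ValueError("paired samples must have the same length")
    pairs = zip(baseline_ms[warmup_runs:], candidate_ms[warmup_runs:])
    return [cand - base for base, cand in pairs]


def median_with_ci(deltas, resamples=2000, seed=0):
    """Median of deltas plus a percentile-bootstrap 95% confidence interval."""
    rng = random.Random(seed)  # fixed seed keeps the report reproducible
    boot = sorted(
        statistics.median(rng.choices(deltas, k=len(deltas)))
        for _ in range(resamples)
    )
    low = boot[int(0.025 * resamples)]
    high = boot[int(0.975 * resamples) - 1]
    return statistics.median(deltas), low, high


# Round 0 is a cold run and is discarded by warmup_runs=1.
baseline = [900, 500, 520, 490, 510, 500]
candidate = [950, 530, 545, 521, 538, 531]
deltas = paired_deltas(baseline, candidate)
median, low, high = median_with_ci(deltas)
print(f"median {median} ms, 95% CI [{low}, {high}] ms")
```

Because each delta is taken within a single round, a slow round inflates both timings and largely cancels; and because the median is reported, one outlier round moves the result far less than it would move the mean.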
…_parse_as_type_expression()
Specifically:
- If you set the MYPY_TYPEFORM_PROFILE_FULL_PARSE environment variable to a file path, mypy will output a .tsv to that path characterizing the kinds of Expressions that try_parse_as_type_expression() in semanal.py did not reject early and was therefore forced to fully parse.
- A misc/analyze_typeform_full_parse_profile.py script is added which takes those .tsvs and prints an expression-time summary (by total time) plus top-N descriptors per FAIL class.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…s_type_expression()
These filters reduce mypy's wall-clock slowdown when checking the mypy codebase after the introduction of TypeForm from +2.03% to +1.21%, as profiled with `misc/perf_compare.py`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Force-pushed from a08cc55 to 6fba903.
Done. Now there are only 3 commits. The 4th one reapplied enabling TypeForm by default, which is already on master.
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
The right baseline is the commits after c0cced3, which optimised this path, not your current baseline.
Yeah, the best comparison is simply this PR vs master. In general, I think this approach (a heavy focus on regexps) is the wrong one. Regexps may give a short-term win, but they are themselves slow (when compiled with mypyc) and a pain to maintain. Instead, I would propose starting with two things:
References #21262
Summary
Enabling `TypeForm` by default (#21262) regressed mypy's self-check by ~2% on mypyc-compiled builds.
Details
The regression came entirely from `SemanticAnalyzer.try_parse_as_type_expression`, which is now invoked eagerly on ~2.84M expressions per self-check. Of those, ~3.3% reached the expensive full-parse block (`expr_to_analyzed_type` + `isolated_error_analysis`), and ~91% of those full parses failed: pure wasted work.
This branch adds a series of fast-rejection filters to the offending function (`SemanticAnalyzer.try_parse_as_type_expression`), eliminating ~83% of the full-parse attempts (3,144 → 542) on mypy's self-check, with zero correctness regressions. End-to-end, this recovers ~43% of the regression by paired median (21.6 ms of a 50.2 ms CPU-time regression, n=100, CI ±5 ms; consistent with ~34–40% on wall-clock).
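To illustrate the general shape of fast rejection, here is a sketch that uses Python's standard `ast` module as a stand-in for mypy's own expression nodes. The specific rules and the names `could_be_type_expression` and `try_parse` are hypothetical; they are not the filters this branch adds.

```python
import ast

# Root node types that can never be a type expression. Illustrative only;
# this is not the filter set used in the PR.
NEVER_TYPE_ROOTS = (
    ast.Lambda, ast.ListComp, ast.SetComp, ast.DictComp,
    ast.GeneratorExp, ast.Compare, ast.BoolOp, ast.UnaryOp,
)


def could_be_type_expression(node):
    """Cheap, conservative pre-check. False means 'definitely not a type'."""
    if isinstance(node, NEVER_TYPE_ROOTS):
        return False
    if isinstance(node, ast.Constant):
        # None and string annotations such as "int | None" may be types;
        # numbers, bytes and booleans never are.
        return node.value is None or isinstance(node.value, str)
    if isinstance(node, ast.BinOp):
        # X | Y is union syntax; a + b, a * b, ... are not types.
        return isinstance(node.op, ast.BitOr)
    return True  # unsure: let the full parse decide


def try_parse(source, full_parse):
    """Run the expensive full_parse only if the cheap pre-check passes."""
    node = ast.parse(source, mode="eval").body
    if not could_be_type_expression(node):
        return None  # fast rejection: the expensive path is skipped
    return full_parse(node)
```

Every rule has to be conservative, rejecting only expressions that can never be type expressions; anything uncertain falls through to the full parse, which is what keeps type-check results unchanged.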
This branch also adds instrumentation to analyze what kinds of expressions pass through the fast-rejection filters and reach a full parse. This instrumentation is enabled via the `MYPY_TYPEFORM_PROFILE_FULL_PARSE` environment variable and outputs a .tsv file that can be processed with the new `misc/analyze_typeform_full_parse_profile.py` script to classify and aggregate expressions.
Finally, this branch extends `misc/perf_compare.py` with additional reporting options to reduce measured variance.
Each commit on the PR branch has a detailed message giving more information about the specific changes made.
Optimization Results
CPU time (canonical; lowest-variance metric)
Commits compared: 022d9bc96 (baseline), 16fef2515 (regression), 3f39cd753/HEAD (branch, all filters).
CPU time (user+sys) on a single worker is the lowest-variance estimator here (CV ≈ 3.3%, vs ≈ 7.5% for wall-clock at the same n) and the truest measure of the per-call work being attacked: with one worker the whole self-check runs serially, so all ~2.84M calls land in one CPU-time figure rather than being spread across workers and hidden behind the slowest one.
The branch recovers 21.6 ms of the 50.2 ms regression: ~43% by paired median (~40% by trimmed mean), leaving +28.6 ms (~57%, still well outside the ±5 ms CI).
The recovered fraction is the difference of two deltas measured against the same baseline within one interleaved run, so it inherits the run's low noise. A conservative independent-error bound is ≈ ±7.6 ms on the 21.6 ms numerator; the true band is tighter because the two deltas are positively correlated, sharing the same baseline and the same per-round machine state.
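The arithmetic behind these figures can be sketched as below, under a normal approximation in which CI half-widths combine like standard deviations. The helper names are hypothetical, and the 5.0 ms half-widths in the example are illustrative rather than the PR's per-delta CIs.

```python
import math


def recovery(regression_delta_ms, branch_delta_ms):
    """Time recovered by the branch and the recovered fraction of the regression.

    Both deltas are paired medians against the same baseline commit.
    """
    recovered_ms = regression_delta_ms - branch_delta_ms
    return recovered_ms, recovered_ms / regression_delta_ms


def ci_half_width_of_difference(hw_a, hw_b, rho=0.0):
    """Approximate 95% CI half-width of (A - B) from the half-widths of A and B.

    Under a normal approximation, half-widths scale like standard deviations and
    Var(A - B) = Var(A) + Var(B) - 2 * rho * sd(A) * sd(B).
    rho = 0 is the conservative independent-error bound; rho > 0 (shared
    baseline, shared per-round machine state) shrinks the band.
    """
    return math.sqrt(hw_a**2 + hw_b**2 - 2 * rho * hw_a * hw_b)


recovered_ms, fraction = recovery(50.2, 28.6)  # CPU-time figures from the text
print(f"recovered {recovered_ms:.1f} ms ({fraction:.0%})")  # recovered 21.6 ms (43%)
# Illustrative half-widths, not the PR's per-delta CIs:
print(round(ci_half_width_of_difference(5.0, 5.0), 2))           # 7.07
print(round(ci_half_width_of_difference(5.0, 5.0, rho=0.5), 2))  # 5.0
```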
Wall-clock (default user-facing metric)
Two result sets, each comparing 022d9bc96 (baseline), 16fef2515 (regression), and 3f39cd753 (branch).
Wall-clock recovery is **~40% at n=100** (10.9 ms of 27.2 ms) and **~34% at n=400** (6.0 ms of 17.6 ms) by paired median, consistent with the CPU figure once you allow for wall-clock's higher variance and the fact that it is bounded by the slowest worker. The recovered fraction (~34–43% across metrics and run lengths) stays stable even though the absolute regression differs a lot between metrics (≈50 ms CPU vs ≈18–27 ms wall), which is reassuring evidence that the recovery is real rather than a noise artifact.
Using the new full-parse instrumentation
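The profile's column layout is not described in this PR text. As a hedged sketch of the kind of aggregation `misc/analyze_typeform_full_parse_profile.py` performs, assuming a TSV with a header row and hypothetical `kind`, `outcome`, `descriptor` and `seconds` columns (ranking descriptors by count is also an assumption, and `summarize_profile` is a hypothetical name):

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_profile(tsv_text, top_n=3):
    """Total time per expression kind, plus the top-N descriptors per FAIL class.

    Assumes a header row with kind, outcome, descriptor and seconds columns.
    """
    time_by_kind = defaultdict(float)
    fails_by_class = defaultdict(Counter)
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in rows:
        time_by_kind[row["kind"]] += float(row["seconds"])
        if row["outcome"].startswith("FAIL"):
            fails_by_class[row["outcome"]][row["descriptor"]] += 1
    kinds_by_time = sorted(time_by_kind.items(), key=lambda kv: kv[1], reverse=True)
    top_fails = {cls: c.most_common(top_n) for cls, c in fails_by_class.items()}
    return kinds_by_time, top_fails


sample = (
    "kind\toutcome\tdescriptor\tseconds\n"
    "OpExpr\tFAIL_not_a_type\tint + int\t0.002\n"
    "OpExpr\tFAIL_not_a_type\tint + int\t0.003\n"
    "CallExpr\tFAIL_call\tfoo()\t0.004\n"
    "NameExpr\tOK\tint\t0.001\n"
)
kinds_by_time, top_fails = summarize_profile(sample)
for kind, seconds in kinds_by_time:
    print(f"{kind:<10} {seconds * 1000:.1f} ms")
for cls, descriptors in top_fails.items():
    print(cls, descriptors)
```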
Open Questions
Is the `MYPY_TYPEFORM_PROFILE_FULL_PARSE` env-var name acceptable, or should it follow an existing naming convention?
I did not run performance numbers against non-mypy repositories
(like PyTorch and Black) as I originally planned. Would you like me to?
Should the `misc/perf_compare.py` changes ship in this PR, or land separately? They are a general benchmarking-harness improvement (CPU metric, single-worker mode, median-based reporting, opt-in binary cache), independent of the TypeForm filters; splitting them into their own PR may be cleaner for review.