Skip to content

gh-149481: skip FOR_ITER inline specialization for Python __next__#149491

Open
NekoAsakura wants to merge 2 commits intopython:mainfrom
NekoAsakura:gh-149481/skip-slot-iternext
Open

gh-149481: skip FOR_ITER inline specialization for Python __next__#149491
NekoAsakura wants to merge 2 commits intopython:mainfrom
NekoAsakura:gh-149481/skip-slot-iternext

Conversation

@NekoAsakura
Copy link
Copy Markdown
Contributor

@NekoAsakura NekoAsakura commented May 7, 2026

Thanks for the careful bisect and the empirical fix from @savannahostrowski. ❤️

Benchmark
Benchmark unpatched patched Δ
for_iter_dict_items 185.8 ms ± 1.6 ms 187.0 ms ± 2.8 ms +0.68%
for_iter_dict_keys 154.1 ms ± 2.1 ms 153.9 ms ± 2.6 ms -0.11%
for_iter_dict_values 148.2 ms ± 1.0 ms 146.9 ms ± 2.8 ms -0.85%
for_iter_set 179.1 ms ± 1.9 ms 179.0 ms ± 2.2 ms -0.07%
for_iter_reversed 136.6 ms ± 1.2 ms 136.0 ms ± 1.9 ms -0.47%
for_iter_enumerate 201.1 ms ± 2.9 ms 201.1 ms ± 2.4 ms -0.04%
for_iter_zip 230.6 ms ± 1.2 ms 227.9 ms ± 1.3 ms -1.15%
for_iter_list 114.2 ms ± 1.8 ms 115.6 ms ± 2.2 ms +1.26%
for_iter_tuple 104.4 ms ± 1.5 ms 104.7 ms ± 1.7 ms +0.35%
for_iter_range 137.6 ms ± 2.5 ms 137.8 ms ± 1.9 ms +0.13%
Geometric mean -0.03%
Root cause analyse (from opus 4.7)

When __next__ is Python-defined, tp_iternext is the generic slot_tp_iternext slot wrapper that re-enters the eval loop via vectorcall_method. Calling it through the new _ITER_NEXT_INLINE (which captures tp_iternext as a baked-in operand and lives in a tight _JUMP_TO_TOP loop) is correct functionally, but each new outer-iteration class causes _GUARD_TYPE_ITER to fail and warm a side-exit. After ~SIDE_EXIT_INITIAL_VALUE (4000) hits at each side-exit, a side trace forms there; _EXIT_TRACE jumps trace-to-trace via TIER2_TO_TIER2, which doesn't re-check the eval-breaker. With MAX_CHAIN_DEPTH=4 traces possible, the chain accumulates ~4×4000+4002 ≈ 20k inner hits before the system spirals into a tier-2-only loop with no signal checkpoints — matching the observed exact threshold. The pre-PR _FOR_ITER_TIER_TWO path goes through _PyForIter_VirtualIteratorNext, which uses an indirect dispatch on Py_TYPE(iter_o)->tp_iternext and does not produce a per-type guard, so no side-trace storm forms. Falling back to that path for slot iterators sidesteps the issue while keeping _ITER_NEXT_INLINE for C-level iterators (dict, set, enumerate, zip, reversed, etc.), which the existing test_for_iter_direct_* tests exercise.

Copy link
Copy Markdown
Member

@savannahostrowski savannahostrowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this was pretty much the exact fix I had locally. Thanks for the quick turnaround on this @NekoAsakura ❤️

I'll wait to see if anyone else wants to have a look.

Copy link
Copy Markdown
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for doing this.

@savannahostrowski savannahostrowski enabled auto-merge (squash) May 7, 2026 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants