-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-96859: [argparse] Avoid O(N^2) behavior while consuming optionals #96904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…onals
The loop consuming optional and positional arguments in ArgumentParser._parse_known_args()
rescans option_string_indices on every iteration of the loop in order to compute the next
option index:
```py
next_option_string_index = min([
index
for index in option_string_indices
if index >= start_index])
```
This becomes very slow when parsing a very long list of arguments, e.g. when parsing
a response file.
This commit refactors the loop to scan the option indices list only once.
|
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
|
@bitdancer, please share your reviews. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code itself looks good, although I had some trouble trying to understand the new flow based on the old flow. Having no particular experience with the code didn't help.
I had one question and some comments about comments. Those should be updated a little bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, what is the etiquette around commits addressing review feedback that are not really meaningful changes by themselves? Should I squash them before the PR is merged?
This does not seem related to the changes on the PR. Also the latest commit only touched comments. |
Try closing and reopening the PR to get the check to re-run. |
Please don't! All CPython PRs are squash-merged anyway, so you don't need to; and PR authors squashing commits together makes it harder for reviewers to see what changed between commits due to the GitHub UI :) |
|
A friendly ping to bring some attention to this PR. @rhettinger, would you be the best person to do a core review? You listed as the maintainer of argparse.py at the top of the file. Please let me know if I should ping someone else. |
|
I used to contribute a lot to argparse as paul.j3, but haven't followed it much since the github move. So this will be a general comment. Backward compatibility has been a major issue; I can think of several changes that either had to be removed, or have remained and continue to cause problem. Thus I favor bug fixes, and feature enhancements that have little chance of causing problems. I discourage core changes that aren't essential. This looks like a core change simply for the sake of performance. I'd vote against it, unless it has been thoroughly tested, not just read through by one or two people. I don't think there are enough argparse developers to do that well. Most users don't use argparse for large inputs. If performance came up on StackOverFlow, I'd tell them to read the input(s) from a file, and reserve argparse for short lists - file names and controls, not data. |
|
Thanks for your feedback, @hpaulj. The change is indeed fixes a performance issue and attempts to not modify the existing behavior in any way. The test_argparse.py suite looks pretty comprehensive and tests a number is of cases involving append actions and a mix of optional/positional arguments. However it does not test the mix of argparse and 3rd party apps using it - it is not an appcompat test suite. I don't believe I can provide any evidence showing that this change does not introduce breakage with in a large body of 3rd party code. If this means this change does not meet the bar - so be it.
The specific case this issue came up in was parsing a response file generated by a build system. It was a reasonable legitimate case of passing diverse command line parameters via a file due to size constraints. I'd agree this is a corner case. |
|
From a quick read of your changes it looks like you are replacing the set, Years ago I added A down side to that is it won't get as much beta testing. But I don't have a good feel for how many problems get resolved with Python beta releases. |
Hm, this is an interesting suggestion...
After this refactoring re-implementing the core parsing loop and exposing it as a new top-level parsing function should be relatively simple. Having two parallel functions also allow cross-checking one against the other. I'm not sure this plan has lower risk of introducing a backward compatibility issue. The refactoring alone is going to be pretty substantial in terms of the size of the patch... |
|
Closing this PR due to lack of interest. |
The loop consuming optional and positional arguments in ArgumentParser._parse_known_args() rescans option_string_indices on every iteration of the loop in order to compute the next option index:
This becomes very slow when parsing a very long list of arguments, e.g. when parsing a response file.
This commit refactors the loop to scan the option indices list only once.