gh-96859: [argparse] Avoid O(N^2) behavior while consuming optionals #96904

alexeypa · 2022-09-18T01:01:16Z

The loop consuming optional and positional arguments in ArgumentParser._parse_known_args() rescans option_string_indices on every iteration of the loop in order to compute the next option index:

next_option_string_index = min([
    index
    for index in option_string_indices
    if index >= start_index])

This becomes very slow when parsing a very long list of arguments, e.g. when parsing a response file.

This commit refactors the loop to scan the option indices list only once.

Issue: argparse.ArgumentParser is slow when parsing a very long option list #96859

…onals The loop consuming optional and positional arguments in ArgumentParser._parse_known_args() rescans option_string_indices on every iteration of the loop in order to compute the next option index: ```py next_option_string_index = min([ index for index in option_string_indices if index >= start_index]) ``` This becomes very slow when parsing a very long list of arguments, e.g. when parsing a response file. This commit refactors the loop to scan the option indices list only once.

bedevere-bot · 2022-09-18T01:01:23Z

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

ghost · 2022-09-18T01:01:43Z

All commit authors signed the Contributor License Agreement.

kumiDa · 2022-09-18T08:52:55Z

@bitdancer, please share your reviews.

DanielNoord

Code itself looks good, although I had some trouble trying to understand the new flow based on the old flow. Having no particular experience with the code didn't help.

I had one question and some comments about comments. Those should be updated a little bit.

Lib/argparse.py

…onal()

alexeypa

BTW, what is the etiquette around commits addressing review feedback that are not really meaningful changes by themselves? Should I squash them before the PR is merged?

Lib/argparse.py

…sed optionals

Lib/argparse.py

alexeypa · 2022-09-20T15:50:37Z

Tests / Windows (x86) (pull_request) Failing after 31m
https://github.com/python/cpython/actions/runs/3091293898/jobs/5001192381

This does not seem related to the changes on the PR. Also the latest commit only touched comments.

AlexWaygood · 2022-09-20T16:17:21Z

Tests / Windows (x86) (pull_request) Failing after 31m
https://github.com/python/cpython/actions/runs/3091293898/jobs/5001192381

This does not seem related to the changes on the PR. Also the latest commit only touched comments.

Try closing and reopening the PR to get the check to re-run.

AlexWaygood · 2022-09-20T16:19:30Z

BTW, what is the etiquette around commits addressing review feedback that are not really meaningful changes by themselves? Should I squash them before the PR is merged?

Please don't! All CPython PRs are squash-merged anyway, so you don't need to; and PR authors squashing commits together makes it harder for reviewers to see what changed between commits due to the GitHub UI :)

alexeypa · 2022-10-22T18:16:00Z

A friendly ping to bring some attention to this PR.

@rhettinger, would you be the best person to do a core review? You listed as the maintainer of argparse.py at the top of the file. Please let me know if I should ping someone else.

hpaulj · 2022-11-04T16:23:11Z

I used to contribute a lot to argparse as paul.j3, but haven't followed it much since the github move. So this will be a general comment.

Backward compatibility has been a major issue; I can think of several changes that either had to be removed, or have remained and continue to cause problem. Thus I favor bug fixes, and feature enhancements that have little chance of causing problems. I discourage core changes that aren't essential.

This looks like a core change simply for the sake of performance. I'd vote against it, unless it has been thoroughly tested, not just read through by one or two people. I don't think there are enough argparse developers to do that well.

Most users don't use argparse for large inputs. If performance came up on StackOverFlow, I'd tell them to read the input(s) from a file, and reserve argparse for short lists - file names and controls, not data.

alexeypa · 2022-11-07T06:35:53Z

Thanks for your feedback, @hpaulj. The change is indeed fixes a performance issue and attempts to not modify the existing behavior in any way. The test_argparse.py suite looks pretty comprehensive and tests a number is of cases involving append actions and a mix of optional/positional arguments. However it does not test the mix of argparse and 3rd party apps using it - it is not an appcompat test suite. I don't believe I can provide any evidence showing that this change does not introduce breakage with in a large body of 3rd party code. If this means this change does not meet the bar - so be it.

If performance came up on StackOverFlow, I'd tell them to read the input(s) from a file, and reserve argparse for short lists - file names and controls, not data.

The specific case this issue came up in was parsing a response file generated by a build system. It was a reasonable legitimate case of passing diverse command line parameters via a file due to size constraints. I'd agree this is a corner case.

hpaulj · 2022-11-07T16:25:53Z

From a quick read of your changes it looks like you are replacing the set, option_string_indices with list option_tuples and option_tuple_index.

Years ago I added parse_intermixed_args. I created a whole new pair of functions rather than add some sort of parameter and alternative parsing path in the default parse_args. You might consider something similar with a parse_faster_args. At least until we get more feedback, leave the current parser as the default, but provide interested users with an alternative.

A down side to that is it won't get as much beta testing. But I don't have a good feel for how many problems get resolved with Python beta releases.

alexeypa · 2022-11-09T07:02:19Z

You might consider something similar with a parse_faster_args

Hm, this is an interesting suggestion...

_parse_known_args() is 200+ lines long. It would not be wise just clone it and apply this patch to the new version. We can probably refactor _parse_known_args() to make it more manageable - move sub-functions out, maybe break it up into smaller functions. It should be possible to do it and demonstrate that all these changes are just moving existing code around.

After this refactoring re-implementing the core parsing loop and exposing it as a new top-level parsing function should be relatively simple. Having two parallel functions also allow cross-checking one against the other.

I'm not sure this plan has lower risk of introducing a backward compatibility issue. The refactoring alone is going to be pretty substantial in terms of the size of the patch...

alexeypa · 2023-01-23T01:00:01Z

Closing this PR due to lack of interest.

bedevere-bot added the awaiting review label Sep 18, 2022

📜🤖 Added by blurb_it.

b8641d9

alexeypa mentioned this pull request Sep 19, 2022

argparse.ArgumentParser is slow when parsing a very long option list #96859

Open

DanielNoord suggested changes Sep 19, 2022

View reviewed changes

Lib/argparse.py Outdated Show resolved Hide resolved

Lib/argparse.py Outdated Show resolved Hide resolved

Lib/argparse.py Show resolved Hide resolved

bedevere-bot added awaiting core review and removed awaiting review labels Sep 19, 2022

pythongh-96859: [argparse] Unpack optionals when calling consume_opti…

d16575d

…onal()

alexeypa commented Sep 20, 2022

View reviewed changes

Lib/argparse.py Show resolved Hide resolved

Lib/argparse.py Outdated Show resolved Hide resolved

Lib/argparse.py Outdated Show resolved Hide resolved

pythongh-96859: [argparse] Clarified the comment about collecting par…

be33dad

…sed optionals

DanielNoord approved these changes Sep 20, 2022

View reviewed changes

Lib/argparse.py Show resolved Hide resolved

AlexWaygood closed this Sep 20, 2022

AlexWaygood reopened this Sep 20, 2022

AlexWaygood added the performance Performance or resource usage label Oct 22, 2022

alexeypa closed this Jan 23, 2023

Uh oh!

gh-96859: [argparse] Avoid O(N^2) behavior while consuming optionals #96904

gh-96859: [argparse] Avoid O(N^2) behavior while consuming optionals #96904

Uh oh!

Conversation

alexeypa commented Sep 18, 2022 • edited by bedevere-bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bedevere-bot commented Sep 18, 2022

Uh oh!

ghost commented Sep 18, 2022 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kumiDa commented Sep 18, 2022

Uh oh!

DanielNoord left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexeypa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexeypa commented Sep 20, 2022

Uh oh!

AlexWaygood commented Sep 20, 2022

Uh oh!

AlexWaygood commented Sep 20, 2022

Uh oh!

alexeypa commented Oct 22, 2022

Uh oh!

hpaulj commented Nov 4, 2022

Uh oh!

alexeypa commented Nov 7, 2022

Uh oh!

hpaulj commented Nov 7, 2022

Uh oh!

alexeypa commented Nov 9, 2022

Uh oh!

alexeypa commented Jan 23, 2023

Uh oh!

Uh oh!

alexeypa commented Sep 18, 2022 •

edited by bedevere-bot

Loading

ghost commented Sep 18, 2022 •

edited by ghost

Loading