Skip to content

JS: fix parse errors related to char escapes and char ranges#4648

Merged
codeql-ci merged 5 commits into
github:mainfrom
erik-krogh:regexpParse
Nov 16, 2020
Merged

JS: fix parse errors related to char escapes and char ranges#4648
codeql-ci merged 5 commits into
github:mainfrom
erik-krogh:regexpParse

Conversation

@erik-krogh
Copy link
Copy Markdown
Contributor

[\\w-z] was parsed as a character class with a range (from \w to z).
That is wrong, it is actually a character class containing the union of \w, -, and z.

We encounter this behavior quite often.

@erik-krogh erik-krogh requested a review from a team as a code owner November 10, 2020 20:05
@github-actions github-actions Bot added the JS label Nov 10, 2020
Copy link
Copy Markdown
Contributor

@asgerf asgerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. It seems to be a discrepancy between implementation and specification. And a syntax error is thrown if the regexp has the u flag, like /[a-\w]/u, but we don't have the implement that.

SourceLocation loc = new SourceLocation(pos());
RegExpTerm atom = this.parseCharacterClassAtom();
if (!this.lookahead("-]") && this.match("-"))
for (String c : Arrays.asList("d", "D", "s", "S", "w", "W")) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be prudent to be a little more efficient here. Creating a list and six string concatenations is quite a lot in order to parse a single character. Keep in mind that we speculatively parse strings as regexps, so this function is quite hot.

A simple first step is to start by checking for - and no more work if the next character is not -. Next we can store the string list -\d, -\D, -\s, -\S. -\w, \-W in a constant to simplify the full check.

@asgerf
Copy link
Copy Markdown
Contributor

asgerf commented Nov 11, 2020

We'll need to bump the extractor version string as well.

Copy link
Copy Markdown
Contributor

@asgerf asgerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@codeql-ci codeql-ci merged commit 09cfb24 into github:main Nov 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants