JS: fix parse errors related to char escapes and char ranges #4648
asgerf left a comment
Interesting. This seems to be a discrepancy between the implementation and the specification. A syntax error is thrown if the regexp has the `u` flag, as in `/[a-\w]/u`, but we don't have to implement that.
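To make the rule concrete, here is a minimal Java sketch of the decision described above. It assumes the behavior of `CharacterRangeOrUnion` from Annex B of the ECMAScript specification: without the `u` flag, a `-` between two atoms where either atom is a class escape such as `\w` yields the union of both atoms and a literal `-`; with `u`, it is a syntax error. The class `RangeOrUnion`, its methods, and the string encoding of atoms are hypothetical and not taken from the extractor. The sketch also ignores the out-of-order check (`[z-a]`), which is an error in both modes.

```java
import java.util.List;
import java.util.Set;

public class RangeOrUnion {
    // Hypothetical encoding: each atom is a string; these six denote sets, not single characters.
    private static final Set<String> CLASS_ESCAPES =
            Set.of("\\d", "\\D", "\\s", "\\S", "\\w", "\\W");

    private static boolean isSingleChar(String atom) {
        return !CLASS_ESCAPES.contains(atom);
    }

    /**
     * Sketch of Annex B's CharacterRangeOrUnion for the atoms around a '-'.
     * Returns the terms the class contains, or throws in the u-flag error case.
     */
    public static List<String> rangeOrUnion(String lo, String hi, boolean unicode) {
        if (isSingleChar(lo) && isSingleChar(hi)) {
            return List.of(lo + "-" + hi); // ordinary range, e.g. a-z
        }
        if (unicode) {
            throw new IllegalArgumentException(
                    "SyntaxError: invalid character class range " + lo + "-" + hi);
        }
        // Legacy (non-u) semantics: union of both atoms and a literal '-'.
        return List.of(lo, "-", hi);
    }

    public static void main(String[] args) {
        System.out.println(rangeOrUnion("a", "z", false));   // [a-z]
        System.out.println(rangeOrUnion("\\w", "z", false)); // [\w, -, z]
        try {
            rangeOrUnion("a", "\\w", true);                  // models /[a-\w]/u
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Since the extractor does not need to report the `u`-flag error, only the first and last branches matter for it; the middle branch is shown to mark where the two modes diverge.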
```java
SourceLocation loc = new SourceLocation(pos());
RegExpTerm atom = this.parseCharacterClassAtom();
if (!this.lookahead("-]") && this.match("-"))
for (String c : Arrays.asList("d", "D", "s", "S", "w", "W")) {
```
I think it would be prudent to be a little more efficient here. Creating a list and doing six string concatenations is quite a lot of work just to parse a single character. Keep in mind that we speculatively parse strings as regexps, so this function is quite hot.
A simple first step is to check for `-` first and do no more work if the next character is not `-`. Next, we can store the strings `-\d`, `-\D`, `-\s`, `-\S`, `-\w`, `-\W` in a constant to simplify the full check.
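A minimal sketch of the suggested shape, assuming the parser exposes its input as a `String` plus a current offset. The class `DashEscapeCheck` and its fields are hypothetical stand-ins, not the extractor's actual `lookahead`/`match` helpers. The point is the ordering: a single `charAt` comparison rejects the common case without allocating anything, and only when a `-` is present does it scan a list that is built once as a constant.

```java
import java.util.List;

public class DashEscapeCheck {
    // Built once, instead of rebuilding a list and concatenating strings per call.
    private static final List<String> DASH_CLASS_ESCAPES =
            List.of("-\\d", "-\\D", "-\\s", "-\\S", "-\\w", "-\\W");

    private final String src; // hypothetical: the regexp source being parsed
    private final int pos;    // hypothetical: current parse offset

    public DashEscapeCheck(String src, int pos) {
        this.src = src;
        this.pos = pos;
    }

    /** True if the input at pos is '-' immediately followed by a character class escape. */
    public boolean atDashClassEscape() {
        // Fast path: one character comparison, no allocation.
        if (pos >= src.length() || src.charAt(pos) != '-') {
            return false;
        }
        // Slow path, only reached when a '-' is actually present.
        for (String s : DASH_CLASS_ESCAPES) {
            if (src.startsWith(s, pos)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(new DashEscapeCheck("[a-\\w]", 2).atDashClassEscape()); // true
        System.out.println(new DashEscapeCheck("[a-z]", 2).atDashClassEscape());   // false
    }
}
```

`String.startsWith(prefix, offset)` compares in place, so neither path creates substrings.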
We'll need to bump the extractor version string as well.
Updated from bdde677 to 044fbc0.
`[\w-z]` was parsed as a character class with a range (from `\w` to `z`). That is wrong: it is actually a character class containing the union of `\w`, `-`, and `z`. We encounter this pattern quite often.
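The corrected parse can be illustrated with a small, hypothetical Java parser for the body of a character class (the text between `[` and `]`), in non-`u` mode only. It is not the extractor's parser: atoms are plain strings, any backslash escape is treated as two characters, and ranges are not validated for order. It shows that `\w-z` produces three separate terms rather than one range.

```java
import java.util.ArrayList;
import java.util.List;

public class ClassBodyParser {
    /** Splits a class body into terms: single atoms, or "lo-hi" for real ranges. */
    public static List<String> parse(String body) {
        List<String> terms = new ArrayList<>();
        int i = 0;
        while (i < body.length()) {
            String lo = readAtom(body, i);
            i += lo.length();
            // A '-' that is not the last character may start a range.
            if (i + 1 < body.length() && body.charAt(i) == '-') {
                String hi = readAtom(body, i + 1);
                if (isClassEscape(lo) || isClassEscape(hi)) {
                    // Not a range: the class gets lo, a literal '-', and hi.
                    terms.add(lo);
                    terms.add("-");
                    terms.add(hi);
                } else {
                    terms.add(lo + "-" + hi);
                }
                i += 1 + hi.length();
            } else {
                terms.add(lo);
            }
        }
        return terms;
    }

    // A backslash escape spans two characters; anything else is one.
    private static String readAtom(String s, int i) {
        if (s.charAt(i) == '\\' && i + 1 < s.length()) {
            return s.substring(i, i + 2);
        }
        return s.substring(i, i + 1);
    }

    // \d \D \s \S \w \W denote sets of characters, so they cannot be range endpoints.
    private static boolean isClassEscape(String atom) {
        return atom.length() == 2 && atom.charAt(0) == '\\'
                && "dDsSwW".indexOf(atom.charAt(1)) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(parse("\\w-z")); // [\w, -, z]
        System.out.println(parse("a-z"));   // [a-z]
    }
}
```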