Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CVE-2023-27043] gh-102988: Reject malformed addresses in email.parseaddr() #111116

Merged
merged 14 commits into from Dec 15, 2023

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Oct 20, 2023

Detect email address parsing errors and return empty tuple to indicate the parsing error (old API). Add an optional 'strict' parameter to getaddresses() and parseaddr() functions. Patch by Thomas Dwyer.


📚 Documentation preview 📚: https://cpython-previews--111116.org.readthedocs.build/

@vstinner
Copy link
Member Author

@gpshead @serhiy-storchaka @bitdancer @warsaw: Would you mind to review this security fix?

See issue gh-102988 for the context.

This PR is a copy of PR #108250 but I added strict=True parameter, so it's possible to get the old behavior. I added tests on both modes, strict=True and strict=False.

@vstinner
Copy link
Member Author

This PR is a copy of PR #108250 but I added strict=True parameter

My colleague Lumir Balhar @frenzymadness ran an impact check of PR #108250 on Fedora: in short, there is no impact, the test suite of all Python packages (in Fedora) pass with the change. While there were some build errors, they were unrelated to the email issue. For details, see https://copr.fedorainfracloud.org/coprs/lbalhar/email-CVE/builds/ COPR which as more than 4300 builds.

Now with an additional strict parameter, if there is any impacted project, at least there is a way to "opt out".

@vstinner
Copy link
Member Author

@tdwyer: Would you mind to review my change, to see if I preserved your work correctly? (code and tests)

@vstinner vstinner added type-security A security issue needs backport to 3.8 only security fixes needs backport to 3.9 only security fixes needs backport to 3.10 only security fixes needs backport to 3.11 bug and security fixes needs backport to 3.12 bug and security fixes labels Oct 27, 2023
@vstinner
Copy link
Member Author

I think that we should backport the change to all branches accepting security fixes. Problem: the change refer to version numbers, which as .. versionchanged:: 3.13. I suppose that if the change is backported, we should compute the next version of each branch, so backport manually.

@vstinner
Copy link
Member Author

@ambv @SethMichaelLarson: Would you mind to review this PR?

@ambv ambv changed the title gh-102988: email parseaddr() now rejects malformed address gh-102988: Reject malformed addresses in email.parseaddr() Oct 27, 2023
@ambv
Copy link
Contributor

ambv commented Oct 27, 2023

Why is this a separate PR from #108250?

@@ -165,7 +165,7 @@ email
encountered instead of potentially inaccurate values. Add optional *strict*
parameter to these two functions: use ``strict=False`` to get the old
behavior, accept malformed inputs.
(Contributed by Thomas Dwyer for :gh:`102988` to ameliorate CVE-2023-27043
(Contributed by Thomas Dwyer for :gh:`102988` to improve the CVE-2023-27043
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL a new word.

@@ -42,6 +42,8 @@

specialsre = re.compile(r'[][\\()<>@,:;".]')
escapesre = re.compile(r'[\\"]')
realname_comma_re = re.compile(r'"[^"]*,[^"]*"')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
realname_comma_re = re.compile(r'"[^"]*,[^"]*"')
realname_comma_re = re.compile(r'"[^",]*+,[^"]*+"')

It is faster. But I am not sure that the use of such regex is correct.

def _pre_parse_validation(email_header_fields):
accepted_values = []
for v in email_header_fields:
s = v.replace('\\(', '').replace('\\)', '')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what if that backslash was already escaped with a backslash? For example \\) or \\\\).

Lib/email/utils.py Outdated Show resolved Hide resolved
Lib/email/utils.py Outdated Show resolved Hide resolved
@vstinner
Copy link
Member Author

Why is this a separate PR from #108250?

I'm not the author of the other PR. I copied the other PR and added strict parameter.

@ambv
Copy link
Contributor

ambv commented Oct 27, 2023

I'm not the author of this PR and I was able to make commits to it.

@vstinner
Copy link
Member Author

I'm not the author of this PR and I was able to make commits to it.

I don't feel comfortable to make significant change of a PR without asking the author. I prefer to create a separated PR and ask for review.

@vstinner
Copy link
Member Author

vstinner commented Oct 30, 2023

Is this behavior a bug or a feature? I don't know how ; is supposed to behave.

$ python
Python 3.11.6 (main, Oct  3 2023, 00:00:00) [GCC 13.2.1 20230728 (Red Hat 13.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from email.utils import getaddresses
>>> from pprint import pprint
>>> pprint(getaddresses('<bob@example.org>; <alice@example.org>'))
[('', ''),
 ('', 'b'),
 ('', 'o'),
 ('', 'b'),
 ('', ''),
 ('', 'e'),
 ('', 'x'),
 ('', 'a'),
 ('', 'm'),
 ('', 'p'),
 ('', 'l'),
 ('', 'e'),
 ('', '.'),
 ('', 'o'),
 ('', 'r'),
 ('', 'g'),
 ('', ''),
 ('', ''),
 ('', ''),
 ('', ''),
 ('', 'a'),
 ('', 'l'),
 ('', 'i'),
 ('', 'c'),
 ('', 'e'),
 ('', ''),
 ('', 'e'),
 ('', 'x'),
 ('', 'a'),
 ('', 'm'),
 ('', 'p'),
 ('', 'l'),
 ('', 'e'),
 ('', '.'),
 ('', 'o'),
 ('', 'r'),
 ('', 'g'),
 ('', '')]

@vstinner
Copy link
Member Author

Is this behavior a bug or a feature? I don't know how ; is supposed to behave.

Oh. getaddresses() expects a sequence, not a string :-)

@vstinner
Copy link
Member Author

Except of parsedate_tz() function, Lib/email/_parseaddr.py file didn't evolve much it was added in 2002 by commit 030ddf7. The file was created from Lib/rfc822.py which was added in 1992 (commit 01ca336).

The latest major change was done in... 1997 with commit be7c45e

New address parser by Ben Escoto replaces Sjoerd Mullender's parseaddr()

The latest minor change was done in 2019 to fix CVE-2019-16056: commit 8cb65d1 of issue #78336.

@vstinner
Copy link
Member Author

What if the input is '"Jane Doe" jane@example.net, "John Doe" john@example.net'?

Oh, realname_comma_re replaces "Jane Doe" <jane@example.net>, "John Doe" <john@example.net> with "Jane Doe <john@example.net> which is invalid...

@vstinner
Copy link
Member Author

@vstinner vstinner marked this pull request as draft October 30, 2023 14:20
@vstinner
Copy link
Member Author

@gpshead:

Of note, this particular bug was not declared to be a security vulnerability by our project. That was done by the original reporter filing a CVE on their own 1-2 months after they privately reported it to the PSRT without input from us on whether it should be considered one or not. (This was before we were setup as a CNA so we had no ability to validate that CVE filing) It's a "meh" low severity issue at best, I'm not sure we'd have even filed a CVE for this if given the choice.

The problem is that Red Hat Security Team has requirements for CVE depending on their severity. This one was qualified as Moderate severity which means short time to provide a fix: https://access.redhat.com/security/cve/cve-2023-27043 Well, I can try to argue with them to reduce the severity and so relax the urgency of a fix 😁

@vstinner
Copy link
Member Author

That PR #105127 caused a behavior regression issue #106669 and thus the rollback PR #106733.

test_email is far from being complete, I cannot promise this change to be bug free. I can only show with the CI that it pass the current test_email test suite. Tests were added on "real names containing a comma". And a few more tests on edge cases discovered during the review.

@encukou
Copy link
Member

encukou commented Dec 15, 2023

The problem is that Red Hat Security Team has requirements for CVE depending on their severity

That is entirely Red Hat's issue. Red Hat has the tools to make its own decisions about backporting, backwards compatibility, and defaults -- and often uses them in this kind of situation, because its policies typical users are different from CPython's.
Here, let's find what's best for CPython, and do that.

I prefer to add a simple boolean parameter (flag) to these functions rather than using the "heavy" policy API

What sells it to me is that we can add a heavy API later -- we'll “just 1�7 need to figure out what strict=True maps to.

just land this in 3.13

+1, merging it as a bugfix, but delaying backports to make fixes/rollbacks easier, makes sense to me.

@vstinner vstinner merged commit 4a153a1 into python:main Dec 15, 2023
29 checks passed
@vstinner vstinner deleted the parse_email branch December 15, 2023 15:10
@vstinner
Copy link
Member Author

I'm fine with making the change in Python 3.13, wait feedback on a few Python 3.13 releases and then decide for the backports.

@gpshead:

Of note, this particular bug was not declared to be a security vulnerability by our project.

The issue has the security tag, should it be removed? Also, a CVE was assigned to it, before Python became a CNA.

The minority of application owners who require pedantically accurate email address parsing as a security constraint should already be well aware of this form of issue by now and will already have had to create their own custom solutions.

Does someone known if there is a third party PyPI module with a good implementation of the email RFC?

corona10 pushed a commit to corona10/cpython that referenced this pull request Dec 15, 2023
 1�7.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
@gpshead gpshead added the type-bug An unexpected behavior, bug, or error label Dec 15, 2023
@gpshead
Copy link
Member

gpshead commented Dec 15, 2023

I don't mind the CVE on this one, even if I'm not convinced I would have filed one. We did clearly have buggy behavior in our library. We can't meaningfully undo the CVE existing anyways - I don't think this is worth filing a disputed claim over.

I think it is fine to tag things as security issues even if we aren't backporting it or are holding off on deciding if we should backport it. We've done that on some other non-severe security/bug changes in the past (which aren't in my mind right now).

I'll probably be applying a backport to our 3.11 runtime at work as well. :) As far as I'm concerned, that just increases testing to be confident we can do it in CPython's patch releases - if we encountered any issues as a result, I'd pipe up on the issue or file a new issue as appropriate.

(so a big thank you for working on this with backporting in mind, regardless of your own motivating reasons!)

@mcepl
Copy link
Contributor

mcepl commented Dec 17, 2023

@vstinner :

The problem is that Red Hat Security Team has requirements for CVE depending on their severity. This one was qualified as Moderate severity which means short time to provide a fix: https://access.redhat.com/security/cve/cve-2023-27043 Well, I can try to argue with them to reduce the severity and so relax the urgency of a fix 😁

Me too (SUSE). I have a hard time to persuade myself that this CVE truly deserves a 5.3 score.

frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Dec 18, 2023
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Dec 18, 2023
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Dec 18, 2023
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Dec 18, 2023
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Dec 18, 2023
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Dec 18, 2023
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
@vstinner
Copy link
Member Author

Me too (SUSE). I have a hard time to persuade myself that this CVE truly deserves a 5.3 score.

I'm trying to not argue about CVE severity/score. I let the security experts decide that :-) Well, I have opinions about it, but I prefer to keep them for myself 🤣.

@mcepl
Copy link
Contributor

mcepl commented Dec 18, 2023 via email

hroncok pushed a commit to fedora-python/cpython that referenced this pull request Dec 19, 2023
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Dec 21, 2023
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>
frenzymadness pushed a commit to frenzymadness/cpython that referenced this pull request Dec 22, 2023
…n email.parseaddr() (python#111116)

Detect email address parsing errors and return empty tuple to
indicate the parsing error (old API). Add an optional 'strict'
parameter to getaddresses() and parseaddr() functions. Patch by
Thomas Dwyer.

Co-Authored-By: Thomas Dwyer <github@tomd.tel>

Changes for Python 2:
- Define encoding for test_email
- Adjust import so we don't need change the tests
- Do not use f-strings
- Do not use SubTest
- KW only function arguments are not supported
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error type-security A security issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants