Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create_datagram_endpoint() no longer compatible with CAN bus raw sockets #114887

Closed
tjhowse opened this issue Feb 2, 2024 · 4 comments
Closed
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 new features, bugs and security fixes topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@tjhowse
Copy link
Contributor

tjhowse commented Feb 2, 2024

Bug report

Bug description:

Problem description

We are using create_datagram_endpoint() to set up an asyncio-managed connection to a native hardware CAN bus interface at can0 under debian linux. We are using the sock named argument to pass in a pre-built raw socket to the FD.

This works in Python 3.5.3 but broke following a migration to 3.11.4. We found that the datagram endpoint would fail to be brought up, and the connection_made(transport) method on the protocol_factory would never be called. When I rejigged our code to await the coroutine I got:

    ValueError: A UDP Socket was expected, got <socket.socket fd=7, family=29, type=3, proto=1, laddr=('can0',)

I believe this has gone unnoticed due to hardware CAN bus interfaces being rare.

Relevant history

#4231
#4898
#4922

Discussion of underlying issues: https://bugs.python.org/issue32331

Core issue

A socket connected to a CAN bus has sock.type socket.SOCK_RAW, I.E. 0b11. This fails the comparison to socket.SOCK_DGRAM (0b01) so create_datagram_endpoint() raises an exception because the socket's type fails the "is a datagram" check, even though it is. In 3.5.3 the check was:

    (sock.type & socket.SOCK_DGRAM) == socket.SOCK_DGRAM

This worked because it was treating SOCK_DGRAM as a bitwise mask, which it kinda is, but kinda isn't.

x86_64-linux-gnu/bits/socket_type.h:

    ...
    /* Types of sockets.  */
    enum __socket_type
    {
    SOCK_STREAM = 1,		/* Sequenced, reliable, connection-based
                    byte streams.  */
    #define SOCK_STREAM SOCK_STREAM
    SOCK_DGRAM = 2,		/* Connectionless, unreliable datagrams
                    of fixed maximum length.  */
    #define SOCK_DGRAM SOCK_DGRAM
    SOCK_RAW = 3,			/* Raw protocol interface.  */
    #define SOCK_RAW SOCK_RAW
    SOCK_RDM = 4,			/* Reliably-delivered messages.  */
    ...

I am torn on the best way to approach a fix here. I have written a PR that replaces

    if sock.type != socket.SOCK_DGRAM:

comparisons with

    if not sock.type & socket.SOCK_DGRAM:

however this perpetuates the enum-vs-bitmask confusion. A more constrained change might be to change

    if sock.type != socket.SOCK_DGRAM:

to

    if not (sock.type == socket.SOCK_DGRAM || sock.type == socket.SOCK_RAW):

but that may not catch every instance of the problem. I'd appreciate some guidance on this.

Cheers,
Travis.

CPython versions tested on:

3.11

Operating systems tested on:

Linux

Linked PRs

@ncoghlan
Copy link
Contributor

ncoghlan commented Feb 2, 2024

Looking at the origin of the underlying regression in #4922 & #76512, it's clear that the intent was to eliminate the bit flag behaviour from the handling of the Python-level socket type information.

Looking at the glibc source code for the declarations of the socket types, and Linux socket man page strongly suggests that bringing back the bitwise checks is NOT the right answer:

  • SOCK_RAW is SOCK_DGRAM | SOCK_STREAM, but from a syscall point of view, it works like SOCK_DGRAM
  • from the description of SOCK_RDM in the man page, it should also be handled like SOCK_DGRAM but doesn't set that bit
  • SOCK_SEQPACKET is SOCK_RDM | SOCK_DGRAM | SOCK_STREAM, but from a syscall point of view, the man page states it works like SOCK_STREAM

Actually handling SOCK_RDM and SOCK_SEQPACKET is well out of scope (since there aren't any standardised Internet protocols that use them), but their existence can still be used to inform the design of an appropriate solution.

For a pure SOCK_RAW bug fix, that sounds to me like leaving checks against SOCK_STREAM alone (since SOCK_RAW failing them is correct, and we're not trying to fix SOCK_SEQPACKET here), but any checks for s.type == SOCK_DGRAM need to change to instead be (s.type == SOCK_DGRAM or s.type == SOCK_RAW)) (edit: or s.type != SOCK_STREAM as GvR suggested below)

Regardless of the solution chosen, it would be very helpful if a basic vcan based compatibility test could be added to the test suite to ensure the standard lib continues to support CAN in addition to TCP and UDP. That should be a separate enhancement ticket though, since it will require a bit of work to figure out the most appropriate way of doing that - vcan may not even be necessary, it may suffice to just test non-CAN raw packets between a Python client and server)

(Full disclosure for other core devs: I used to work with @tjhowse and am hence familiar with the application(s) that are affected here)

@gvanrossum
Copy link
Member

Silly suggestion: since the ValueError is really just meant to prevent users from making trivial mistakes, maybe create_datagram_endpoint() can just check that the passed-in socket is not a SOCK_STREAM socket? Since that test appears solid, only excluding stream sockets would still have the intended benefit of rejecting the (presumably more common) mistake of accidentally passing a stream socket as a datagram endpoint, without preventing advanced users from passing other weird socket types. I have no idea what CAN is, but ISTM that there are more likely additional types of datagram sockets than additional types of stream sockets.

@tjhowse
Copy link
Contributor Author

tjhowse commented Feb 2, 2024

That makes sense to me. This also fixes the problem in our use-case.

gvanrossum pushed a commit that referenced this issue Feb 3, 2024
…endpoint() (#114893)

Also improve exception message.

Co-authored-by: Donghee Na <donghee.na92@gmail.com>
tjhowse added a commit to tjhowse/cpython that referenced this issue Feb 3, 2024
…agram_endpoint() (python#114893)

Also improve exception message.

Co-authored-by: Donghee Na <donghee.na92@gmail.com>
tjhowse added a commit to tjhowse/cpython that referenced this issue Feb 3, 2024
…ate_datagram_endpoint() (pythonGH-114893)

Also improve exception message.

(cherry picked from commit 94ec2b9)

Co-authored-by: Travis Howse <tjhowse@gmail.com>
Co-authored-by: Donghee Na <donghee.na92@gmail.com>
tjhowse added a commit to tjhowse/cpython that referenced this issue Feb 3, 2024
…ate_datagram_endpoint() (pythonGH-114893)

Also improve exception message.

(cherry picked from commit 94ec2b9)

Co-authored-by: Travis Howse <tjhowse@gmail.com>
Co-authored-by: Donghee Na <donghee.na92@gmail.com>
gvanrossum pushed a commit that referenced this issue Feb 4, 2024
#114980)

Also improve exception message.

(cherry picked from commit 94ec2b9)

Co-authored-by: Donghee Na <donghee.na92@gmail.com>
gvanrossum pushed a commit that referenced this issue Feb 5, 2024
#114979)

Also improve exception message.

(cherry picked from commit 94ec2b9)

Co-authored-by: Donghee Na <donghee.na92@gmail.com>
@ncoghlan
Copy link
Contributor

ncoghlan commented Feb 8, 2024

Commit was missing the magic comment to auto-close the issue. Closing since the regression is fixed.

I'll file a new issue for the gap in the test suite (access restrictions on raw sockets make routine testing a bit awkward).

@ncoghlan ncoghlan closed this as completed Feb 8, 2024
@ncoghlan ncoghlan added 3.11 only security fixes 3.12 bugs and security fixes 3.13 new features, bugs and security fixes topic-asyncio labels Feb 8, 2024
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…agram_endpoint() (python#114893)

Also improve exception message.

Co-authored-by: Donghee Na <donghee.na92@gmail.com>
fsc-eriker pushed a commit to fsc-eriker/cpython that referenced this issue Feb 14, 2024
…agram_endpoint() (python#114893)

Also improve exception message.

Co-authored-by: Donghee Na <donghee.na92@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 new features, bugs and security fixes topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants