respect cgroups limits when trying to allocate memory #86577

Open
caarlos0 mannequin opened this issue Nov 19, 2020 · 12 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@caarlos0
Mannequin

caarlos0 mannequin commented Nov 19, 2020

BPO 42411
Nosy @tiran, @asvetlov

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-11-19.17:18:10.358>
labels = ['interpreter-core', '3.8', '3.9', '3.10']
title = 'respect cgroups limits when trying to allocate memory'
updated_at = <Date 2021-11-09.18:08:57.680>
user = 'https://bugs.python.org/caarlos0'

bugs.python.org fields:

activity = <Date 2021-11-09.18:08:57.680>
actor = 'caleb2'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation = <Date 2020-11-19.17:18:10.358>
creator = 'caarlos0'
dependencies = []
files = []
hgrepos = []
issue_num = 42411
keywords = []
message_count = 11.0
messages = ['381442', '381494', '381495', '381497', '381498', '381499', '381500', '381502', '381504', '382405', '406037']
nosy_count = 4.0
nosy_names = ['christian.heimes', 'asvetlov', 'caarlos0', 'caleb2']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue42411'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

@caarlos0
Mannequin Author

caarlos0 mannequin commented Nov 19, 2020

A common use case is running Python inside containers, for instance to train models.

The Python process sees the host's memory and CPU and ignores its own limits, which often leads to OOM kills. For instance:

docker run -m 1G --cpus 1 python:rc-alpine python -c 'x = bytearray(80 * 1024 * 1024 * 1000)'

Linux kills the process once it has used 1 GB of RAM.

Ideally, we should have an option to make Python try to allocate only the RAM it is limited to, maybe something similar to Java's -XX:+UseContainerSupport.

@caarlos0 caarlos0 mannequin added topic-IO 3.7 (EOL) end of life 3.10 only security fixes 3.8 (EOL) end of life 3.9 only security fixes labels Nov 19, 2020
@asvetlov
Contributor

Could you explain the proposal?

How does "-XX:+UseContainerSupport" behave in Java? Sorry, I have not used Java for ages and don't follow modern Java best practices.

From my understanding, without Docker, allocating bytearray(80 * 1024 * 1024 * 1000) raises MemoryError if that much memory is not available and malloc()/calloc() returns NULL.

The exception is typically not handled at all; it propagates up to "kill the process" behavior.

The reason is that in Python, when you try to handle an out-of-memory condition, the handler has a very high chance of allocating a Python object under the hood and raising MemoryError at any line of the exception handler.
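
To make the distinction concrete, here is a minimal sketch of the one case where catching MemoryError tends to be practical: a single oversized request that is refused up front, so the handler still has memory to work with. The helper name `allocate_or_none` is hypothetical, and the request size assumes a 64-bit build where such a request cannot be satisfied.

```python
import sys


def allocate_or_none(nbytes):
    """Try one large allocation; return None instead of raising if it fails."""
    try:
        return bytearray(nbytes)
    except MemoryError:
        # The refused request was never granted, so this handler is not running
        # in a memory-starved process. Keep it small anyway: when memory runs
        # out through many small allocations, any line here could fail too.
        return None


# Assumption: 64-bit build, where this request exceeds the usable address space.
buf = allocate_or_none(sys.maxsize // 2)
print("allocated" if buf is not None else "allocation refused, falling back")
```

This does not help when the kernel's OOM killer acts first, because the process then receives SIGKILL and no Python exception is ever raised.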

@caarlos0
Mannequin Author

caarlos0 mannequin commented Nov 20, 2020

The problem is that, instead of getting a MemoryError, Python tries to "go out of bounds" and allocate more memory than the cgroup allows, causing Linux to kill the process.

A workaround is to set RLIMIT_AS to the contents of /sys/fs/cgroup/memory/memory.limit_in_bytes, which is more or less what Java does when that flag is enabled (there is more to it: cgroups v2 uses a different path, I think).

Setting RLIMIT_AS, we get the MemoryError as expected, instead of a SIGKILL.
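
The workaround could be packaged roughly as follows. This is a sketch, not an existing Python feature: the helper names `read_cgroup_memory_limit` and `cap_address_space` are hypothetical, the cgroups v2 path `/sys/fs/cgroup/memory.max` is an assumption about the usual mount point, and treating very large cgroups v1 values as "unlimited" is a heuristic.

```python
import resource
from pathlib import Path

# Usual mount points; the v2 path is an assumption, the v1 path is from the thread.
CGROUP_V2_LIMIT = "/sys/fs/cgroup/memory.max"
CGROUP_V1_LIMIT = "/sys/fs/cgroup/memory/memory.limit_in_bytes"


def read_cgroup_memory_limit(candidates=(CGROUP_V2_LIMIT, CGROUP_V1_LIMIT)):
    """Return the cgroup memory limit in bytes, or None if no usable limit is found."""
    for path in candidates:
        try:
            raw = Path(path).read_text().strip()
        except OSError:
            continue  # file missing or unreadable: try the next candidate
        if raw == "max":  # cgroups v2 spelling of "no limit"
            return None
        try:
            limit = int(raw)
        except ValueError:
            continue
        if limit >= 2**60:  # heuristic: cgroups v1 reports "no limit" as a huge number
            return None
        return limit
    return None


def cap_address_space(limit):
    """Lower the soft RLIMIT_AS so oversized allocations fail with MemoryError."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
```

A program would call `read_cgroup_memory_limit()` once at startup and pass a non-None result to `cap_address_space()`. Note that RLIMIT_AS limits virtual address space rather than resident memory, so the two numbers are not equivalent and some headroom may be needed.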

My proposal is to either make it the default or hide it behind some sort of flag/environment variable, so users don't need to do that everywhere...

PS: In Java, that flag also makes its OS API return the container limits when asked how much memory is available, instead of the host's memory (the default behavior).

PS: I'm not an avid Python user, just an ops guy, so I mostly write YAML these days... please let me know if what I said doesn't make sense.

Thanks!

@tiran
Member

tiran commented Nov 20, 2020

I cannot reproduce the issue, either with podman and cgroups v2 or with docker and cgroups v1. In both cases I'm getting a MemoryError as expected:

# podman run -m 1G --cpus 1 python:rc-alpine python -c 'x = bytearray(80 * 1024 * 1024 * 1000)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
MemoryError

# docker run -m 1GB fedora:33 python3 -c 'x = bytearray(80 * 1024 * 1024 * 1000)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
MemoryError

@caarlos0
Mannequin Author

caarlos0 mannequin commented Nov 20, 2020

Maybe you're trying to allocate more memory than the host has available? I found out that it gives MemoryError in those cases too (kind of easy to reproduce on Docker for Mac)...

@caarlos0
Mannequin Author

caarlos0 mannequin commented Nov 20, 2020

FWIW, here, both cases:

❯ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS                            PORTS               NAMES
30fc350a8dbd        python:rc-alpine    "python -c 'x = byte…"   24 seconds ago       Exited (137) 11 seconds ago                           great_murdock
5ba46a022910        fedora:33           "python3 -c 'x = byt…"   57 seconds ago       Exited (137) 43 seconds ago                           boring_edison

@tiran
Member

tiran commented Nov 20, 2020

I doubt it. My test hosts have between 16G and 64G of RAM + plenty of swap.

What's your platform, distribution, kernel version, Docker version, and libseccomp version?

@caarlos0
Mannequin Author

caarlos0 mannequin commented Nov 20, 2020

Just did more tests here:

**on my machine**:

$ docker run --name test -m 1GB fedora:33 python3 -c 'import resource; m = int(open("/sys/fs/cgroup/memory/memory.limit_in_bytes").read()); resource.setrlimit(resource.RLIMIT_AS, (m, m)); print(resource.getrlimit(resource.RLIMIT_AS)); x = bytearray(4 * 1024 * 1024 * 1000)'; docker inspect test | grep OOMKilled; docker rm test
Traceback (most recent call last):
  File "<string>", line 1, in <module>
MemoryError
(1073741824, 1073741824)
            "OOMKilled": false,
test
$ docker run --name test -m 1GB fedora:33 python3 -c 'x = bytearray(4 * 1024 * 1024 * 1000)'; docker inspect test | grep OOMKilled; docker rm test
            "OOMKilled": true,
test

**on a k8s cluster**:

$ kubectl run -i -t debug --rm --image=fedora:33 --restart=Never --limits='memory=1Gi'
If you don't see a command prompt, try pressing enter.
[root@debug /]# python3
Python 3.9.0 (default, Oct  6 2020, 00:00:00)
[GCC 10.2.1 20200826 (Red Hat 10.2.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = bytearray(4 * 1024 * 1024 * 1000)
Killed
[root@debug /]# python3
Python 3.9.0 (default, Oct  6 2020, 00:00:00)
[GCC 10.2.1 20200826 (Red Hat 10.2.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import resource
>>> m = int(open("/sys/fs/cgroup/memory/memory.limit_in_bytes").read())
>>> resource.setrlimit(resource.RLIMIT_AS, (m, m))
>>> print(resource.getrlimit(resource.RLIMIT_AS))
(1073741824, 1073741824)
>>> x = bytearray(4 * 1024 * 1024 * 1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
>>>

@tiran
Member

tiran commented Nov 20, 2020

Even if we decided to add a memory limit based on cgroups, there is no way to implement the limit correctly in Python. We rely on the platform's malloc() implementation to handle memory allocation for us.

Python has an abstraction layer for memory allocators, but the allocator only tracks Python objects and does not keep information about the size of slabs. Memory tracking would increase memory usage and decrease performance. It would also not track other memory, such as that used by 3rd party libraries, extension modules, thread stacks, and other processes in the same cgroups hierarchy.

I'm pretty sure that the RLIMIT_AS approach will not work if you run multiple processes in the same container (e.g. spawn subprocesses).

I'll talk to our glibc and container experts at work next week. Perhaps they are aware of a better way to handle cgroups memory limits more gracefully.

@tiran tiran added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed topic-IO 3.7 (EOL) end of life labels Nov 20, 2020
@caarlos0
Mannequin Author

caarlos0 mannequin commented Dec 3, 2020

Any updates?

@caleb2
Mannequin

caleb2 mannequin commented Nov 9, 2021

@christian.heimes following up on this - we have been having frequent memory issues with Python 3.7 in Kubernetes. It could just be the code, but if it does turn out this is a bug then fixing it could be very beneficial.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@tlandschoff-scale

@tiran Did you hear anything back about this? We are facing the same issue: instead of getting a MemoryError in Kubernetes (and dealing with it), the pod crashes. After respawning, it loads its data again, gets the same request, and crashes again.

We are using RLIMIT_AS as a workaround, but this requires knowing the memory usage of all processes in the cgroup. Any hint greatly appreciated.
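
One way to approximate that knowledge is to read the cgroup's own accounting files, which cover every process in the group. The sketch below is an assumption-laden illustration: `cgroup_memory_headroom` is a hypothetical helper, and the file layout assumes the standard cgroups v2 names (`memory.max`, `memory.current`) or cgroups v1 names (`memory.limit_in_bytes`, `memory.usage_in_bytes`) under `/sys/fs/cgroup`.

```python
from pathlib import Path


def cgroup_memory_headroom(cgroup_dir="/sys/fs/cgroup"):
    """Return (limit, usage, headroom) in bytes, or None if no limit can be read."""
    base = Path(cgroup_dir)
    # (limit file, usage file) pairs: cgroups v2 layout first, then cgroups v1.
    layouts = [
        (base / "memory.max", base / "memory.current"),
        (base / "memory" / "memory.limit_in_bytes",
         base / "memory" / "memory.usage_in_bytes"),
    ]
    for limit_file, usage_file in layouts:
        try:
            raw_limit = limit_file.read_text().strip()
            usage = int(usage_file.read_text().strip())
        except (OSError, ValueError):
            continue  # this layout is not present or not readable
        if raw_limit == "max":  # cgroups v2: no limit configured
            return None
        try:
            limit = int(raw_limit)
        except ValueError:
            continue
        return limit, usage, max(limit - usage, 0)
    return None
```

The usage figure includes page cache charged to the cgroup, so the headroom tends to be pessimistic, and it is only a snapshot: other processes in the group can allocate between the read and any limit derived from it.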
