Description
Bug report
Bug description:
Searching for this error, the only reference I found was a previous conversation in which the issue was considered theoretical and not reproducible.
I made a small script which can reproduce the error.
This was frustrating to track down: the error isn't caught by the debugger, and the traceback points at seemingly random locations, so it is hard to tell what triggers it. I slowly pruned my code down to this minimal case.
The main code iterates through a PDF file using an iterator that returns the next page of the PDF (rendered via poppler) as an image at a given DPI. For each page, a task is submitted to a ProcessPoolExecutor; the task simply sleeps for a second and returns, and a done callback is attached to the resulting Future.
Both the poppler-backed iterator AND the futures are required to trigger the error.
I have solved the problem in my code by removing the iterator, but spent the time to create this reproduction in case it helps someone track down this bug.
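The exact workaround is not shown in this report. As an illustration only, one way to drop the iterator class is a plain generator over page numbers. In this sketch `iter_pages` and `render_page` are hypothetical names, and the renderer is passed in so the snippet runs without poppler installed.

```python
from typing import Any, Callable, Iterator


def iter_pages(page_count: int, render_page: Callable[[int], Any]) -> Iterator[Any]:
    """Yield the rendered result for pages 1..page_count, then stop.

    `render_page` is a stand-in (an assumption of this sketch) for a call such as
        convert_from_path(pdf, dpi=200, first_page=n, last_page=n,
                          poppler_path=poppler_path)
    """
    for page_number in range(1, page_count + 1):
        yield render_page(page_number)


if __name__ == "__main__":
    # Dummy renderer so the sketch runs without pdf2image or poppler.
    for index, image in enumerate(iter_pages(3, lambda n: f"image-{n}")):
        print(index, image)
```

Unlike a hand-written `__next__`, a generator stops by itself once the page range is exhausted, so the loop that submits the futures always terminates.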
Prerequisite: https://pdf2image.readthedocs.io/en/latest/installation.html
import functools
import threading
import time
from pdf2image import convert_from_path, pdfinfo_from_path
from concurrent.futures import ProcessPoolExecutor


class Pdf2ImageIterator:
    def __init__(self, pdf_path: str, poppler_path: str, dpi_high: int):
        self._path = pdf_path
        self._dpi_high = dpi_high
        self._poppler_path = poppler_path
        self.count = pdfinfo_from_path(self._path, None, None, poppler_path=self._poppler_path)["Pages"]

    def __iter__(self):
        self._page = 1
        return self

    def __next__(self):
        try:
            page_h = convert_from_path(self._path, poppler_path=self._poppler_path, dpi=self._dpi_high,
                                       first_page=self._page, last_page=self._page)
            self._page += 1
            return page_h
        except Exception as e:
            print(f"Pdf2Images {e}")


def my_done_callback(i, image, future):
    print(f"my_done_callback on {threading.get_ident()}")


def my_future(i, image):
    print(f"my_future on {threading.get_ident()}")
    time.sleep(1)
    return 0


if __name__ == '__main__':
    ppe = ProcessPoolExecutor(None)
    poppler_path = r"C:\U<YOUR-PATH>\poppler-23.01.0\Library\bin"
    pdf = r"<PDF file with multiple pages>"
    pages = Pdf2ImageIterator(pdf, poppler_path, 200)
    for i, im in enumerate(pages):
        future = ppe.submit(my_future, i, im)
        future.add_done_callback(functools.partial(my_done_callback, i, im))
    print("Done")

Example console logs:
Exception ignored in tp_clear of: <class 'memoryview'>
Traceback (most recent call last):
File "...\plugins\python-ce\helpers\pydev\pydevd_tracing.py", line 56, in _internal_set_trace
filename = frame.f_back.f_code.co_filename.lower()
BufferError: memoryview has 1 exported buffer
Exception ignored in tp_clear of: <class 'memoryview'>
Traceback (most recent call last):
File "...Programs\Python\Python310\lib\threading.py", line 568, in set
with self._cond:
BufferError: memoryview has 1 exported buffer
Exception ignored in tp_clear of: <class 'memoryview'>
Traceback (most recent call last):
File "...\plugins\python-ce\helpers\pydev\pydevd_tracing.py", line 56, in _internal_set_trace
filename = frame.f_back.f_code.co_filename.lower()
BufferError: memoryview has 1 exported buffer

Python 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)] on win32
CPython versions tested on:
3.10
Operating systems tested on:
Windows
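For context on the log message, the same `BufferError` can be raised deliberately in pure Python by releasing a memoryview while another object still holds a buffer obtained from it. This is not the code path of the reproduction above, only a sketch of the condition the message describes. The helper name `release_with_live_export` is made up for illustration, and `pickle.PickleBuffer` (Python 3.8+) is used here only as a convenient object that keeps a buffer export alive.

```python
import pickle


def release_with_live_export() -> str:
    """Try to release a memoryview that still has a consumer attached.

    Returns the BufferError message, or an empty string if no error occurred.
    """
    view = memoryview(bytearray(b"page data"))
    consumer = pickle.PickleBuffer(view)  # keeps a buffer acquired from `view`
    try:
        view.release()  # expected to fail while `consumer` is alive
    except BufferError as exc:
        message = str(exc)
    else:
        message = ""
    finally:
        consumer.release()  # drop the export first ...
        view.release()      # ... after which releasing the view is allowed
    return message


if __name__ == "__main__":
    print(release_with_live_export())
```

In the logs above the error is reported from `tp_clear`, that is, while the garbage collector is clearing a memoryview, which would explain why the tracebacks point at unrelated lines that merely happened to be executing at that moment.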