Day 46 — Oh no! A bug :(
15 October 2020 · recurse-center
Today I read through some open issues on camelot and found a bug that shows up when you install it from conda-forge. I'd assumed that installing ghostscript from conda-forge installs all of its dependencies. It does, but it looks like all those dependencies are statically linked into a single gs executable.
This would've been fine as long as camelot ran gs in a subprocess call, but the code was changed to use libgs some time ago. The bug should've been caught when that change was merged, but right now the only test in the conda-forge recipe checks whether camelot can be imported. That didn't catch the bug, because the error only happens when camelot.read_pdf() is called :(
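An import-only recipe test never exercises libgs, since the error only shows up inside read_pdf(). Below is a minimal sketch of a stronger check, assuming it could be run as an extra test command in the recipe. It looks for both the gs executable (what the old subprocess approach needed) and a loadable libgs (what ctypes needs). The names smoke_test_ghostscript and require_libgs are hypothetical; they are not part of camelot or its conda-forge recipe.

```python
import ctypes
import ctypes.util
import shutil
from typing import Dict, Optional, Union


def smoke_test_ghostscript() -> Dict[str, Union[Optional[str], bool]]:
    """Hypothetical check: can the gs binary and the libgs library be found?"""
    # A subprocess-based backend only needs a gs executable on PATH.
    gs_executable = shutil.which("gs")
    # A ctypes-based backend needs the shared library; find_library returns
    # a name such as 'libgs.so.9' on Linux, or None if nothing was found.
    libgs_name = ctypes.util.find_library("gs")
    libgs_loadable = False
    if libgs_name is not None:
        try:
            ctypes.CDLL(libgs_name)
            libgs_loadable = True
        except OSError:
            # Found by name, but the dynamic loader could not open it.
            libgs_loadable = False
    return {
        "gs_executable": gs_executable,
        "libgs_name": libgs_name,
        "libgs_loadable": libgs_loadable,
    }


def require_libgs() -> None:
    """Raise if libgs cannot be loaded, so a test command exits non-zero."""
    report = smoke_test_ghostscript()
    if not report["libgs_loadable"]:
        raise RuntimeError(f"libgs could not be loaded: {report}")
```

In the broken environment described in this post, gs_executable would be set (which gs finds it) while libgs_name is None, so require_libgs() would fail where the import-only test passed. Calling camelot.read_pdf() on a small sample PDF would be an even more direct test.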
The fix is to install libgs using the system package manager (apt / brew), or by downloading the Windows installer from the ghostscript website. Hopefully this workaround won't be needed once the default PDF-to-image conversion backend is switched to pdftopng. Until then, I need to add a note to the docs. Huge thanks to Jim Hall for reporting this and for pointing me in the right direction!
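The docs note boils down to one instruction per platform. A small sketch of how that could also be surfaced in code, for example in a friendlier error message. The dictionary and function names are hypothetical, and the package names are assumptions: libgs9 is the Ubuntu 20.04 package used in this post, and Homebrew's ghostscript formula is assumed to ship the shared library.

```python
import sys
from typing import Dict

# Hypothetical table of install hints keyed by sys.platform prefix.
# Assumptions: libgs9 is the Ubuntu 20.04 package (as in this post), and
# Homebrew's ghostscript formula includes the shared library.
LIBGS_INSTALL_HINTS: Dict[str, str] = {
    "linux": "sudo apt install libgs9",
    "darwin": "brew install ghostscript",
    "win32": "download and run the installer from the Ghostscript website",
}


def libgs_install_hint(platform: str = sys.platform) -> str:
    """Return a one-line hint for getting libgs on the given platform."""
    for prefix, hint in LIBGS_INSTALL_HINTS.items():
        if platform.startswith(prefix):
            return hint
    # Fallback for platforms not covered above.
    return "install Ghostscript, including its shared library, with your system package manager"
```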
These are the steps I initially used to reproduce the bug:
$ sudo apt remove --auto-remove ghostscript
That removed a ton of packages, including ubuntu-gnome-desktop (?!), which I'm supposed to be using! My system still works fine though; I need to figure this out later.
After that, I created a new conda environment and installed ghostscript from conda-forge and camelot from PyPI:
$ conda create --name gs-env python=3.8
$ conda activate gs-env
$ conda install -c conda-forge ghostscript
$ which gs
/home/vinayak/anaconda3/envs/gs-env/bin/gs
$ pip install camelot-py[cv]
Then I ran the test Jim described in the issue:
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>>
It worked fine and fed my confirmation bias, until Jim pointed out that I should check which libgs was being used, and whether removing ghostscript had removed libgs too!
>>> from ctypes.util import find_library
>>> find_library("gs")
'libgs.so.9'
>>>
Indeed, libgs was still present on my system!
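find_library only reports a name such as 'libgs.so.9', not which file the loader actually picks. A Linux-only sketch that answers "which libgs is being used" by loading the library and then looking it up in /proc/self/maps. The function name is hypothetical, and the basename check assumes the file name starts with 'libgs.', which also avoids matching libgssapi_krb5 (it shows up in the ldd output further down).

```python
import ctypes
import ctypes.util
import os
from typing import Optional


def loaded_libgs_path() -> Optional[str]:
    """Return the path of the libgs file the dynamic loader actually mapped.

    Linux-only sketch: /proc/self/maps does not exist on macOS or Windows.
    """
    name = ctypes.util.find_library("gs")  # e.g. 'libgs.so.9', or None
    if name is None:
        return None
    try:
        ctypes.CDLL(name)
    except OSError:
        return None
    try:
        with open("/proc/self/maps") as maps:
            for line in maps:
                # Columns: address perms offset dev inode [pathname]
                fields = line.split(maxsplit=5)
                if len(fields) < 6:
                    continue  # anonymous mapping, no backing file
                path = fields[5].strip()
                # Match 'libgs.so.9...' but not e.g. 'libgssapi_krb5.so.2'.
                if os.path.basename(path).startswith("libgs."):
                    return path
    except FileNotFoundError:
        return None
    return None
```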
$ whereis libgs.so.9
libgs.so: /usr/lib/x86_64-linux-gnu/libgs.so.9
$ apt search libgs
libgs9/focal-updates,focal-security,now 9.50~dfsg-5ubuntu4.2 amd64 [installed]
interpreter for the PostScript language and for PDF - Library
I didn't remove it, though, because that would have taken away evince and a lot of other useful packages.
I compared the shared library dependencies of the ubuntu and conda-forge ghostscript builds and found a stark contrast!
$ ldd /usr/bin/gs
linux-vdso.so.1 (0x00007ffc065d0000)
libgs.so.9 => /usr/lib/x86_64-linux-gnu/libgs.so.9 (0x00007efd6bada000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007efd6b8e8000)
libtiff.so.5 => /usr/lib/x86_64-linux-gnu/libtiff.so.5 (0x00007efd6b867000)
libcups.so.2 => /usr/lib/x86_64-linux-gnu/libcups.so.2 (0x00007efd6b7cc000)
libijs-0.35.so => /usr/lib/x86_64-linux-gnu/libijs-0.35.so (0x00007efd6b7c4000)
libpng16.so.16 => /usr/lib/x86_64-linux-gnu/libpng16.so.16 (0x00007efd6b78c000)
libjbig2dec.so.0 => /usr/lib/x86_64-linux-gnu/libjbig2dec.so.0 (0x00007efd6b76d000)
libjpeg.so.8 => /usr/lib/x86_64-linux-gnu/libjpeg.so.8 (0x00007efd6b6e8000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007efd6b6cc000)
liblcms2.so.2 => /usr/lib/x86_64-linux-gnu/liblcms2.so.2 (0x00007efd6b671000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007efd6b522000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007efd6b51c000)
libidn.so.11 => /lib/x86_64-linux-gnu/libidn.so.11 (0x00007efd6b4e5000)
libpaper.so.1 => /usr/lib/x86_64-linux-gnu/libpaper.so.1 (0x00007efd6b4df000)
libfontconfig.so.1 => /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 (0x00007efd6b498000)
libfreetype.so.6 => /usr/lib/x86_64-linux-gnu/libfreetype.so.6 (0x00007efd6b3d9000)
libopenjp2.so.7 => /usr/lib/x86_64-linux-gnu/libopenjp2.so.7 (0x00007efd6b383000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007efd6b360000)
/lib64/ld-linux-x86-64.so.2 (0x00007efd6ca7c000)
libwebp.so.6 => /usr/lib/x86_64-linux-gnu/libwebp.so.6 (0x00007efd6b0f5000)
libzstd.so.1 => /usr/lib/x86_64-linux-gnu/libzstd.so.1 (0x00007efd6b04c000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007efd6b023000)
libjbig.so.0 => /usr/lib/x86_64-linux-gnu/libjbig.so.0 (0x00007efd6ae15000)
libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007efd6adc8000)
libavahi-common.so.3 => /usr/lib/x86_64-linux-gnu/libavahi-common.so.3 (0x00007efd6adba000)
libavahi-client.so.3 => /usr/lib/x86_64-linux-gnu/libavahi-client.so.3 (0x00007efd6ada5000)
libgnutls.so.30 => /usr/lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007efd6abcf000)
libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007efd6aba1000)
libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007efd6ab98000)
libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007efd6aabb000)
libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007efd6aa88000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007efd6aa81000)
libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007efd6aa72000)
libdbus-1.so.3 => /lib/x86_64-linux-gnu/libdbus-1.so.3 (0x00007efd6aa21000)
libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007efd6a8eb000)
libidn2.so.0 => /usr/lib/x86_64-linux-gnu/libidn2.so.0 (0x00007efd6a8ca000)
libunistring.so.2 => /usr/lib/x86_64-linux-gnu/libunistring.so.2 (0x00007efd6a746000)
libtasn1.so.6 => /usr/lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007efd6a730000)
libnettle.so.7 => /usr/lib/x86_64-linux-gnu/libnettle.so.7 (0x00007efd6a6f6000)
libhogweed.so.5 => /usr/lib/x86_64-linux-gnu/libhogweed.so.5 (0x00007efd6a6be000)
libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007efd6a63a000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007efd6a633000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007efd6a615000)
libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007efd6a568000)
libffi.so.7 => /usr/lib/x86_64-linux-gnu/libffi.so.7 (0x00007efd6a55c000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007efd6a551000)
liblz4.so.1 => /usr/lib/x86_64-linux-gnu/liblz4.so.1 (0x00007efd6a530000)
libgcrypt.so.20 => /usr/lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007efd6a410000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007efd6a3ed000)
The ubuntu ghostscript has all these shared library dependencies, but the conda-forge ghostscript does not:
$ ldd /home/vinayak/anaconda3/envs/gs-3.8/bin/gs
linux-vdso.so.1 (0x00007ffc8dba3000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f33bd3e3000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f33bd3c0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f33bd1ce000)
/lib64/ld-linux-x86-64.so.2 (0x00007f33becb9000)
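It's possible that the conda-forge ghostscript is a single executable with everything except the core system libraries (libc, libm, libpthread and the dynamic loader) statically linked into it.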
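Comparing two long ldd listings by eye is tedious. A small sketch that turns ldd output into a list of library names so the two builds can be compared programmatically. parse_ldd and shared_deps are hypothetical helper names, and shared_deps assumes a Linux system with ldd on PATH.

```python
import subprocess
from typing import List


def parse_ldd(output: str) -> List[str]:
    """Return the library names listed in ldd output, in order."""
    libs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Lines look like 'libz.so.1 => /lib/.../libz.so.1 (0x...)' or
        # '/lib64/ld-linux-x86-64.so.2 (0x...)'; the first token is the name.
        libs.append(line.split()[0])
    return libs


def shared_deps(path: str) -> List[str]:
    """Run ldd on an executable (Linux only) and parse its dependencies.

    Raises subprocess.CalledProcessError if ldd exits with a non-zero status.
    """
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return parse_ldd(result.stdout)
```

Going by the two listings above, "libgs.so.9" in shared_deps("/usr/bin/gs") would be True for the ubuntu build, while the conda-forge gs would only list linux-vdso, libm, libpthread, libc and the dynamic loader.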
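There's also a stark difference in the sizes of the two executables, which fits that picture: the ubuntu gs is only a thin front end that links against libgs.so.9, while the conda-forge gs presumably has the whole interpreter built in: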
$ du -sh /usr/bin/gs
16K /usr/bin/gs
$ du -sh /home/vinayak/anaconda3/envs/gs-3.8/bin/gs
25M /home/vinayak/anaconda3/envs/gs-3.8/bin/gs
To reproduce the bug in a clean environment, I launched a docker container with the latest ubuntu image and installed all the requirements:
$ docker run -it ubuntu /bin/bash
$ apt update && apt install curl git
$ curl -O https://repo.anaconda.com/archive/Anaconda3-2019.03-Linux-x86_64.sh
$ bash Anaconda3-2019.03-Linux-x86_64.sh
$ eval "$(/root/anaconda3/bin/conda shell.bash hook)"
(base) $ conda create --name gs-env python=3.8
(base) $ conda activate gs-env
(gs-env) $ conda install -c conda-forge camelot-py
Installing camelot from conda-forge also installs ghostscript, but I couldn't find libgs anywhere!
(gs-env) $ python3
>>> from ctypes.util import find_library
>>> find_library("gs")
>>>
(gs-env) $ which gs
/root/anaconda3/envs/gs-env/bin/gs
(gs-env) $ whereis libgs
libgs:
After that I tried to run the code snippet Jim had posted:
(gs-env) $ git clone https://github.com/camelot-dev/camelot
(gs-env) $ cd camelot/tests/files
(gs-env) ./camelot/tests/files $ python3
>>> import camelot
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/__init__.py", line 6, in <module>
    from .io import read_pdf
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/io.py", line 5, in <module>
    from .handlers import PDFHandler
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/handlers.py", line 9, in <module>
    from .parsers import Stream, Lattice
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/parsers/__init__.py", line 4, in <module>
    from .lattice import Lattice
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/parsers/lattice.py", line 26, in <module>
    from ..image_processing import (
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/image_processing.py", line 3, in <module>
    import cv2
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
>>>
But then I ran into another bug! opencv depends on libGL.so.1, which isn't included in the base ubuntu image, so I had to install libgl1-mesa-glx to fix this opencv import error.
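Both missing pieces in this container were native libraries that the installed Python packages expected to find on the system. A minimal pre-flight sketch that uses ctypes.util.find_library to list which system packages still need installing. The NATIVE_DEPS table and missing_native_deps are hypothetical, the package names are the Ubuntu ones used in this post, and find_library is only an approximation of what the dynamic loader will do at import time.

```python
import ctypes.util
from typing import Dict, List

# Hypothetical table: library name as find_library expects it (no 'lib'
# prefix, no '.so' suffix) -> Ubuntu package that provided it in this post.
NATIVE_DEPS: Dict[str, str] = {
    "GL": "libgl1-mesa-glx",  # libGL.so.1, needed to import cv2 (opencv)
    "gs": "libgs9",           # libgs, needed by camelot's ghostscript backend
}


def missing_native_deps(deps: Dict[str, str] = NATIVE_DEPS) -> List[str]:
    """Return the packages whose libraries find_library cannot locate."""
    return [pkg for lib, pkg in deps.items() if ctypes.util.find_library(lib) is None]
```

In the fresh container above, both entries would be reported before any Python import fails.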
(gs-env) ./camelot/tests/files $ python3
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
Traceback (most recent call last):
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/ext/ghostscript/_gsprint.py", line 260, in <module>
    libgs = cdll.LoadLibrary("libgs.so")
  File "/root/anaconda3/envs/gs-env/lib/python3.8/ctypes/__init__.py", line 451, in LoadLibrary
    return self._dlltype(name)
  File "/root/anaconda3/envs/gs-env/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libgs.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/io.py", line 113, in read_pdf
    tables = p.parse(
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/handlers.py", line 171, in parse
    t = parser.extract_tables(
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/parsers/lattice.py", line 402, in extract_tables
    self._generate_image()
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/parsers/lattice.py", line 211, in _generate_image
    from ..ext.ghostscript import Ghostscript
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/ext/ghostscript/__init__.py", line 24, in <module>
    from . import _gsprint as gs
  File "/root/anaconda3/envs/gs-env/lib/python3.8/site-packages/camelot/ext/ghostscript/_gsprint.py", line 267, in <module>
    raise RuntimeError("Please make sure that Ghostscript is installed")
RuntimeError: Please make sure that Ghostscript is installed
>>>
Finally, the bug I was looking for! Installing libgs9 fixed it, but this is not ideal. I need to build a Windows wheel for pdftopng so that I can finally replace ghostscript as the default PDF-to-image conversion backend in camelot. Is there a way to launch "Windows containers" to debug things?
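Until pdftopng can become the default, one option would be to pick whichever backend is actually available and fail with a message that names libgs explicitly. (The RuntimeError above only says "Please make sure that Ghostscript is installed", even though gs itself was installed.) This is a hypothetical sketch, not camelot's actual code. The backend names, the BACKEND_CHECKS registry and choose_backend are assumptions, and so is the module name pdftopng used in the availability check.

```python
import ctypes.util
import importlib.util
from typing import Callable, Dict, Optional, Sequence


def has_libgs() -> bool:
    # ctypes-based ghostscript backend: needs the libgs shared library.
    return ctypes.util.find_library("gs") is not None


def has_pdftopng() -> bool:
    # Assumption: the pdftopng package is importable as 'pdftopng'.
    return importlib.util.find_spec("pdftopng") is not None


# Hypothetical registry: backend name -> availability check.
BACKEND_CHECKS: Dict[str, Callable[[], bool]] = {
    "pdftopng": has_pdftopng,
    "ghostscript": has_libgs,
}


def choose_backend(
    preferred: Sequence[str] = ("pdftopng", "ghostscript"),
    checks: Optional[Dict[str, Callable[[], bool]]] = None,
) -> str:
    """Return the first preferred backend whose availability check passes."""
    checks = BACKEND_CHECKS if checks is None else checks
    for name in preferred:
        if checks[name]():
            return name
    # Name the missing library explicitly instead of a generic message.
    raise RuntimeError(
        "No PDF-to-image backend available: install pdftopng, or install "
        "Ghostscript together with its shared library (libgs)"
    )
```

Passing checks explicitly keeps the selection logic testable without either backend installed.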