Day 51 — Bundling DLLs with Windows wheels (the package_data way)
21 October 2020 · recurse-center
Last week, I was able to build pdftopng on Windows, but the extension worked only when all the DLLs it depended on were kept in the same directory as the PYD (which is basically a DLL).
At the end of that post, I had some questions about bundling DLLs with Windows wheels. So today I read some discuss.python.org and StackOverflow posts, and looked inside some Windows wheels published by Christoph Gohlke, to find the answers to those questions!
DLL Search Order on Windows
The system does not search for a DLL:
- If a DLL with the same name is already loaded in memory.
- If the DLL is on the list of known DLLs for the current Windows version, in which case the system loads its copy of that DLL.
If none of those conditions are met, then it continues to search for the DLL in the following order:
- The directory from which the application was loaded. (Does this mean that if you launch a game from an icon on your Desktop, then C:\Users\Vinayak\Desktop is the directory from which your application was loaded?)
- The system directory. (C:\Windows\System32)
- The 16-bit system directory. (??)
- The Windows directory. (C:\Windows)
- The current directory. (So this is why my compiled extension worked!)
- The directories that are listed in the PATH environment variable. (I had also put the DLL directory on the PATH though, so why didn't the loader find those DLLs?)
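To make the ordering concrete, here is a rough model of that list in Python. It is only a sketch: the function names are made up for illustration, the 16-bit system directory is left out, and the "already loaded" and "known DLLs" shortcuts are not modelled. The real lookup is done by the Windows loader, not by code like this.

```python
import os


def dll_search_dirs(app_dir, system_dir, windows_dir, current_dir, path_value):
    """Return candidate directories in the order of the list above.

    Hypothetical helper: the 16-bit system directory is skipped and the
    "already loaded" / "known DLLs" checks are not modelled.
    """
    dirs = [app_dir, system_dir, windows_dir, current_dir]
    # PATH entries are searched last; empty entries are ignored.
    dirs.extend(entry for entry in path_value.split(os.pathsep) if entry)
    return dirs


def find_dll(name, search_dirs):
    """Return the first existing path for `name`, or None if nothing matches."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

The first directory that contains the file wins, so a copy in the current directory shadows one that is only reachable through PATH.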
How do I bundle DLLs with Windows wheels?
I found that this is not a new question. There have been multiple posts on discuss.python.org asking about an auditwheel / delocate alternative for Windows.
Steve Dower mentioned that adding a DLL as package_data, so that it sits alongside the extension module that requires it, is sufficient. In Python 3.8, he also added os.add_dll_directory() for cases where you prefer keeping the DLLs in a separate folder.
Nathaniel J. Smith made a lot of valid points about renaming the DLLs and modifying their import table so as to avoid DLL Hell, but I decided to ignore those and take a first stab at bundling DLLs the package_data way that Steve suggested.
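For the separate-folder option, a minimal sketch of how a package could register its DLL folder might look like this. The helper name and the "dlls" subfolder are assumptions for illustration. os.add_dll_directory() only exists on Windows with Python 3.8 or newer, so the sketch guards for that.

```python
import os


def add_bundled_dll_dir(package_file, subdir="dlls"):
    """Register <package dir>/<subdir> as a DLL search directory.

    Hypothetical helper. Returns the handle from os.add_dll_directory(), or
    None when that function is unavailable (non-Windows, or Python older
    than 3.8) or the directory does not exist.
    """
    dll_dir = os.path.join(os.path.dirname(os.path.abspath(package_file)), subdir)
    if not hasattr(os, "add_dll_directory") or not os.path.isdir(dll_dir):
        return None
    # add_dll_directory() needs an absolute path, which dll_dir already is.
    return os.add_dll_directory(dll_dir)
```

A package would call add_bundled_dll_dir(__file__) at the top of its __init__.py, before importing the extension module. The Python 3.8 documentation notes that PATH and the current working directory are no longer used to resolve an extension module's DLL dependencies, which is the gap this function is meant to fill.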
How do others bundle DLLs with Windows wheels?
I also found this nice website with a lot of Windows wheels published by Christoph Gohlke, and decided to look inside Windows wheels for some familiar projects.
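A wheel is a zip archive, so the same inspection can be done from Python with the standard zipfile module. This is a small sketch with a made-up helper name, roughly equivalent to running unzip -l and filtering for DLL and PYD files.

```python
import zipfile


def list_dlls(wheel_path):
    """Return (name, size) pairs for the DLL and PYD files inside a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [
            (info.filename, info.file_size)
            for info in wheel.infolist()
            if info.filename.lower().endswith((".dll", ".pyd"))
        ]
```

Calling list_dlls() on a downloaded wheel returns the entries in the order they are stored in the archive.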
numpy
$ unzip -l numpy-1.19.2-cp38-cp38-win_amd64.whl
Archive: numpy-1.19.2-cp38-cp38-win_amd64.whl
Length Date Time Name
--------- ---------- ----- ----
11555 2020-09-10 01:28 numpy/__init__.py
...
32939993 2020-09-10 01:30 numpy/.libs/libopenblas.NOIJJG62EMASZI6NYURL6JBKM4EVBGM7.gfortran-win_amd64.dll
...
The numpy Windows wheel contains one DLL file for OpenBLAS! I couldn't find how the DLL is generated in the numpy codebase, and I also need to find how it gets into the Windows wheel.
Update: Looks like the DLL is copied into numpy before the wheel is built so that everything that requires the DLL can link against it. It also looks like it happens in the Azure Steps for Windows where setup_openblas() is called to (1) download the DLL from the openblas-libs package on Anaconda, and (2) place it in a directory where it can be found.
numpy has a from . import _distributor_init import statement in its __init__.py to "allow distributors to run custom init code". (Are these package distributors for various operating systems, and do they create their own _distributor_init.py to load things differently?)
Update: Looks like a _distributor_init.py (to load the OpenBLAS DLL) is placed alongside the __init__.py when a Windows wheel is built. This _distributor_init.py wasn't in the numpy-wheels repo before, but was moved there after Christoph Gohlke raised a concern about it being distribution specific. The code is also duplicated in the numpy repo for some reason. Though in order to answer the question from above, I need to look for other examples of _distributor_init.py files for numpy.
The code inside _distributor_init.py finds the OpenBLAS DLL relative to the __init__.py and loads it using ctypes.WinDLL:
import glob
import os

if os.name == 'nt':
    # convention for storing / loading the DLL from
    # numpy/.libs/, if present
    try:
        from ctypes import WinDLL
        basedir = os.path.dirname(__file__)
    except:
        pass
    else:
        libs_dir = os.path.abspath(os.path.join(basedir, '.libs'))
        DLL_filenames = []
        if os.path.isdir(libs_dir):
            for filename in glob.glob(os.path.join(libs_dir,
                                                   '*openblas*dll')):
                # NOTE: would it change behavior to load ALL
                # DLLs at this path vs. the name restriction?
                WinDLL(os.path.abspath(filename))
                DLL_filenames.append(filename)
        if len(DLL_filenames) > 1:
            import warnings
            warnings.warn("loaded more than 1 DLL from .libs:\n%s" %
                          "\n".join(DLL_filenames),
                          stacklevel=1)
pyarrow
$ unzip -l pyarrow-1.0.1-cp38-cp38-win_amd64.whl | grep dll
8459264 2020-08-17 19:35 pyarrow/arrow.dll
910336 2020-08-17 19:35 pyarrow/arrow_dataset.dll
2610176 2020-08-17 19:35 pyarrow/arrow_flight.dll
1264640 2020-08-17 19:35 pyarrow/arrow_python.dll
91648 2020-08-17 19:35 pyarrow/arrow_python_flight.dll
81920 2020-08-17 19:35 pyarrow/cares.dll
3249664 2020-08-17 19:35 pyarrow/libcrypto-1_1-x64.dll
2661888 2020-08-17 19:35 pyarrow/libprotobuf.dll
651264 2020-08-17 19:35 pyarrow/libssl-1_1-x64.dll
2204672 2020-08-17 19:35 pyarrow/parquet.dll
89600 2020-08-17 19:35 pyarrow/zlib.dll
The pyarrow Windows wheel contains a lot of DLLs! But they aren't loaded using the ctypes module. They are instead kept in the same directory as the compiled PYD files, which have the DLL names hardcoded in their import tables:
$ objdump -x lib.cp38-win_amd64.pyd | grep dll
DLL Name: python38.dll
DLL Name: arrow.dll
DLL Name: arrow_python.dll
DLL Name: MSVCP140.dll
DLL Name: VCRUNTIME140.dll
DLL Name: api-ms-win-crt-runtime-l1-1-0.dll
DLL Name: api-ms-win-crt-heap-l1-1-0.dll
DLL Name: KERNEL32.dll
pygame
$ unzip -l pygame-1.9.6-cp39-cp39-win_amd64.whl
Archive: pygame-1.9.6-cp39-cp39-win_amd64.whl
Length Date Time Name
--------- ---------- ----- ----
300544 2019-08-14 20:26 pygame/SDL.dll
...
12193 2019-06-08 01:52 pygame/__init__.py
...
679424 2020-05-24 17:50 pygame/_freetype.cp39-win_amd64.pyd
...
84992 2019-02-14 06:20 pygame/zlib.dll
...
The pygame Windows wheel contains a lot of DLLs and PYDs! And it doesn't use the ctypes module in its __init__.py to load DLLs. Instead it puts the DLL directory on the search path using the os.add_dll_directory function. In addition to that, it also modifies the PATH variable for the process in which the application is loaded:
if os.name == 'nt':
    pygame_dir = None
    try:
        # add pygame folder to Windows DLL search paths
        pygame_dir = os.path.abspath(os.path.dirname(__file__))
        try:
            os.add_dll_directory(pygame_dir)
        except Exception:
            pass
        os.environ['PATH'] = pygame_dir + ';' + os.environ['PATH']
    except Exception:
        pass
    del pygame_dir
shapely
$ unzip -l Shapely-1.7.1-cp39-cp39-win_amd64.whl
Archive: Shapely-1.7.1-cp39-cp39-win_amd64.whl
Length Date Time Name
--------- ---------- ----- ----
22 2020-08-20 20:01 shapely/__init__.py
30675 2020-08-22 17:44 shapely/geos.py
815616 2020-08-22 17:50 shapely/DLLs/geos_c.dll
...
The shapely Windows wheel contains one DLL. And it's loaded inside the geos.py file using the ctypes.CDLL function:
elif sys.platform == 'win32':
    try:
        _lgeos = CDLL(os.path.abspath(os.path.join(
            os.path.dirname(__file__), "DLLs", "geos_c.dll")))
    except Exception:
        _lgeos = CDLL("geos_c.dll")

    def free(m):
        return
This block is part of a large if statement with custom logic to load shared libraries for different operating systems!
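The "try the bundled copy first, then fall back to the bare name" pattern can be sketched as a small standalone helper. The function name is hypothetical. Unlike the shapely snippet, which lets the second failure raise, this version returns None when both attempts fail.

```python
import ctypes
import os


def load_bundled_library(package_file, filename, subdir="DLLs"):
    """Load a shared library bundled next to a package, else by bare name.

    Hypothetical helper. Returns the loaded ctypes.CDLL object, or None if
    neither the bundled path nor the bare filename can be loaded.
    """
    bundled = os.path.abspath(
        os.path.join(os.path.dirname(package_file), subdir, filename)
    )
    for candidate in (bundled, filename):
        try:
            return ctypes.CDLL(candidate)
        except OSError:
            # Missing or unloadable: try the next candidate.
            continue
    return None
```

With the bare filename, ctypes leaves the lookup to the operating system's loader, so that fallback only succeeds if the library is already somewhere the loader searches.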
pycairo
$ unzip -l pycairo-1.20.0-cp39-cp39-win_amd64.whl
Archive: pycairo-1.20.0-cp39-cp39-win_amd64.whl
Length Date Time Name
--------- ---------- ----- ----
660 2019-08-24 21:37 cairo/__init__.py
33334 2020-01-19 09:57 cairo/__init__.pyi
179712 2020-10-05 19:35 cairo/_cairo.cp39-win_amd64.pyd
2199552 2020-10-05 19:35 cairo/cairo.dll
...
The pycairo Windows wheel has just one DLL, and it isn't loaded using the ctypes module. Similar to pyarrow, it is kept in the same directory as the compiled PYD file, which has the DLL name hardcoded in its import table:
$ objdump -x cairo/_cairo.cp39-win_amd64.pyd | grep dll
DLL Name: cairo.dll
DLL Name: python39.dll
DLL Name: KERNEL32.dll
DLL Name: VCRUNTIME140.dll
DLL Name: api-ms-win-crt-runtime-l1-1-0.dll
I looked at a lot of other wheels (and their setup.py files) too, but I won't mention all of them, to keep the post short. At this point, a pattern started to emerge. They used one of two ways to bundle DLLs:
- Using package_data inside their setup.py
- Specifying the DLL directory in a MANIFEST.in
Some of them (like pyarrow) weren't doing either of the above. I suspect they have custom code somewhere (which I need to find) to unpack the built wheel, copy over the DLLs, and then zip all the files again.
I went with the package_data way that Steve suggested because a lot of projects seemed to follow that.
How to bundle DLLs with Windows wheels (the package_data way)
I found that vcpkg (which I used on my local machine to install external dependencies for building poppler) is already installed on the windows-latest virtual environment on GitHub Actions!
I did a GitHub search to see if people are using vcpkg with cibuildwheel and found 3 results! While looking at the cibuildwheel config for this project, I learned about the VCPKG_INSTALLATION_ROOT environment variable, which meant that I could:
- Use it to specify the path to vcpkg.cmake so that cmake can find all the C/C++ projects (that I install in that isolated environment)
- Also use it to copy over the DLLs for all the C/C++ projects (that I install in that isolated environment) into my project directory!
I created a build_win.bat to go along with the build scripts for Linux and macOS, in which I install all the external requirements using vcpkg, and then build poppler:
@echo off
vcpkg install freetype:x64-windows fontconfig:x64-windows libpng:x64-windows libjpeg-turbo:x64-windows
Rem set PATH=%PATH%;.\vcpkg\installed\x64-windows\bin
cd lib\poppler
mkdir build && cd build
cmake -DCMAKE_TOOLCHAIN_FILE=%VCPKG_INSTALLATION_ROOT%/scripts/buildsystems/vcpkg.cmake -DENABLE_QT5=OFF -DENABLE_LIBOPENJPEG=none -DENABLE_CPP=OFF ..
cmake --build . --config Release --target poppler
And then I added a copy_dlls function to my setup.py to copy over all the DLLs from the VCPKG_INSTALLATION_ROOT to src/pdftopng where the PYD is generated:
import glob
import os
import shutil

def copy_dlls():
    vcpkg_bin_dir = os.path.join(os.environ["VCPKG_INSTALLATION_ROOT"], "installed", "x64-windows", "bin")
    for file in glob.glob(os.path.join(vcpkg_bin_dir, "*.dll")):
        shutil.copy(file, os.path.join("src", "pdftopng"))
And finally called copy_dlls (while also updating the package_data to include all DLLs) when the setup.py is run on Windows:
package_data = {}
if sys.platform == 'win32':
    copy_dlls()
    package_data = {'pdftopng': ['*.dll']}

setup(**metadata)
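The package_data mapping only has an effect once it is passed to setup(). As a minimal sketch of that wiring, assuming a src/ layout and placeholder names (this is not the real pdftopng metadata, and windows_package_data is a hypothetical helper):

```python
import sys


def windows_package_data(package, platform=sys.platform):
    """Return a package_data mapping that bundles DLLs only on Windows."""
    if platform == "win32":
        return {package: ["*.dll"]}
    return {}


# Hypothetical keyword arguments for setuptools.setup(); the name and layout
# are placeholders chosen for illustration.
setup_kwargs = {
    "name": "example-package",
    "packages": ["example_package"],
    "package_dir": {"": "src"},
    "package_data": windows_package_data("example_package"),
}
```

In a real setup.py these would be handed over with setup(**setup_kwargs). The glob patterns in package_data are matched relative to the package directory, which is why the DLLs have to be copied next to the PYD first.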
The build ran successfully and generated a Windows wheel with all the DLLs bundled within!
$ unzip -l pdftopng-0.1.0-cp38-cp38-win_amd64.whl
Archive: pdftopng-0.1.0-cp38-cp38-win_amd64.whl
Length Date Time Name
--------- ---------- ----- ----
65 2020-10-20 01:24 pdftopng/__init__.py
731 2020-10-20 01:24 pdftopng/__version__.py
137216 2020-10-20 01:32 pdftopng/brotlicommon.dll
47104 2020-10-20 01:32 pdftopng/brotlidec.dll
3082240 2020-10-20 01:32 pdftopng/brotlienc.dll
74752 2020-10-20 01:32 pdftopng/bz2.dll
268800 2020-10-20 01:32 pdftopng/fontconfig.dll
657408 2020-10-20 01:32 pdftopng/freetype.dll
550912 2020-10-20 01:32 pdftopng/jpeg62.dll
10752 2020-10-20 01:32 pdftopng/libcharset.dll
138752 2020-10-20 01:32 pdftopng/libexpat.dll
936960 2020-10-20 01:32 pdftopng/libiconv.dll
195072 2020-10-20 01:32 pdftopng/libpng16.dll
1990656 2020-10-20 01:32 pdftopng/pdftopng.cp38-win_amd64.pyd
27943 2020-10-20 01:24 pdftopng/pdftopng.cpp
614400 2020-10-20 01:32 pdftopng/turbojpeg.dll
85504 2020-10-20 01:32 pdftopng/zlib1.dll
18431 2020-10-20 01:32 pdftopng-0.1.0.dist-info/LICENSE
1181 2020-10-20 01:32 pdftopng-0.1.0.dist-info/METADATA
105 2020-10-20 01:32 pdftopng-0.1.0.dist-info/WHEEL
9 2020-10-20 01:32 pdftopng-0.1.0.dist-info/top_level.txt
1759 2020-10-20 01:32 pdftopng-0.1.0.dist-info/RECORD
--------- -------
8840752 22 files
Some other things I learned while trying to make the build succeed:
- Calling one batch program from another using call
- The /MD and /MT compiler options: I had to add /MD to the msvc compiler options in the setup.py because the build failed with Linker Tools Error LNK2038 due to symbol mismatches! It was all running fine on my machine though :(
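One way to add such a flag only on Windows is a small helper whose result is passed as extra_compile_args to the setuptools Extension. This is a sketch with a hypothetical helper name, and it assumes the LNK2038 mismatch was about the runtime library setting: /MD selects the DLL version of the MSVC runtime, /MT the static one.

```python
import sys


def msvc_runtime_args(platform=sys.platform):
    """Return extra compiler flags for the extension build.

    /MD selects the DLL version of the MSVC runtime; other platforms get
    no extra flags.
    """
    return ["/MD"] if platform == "win32" else []
```

In setup.py this would be used as Extension(..., extra_compile_args=msvc_runtime_args()).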