Day 51 — Bundling DLLs with Windows wheels (the package_data way)

Last week, I was able to build pdftopng on Windows, but the extension worked only when all the DLLs it depended on were kept in the same directory as the PYD (which is basically a DLL).

At the end of that post, I had some questions about bundling DLLs with Windows wheels, so today I looked at some discuss.python.org and StackOverflow posts, and inside some Windows wheels published by Christoph Gohlke, to find the answers!

DLL Search Order on Windows

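The system does not search for a DLL:

- If a DLL with the same module name is already loaded in memory (the loaded DLL is used).
- If the DLL is on the list of known DLLs for the version of Windows the application is running on (the system's copy is used).

If none of those conditions are met, then it continues to search for the DLL in the following order (for desktop applications with SafeDllSearchMode enabled, which is the default):

1. The directory from which the application loaded.
2. The system directory.
3. The 16-bit system directory.
4. The Windows directory.
5. The current directory.
6. The directories that are listed in the PATH environment variable.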
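To make the search order concrete, here is a toy model of it in Python. It's only an illustration under simplifying assumptions: the real loader also deals with DLL redirection, manifests and API sets, and resolve_dll and its arguments are made up for this sketch.

```python
import os


def resolve_dll(name, loaded, known_dlls, search_dirs):
    """Toy model of where Windows would find the DLL called `name`.

    loaded      -- dict mapping lower-cased module names to paths already in memory
    known_dlls  -- set of lower-cased names on the KnownDLLs list
    search_dirs -- directories in search order: application dir, system dir,
                   16-bit system dir, Windows dir, current dir, then PATH entries
    """
    key = name.lower()  # module names are compared case-insensitively
    if key in loaded:
        return loaded[key]  # already loaded: no search happens
    if key in known_dlls:
        return "<KnownDLLs>\\" + name  # system copy is used: no search happens
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate  # first match wins
    return None  # the real loader would fail with "module not found"
```

As far as I can tell, this is also why keeping dependencies next to the PYD works: CPython loads extension modules with LoadLibraryExW flags that make Windows also look in the PYD's own directory.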

How do I bundle DLLs with Windows wheels?

I found that this is not a new question. There have been multiple posts asking about an auditwheel / delocate alternative for Windows on discuss.python.org.

Steve Dower mentioned that adding a DLL as package_data, so that it sits alongside the extension module that requires it, is sufficient. In Python 3.8, he also added os.add_dll_directory() for when you prefer to keep the DLLs in a separate folder.
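Here's a minimal sketch of how a package's __init__.py could use os.add_dll_directory() when the DLLs live in a separate folder, with a PATH fallback along the lines of what pygame does further below. The add_bundled_dll_dir helper and the libs/ subfolder are hypothetical. os.add_dll_directory() returns a handle whose close() removes the directory again; the sketch simply keeps the directory added for the lifetime of the process.

```python
import os
import sys


def add_bundled_dll_dir(package_dir, subdir="libs"):
    """Make DLLs in <package_dir>/<subdir> findable by extension modules.

    `subdir="libs"` is a hypothetical layout, not a convention from any project.
    Returns the directory that was added, or None if nothing was done.
    """
    if os.name != "nt":
        return None  # only relevant on Windows
    dll_dir = os.path.abspath(os.path.join(package_dir, subdir))
    if not os.path.isdir(dll_dir):
        return None
    if sys.version_info >= (3, 8):
        # Python 3.8+ ignores PATH when resolving extension module dependencies
        os.add_dll_directory(dll_dir)
    else:
        # older Pythons still consult PATH, so prepend the directory there
        os.environ["PATH"] = dll_dir + os.pathsep + os.environ.get("PATH", "")
    return dll_dir
```

It would be called at the top of __init__.py, before the extension module is imported, e.g. add_bundled_dll_dir(os.path.dirname(__file__)).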

Nathaniel J. Smith made a lot of valid points about renaming the DLLs and modifying their import tables to avoid DLL Hell, but I decided to set those aside for now and take a first stab at bundling DLLs the package_data way that Steve suggested.
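The renaming idea is what produces long names like the libopenblas.NOIJJG62EMASZI6NYURL6JBKM4EVBGM7.gfortran-win_amd64.dll in the numpy wheel below: if a bundled DLL's name is unique to its contents, two packages shipping different builds of the same library can't collide in memory. A minimal sketch of generating such a name, assuming a SHA-256/base32 scheme I made up (mangled_name is hypothetical, and the real tools may hash differently). Renaming alone isn't enough: the import tables of the PYDs that depend on the DLL would also have to be rewritten to the new name, which this sketch doesn't do.

```python
import base64
import hashlib
import os


def mangled_name(dll_path, length=32):
    """Return a content-based name like 'zlib1.<HASH>.dll'.

    Hypothetical scheme: base32 of the SHA-256 digest of the file contents,
    truncated to `length` characters.
    """
    with open(dll_path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    tag = base64.b32encode(digest).decode("ascii")[:length]
    stem, ext = os.path.splitext(os.path.basename(dll_path))
    return f"{stem}.{tag}{ext}"
```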

How do others bundle DLLs with Windows wheels?

I also found this nice website with a lot of Windows wheels published by Christoph Gohlke, and decided to look inside Windows wheels for some familiar projects.

numpy


  $ unzip -l numpy-1.19.2-cp38-cp38-win_amd64.whl
  Archive:  numpy-1.19.2-cp38-cp38-win_amd64.whl
    Length      Date    Time    Name
  ---------  ---------- -----   ----
      11555  2020-09-10 01:28   numpy/__init__.py
                                ...
   32939993  2020-09-10 01:30   numpy/.libs/libopenblas.NOIJJG62EMASZI6NYURL6JBKM4EVBGM7.gfortran-win_amd64.dll
                                ...

The numpy Windows wheel contains one DLL file for OpenBLAS! I couldn't find how the DLL is generated in the numpy codebase, and I also need to find how it gets into the Windows wheel.

Update: Looks like the DLL is copied into numpy before the wheel is built so that everything that requires the DLL can link against it. It also looks like it happens in the Azure Steps for Windows where setup_openblas() is called to (1) download the DLL from the openblas-libs package on Anaconda, and (2) place it in a directory where it can be found.

numpy has a from . import _distributor_init import statement in its __init__.py to "allow distributors to run custom init code". (Are these package distributors for various operating systems and do they create their own _distributor_init.py to load things differently?)

Update: Looks like a _distributor_init.py (to load the OpenBLAS DLL) is placed alongside the __init__.py when a Windows wheel is built. This _distributor_init.py wasn't in the numpy-wheels repo before, but was moved there after Christoph Gohlke raised a concern about it being distribution-specific. The code is also duplicated in the numpy repo for some reason. To answer the question above, though, I still need to look for other examples of _distributor_init.py files for numpy.

The code inside _distributor_init.py finds the OpenBLAS DLL relative to the __init__.py and loads it using ctypes.WinDLL:


  import glob
  import os

  if os.name == 'nt':
      # convention for storing / loading the DLL from
      # numpy/.libs/, if present
      try:
          from ctypes import WinDLL
          basedir = os.path.dirname(__file__)
      except:
          pass
      else:
          libs_dir = os.path.abspath(os.path.join(basedir, '.libs'))
          DLL_filenames = []
          if os.path.isdir(libs_dir):
              for filename in glob.glob(os.path.join(libs_dir,
                                                     '*openblas*dll')):
                  # NOTE: would it change behavior to load ALL
                  # DLLs at this path vs. the name restriction?
                  WinDLL(os.path.abspath(filename))
                  DLL_filenames.append(filename)
          if len(DLL_filenames) > 1:
              import warnings
              warnings.warn("loaded more than 1 DLL from .libs:\n%s" %
                            "\n".join(DLL_filenames),
                            stacklevel=1)

pyarrow


  $ unzip -l pyarrow-1.0.1-cp38-cp38-win_amd64.whl | grep dll
    8459264  2020-08-17 19:35   pyarrow/arrow.dll
     910336  2020-08-17 19:35   pyarrow/arrow_dataset.dll
    2610176  2020-08-17 19:35   pyarrow/arrow_flight.dll
    1264640  2020-08-17 19:35   pyarrow/arrow_python.dll
      91648  2020-08-17 19:35   pyarrow/arrow_python_flight.dll
      81920  2020-08-17 19:35   pyarrow/cares.dll
    3249664  2020-08-17 19:35   pyarrow/libcrypto-1_1-x64.dll
    2661888  2020-08-17 19:35   pyarrow/libprotobuf.dll
     651264  2020-08-17 19:35   pyarrow/libssl-1_1-x64.dll
    2204672  2020-08-17 19:35   pyarrow/parquet.dll
      89600  2020-08-17 19:35   pyarrow/zlib.dll

The pyarrow Windows wheel contains a lot of DLLs! But they aren't loaded using the ctypes module. They are instead kept in the same directory as the compiled PYD files, which have the DLL names hardcoded in their import tables:


  $ objdump -x lib.cp38-win_amd64.pyd | grep dll
    DLL Name: python38.dll
    DLL Name: arrow.dll
    DLL Name: arrow_python.dll
    DLL Name: MSVCP140.dll
    DLL Name: VCRUNTIME140.dll
    DLL Name: api-ms-win-crt-runtime-l1-1-0.dll
    DLL Name: api-ms-win-crt-heap-l1-1-0.dll
    DLL Name: KERNEL32.dll
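Those hardcoded names can also be read without objdump, since the import directory of a PE file (DLL or PYD) is simple enough to parse with the standard library. A minimal sketch, assuming a well-formed PE32 or PE32+ file: imported_dlls is a hypothetical helper, it only lists directly imported DLLs (no delay-load imports), and it does no error recovery.

```python
import struct


def imported_dlls(path):
    """Return the DLL names in a PE file's import directory (objdump's 'DLL Name')."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:2] != b"MZ":
        raise ValueError("not a PE file")
    pe = struct.unpack_from("<I", data, 0x3C)[0]  # e_lfanew: offset of 'PE\0\0'
    if data[pe:pe + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF header: NumberOfSections at +2, SizeOfOptionalHeader at +16
    num_sections, opt_size = struct.unpack_from("<2xH12xH", data, pe + 4)
    opt = pe + 24  # optional header follows the signature and 20-byte COFF header
    magic = struct.unpack_from("<H", data, opt)[0]
    dd_start = opt + (96 if magic == 0x10B else 112)  # data directories: PE32 vs PE32+
    import_rva, _ = struct.unpack_from("<II", data, dd_start + 8)  # entry 1 = imports
    if import_rva == 0:
        return []

    # section headers (40 bytes each) map RVAs to file offsets
    sections = []
    sec = opt + opt_size
    for i in range(num_sections):
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<IIII", data, sec + 40 * i + 8)
        sections.append((vaddr, max(vsize, raw_size), raw_ptr))

    def rva_to_offset(rva):
        for vaddr, size, raw_ptr in sections:
            if vaddr <= rva < vaddr + size:
                return rva - vaddr + raw_ptr
        raise ValueError(f"RVA {rva:#x} is not inside any section")

    names = []
    desc = rva_to_offset(import_rva)
    while True:
        # IMAGE_IMPORT_DESCRIPTOR: 5 DWORDs, the DLL name RVA is the 4th
        fields = struct.unpack_from("<5I", data, desc)
        if not any(fields):
            break  # an all-zero descriptor ends the table
        name_off = rva_to_offset(fields[3])
        end = data.index(b"\0", name_off)
        names.append(data[name_off:end].decode("ascii"))
        desc += 20
    return names
```

Pointed at lib.cp38-win_amd64.pyd, it should report the same names as the objdump output above.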

pygame


  $ unzip -l pygame-1.9.6-cp39-cp39-win_amd64.whl
  Archive:  pygame-1.9.6-cp39-cp39-win_amd64.whl
    Length      Date    Time    Name
  ---------  ---------- -----   ----
     300544  2019-08-14 20:26   pygame/SDL.dll
                                ...
      12193  2019-06-08 01:52   pygame/__init__.py
                                ...
     679424  2020-05-24 17:50   pygame/_freetype.cp39-win_amd64.pyd
                                ...
      84992  2019-02-14 06:20   pygame/zlib.dll
                                ...

The pygame Windows wheel contains a lot of DLLs and PYDs! And it doesn't use the ctypes module in its __init__.py to load DLLs. Instead, it puts its own package directory on the DLL search path using the os.add_dll_directory function. In addition to that, it also prepends that directory to the PATH environment variable of the current process (presumably for Python versions older than 3.8, which don't have os.add_dll_directory and still search PATH):


  if os.name == 'nt':
      pygame_dir = None
      try:
          # add pygame folder to Windows DLL search paths
          pygame_dir = os.path.abspath(os.path.dirname(__file__))
          try:
              os.add_dll_directory(pygame_dir)
          except Exception:
              pass
          os.environ['PATH'] = pygame_dir + ';' + os.environ['PATH']
      except Exception:
          pass
      del pygame_dir

shapely


  $ unzip -l Shapely-1.7.1-cp39-cp39-win_amd64.whl
  Archive:  Shapely-1.7.1-cp39-cp39-win_amd64.whl
    Length      Date    Time    Name
  ---------  ---------- -----   ----
         22  2020-08-20 20:01   shapely/__init__.py
      30675  2020-08-22 17:44   shapely/geos.py
     815616  2020-08-22 17:50   shapely/DLLs/geos_c.dll
                                ...

The shapely Windows wheel contains one DLL, and it's loaded inside the geos.py file using ctypes.CDLL:


  elif sys.platform == 'win32':
      try:
          _lgeos = CDLL(os.path.abspath(os.path.join(
          os.path.dirname(__file__), "DLLs", "geos_c.dll")))
      except Exception:
          _lgeos = CDLL("geos_c.dll")

      def free(m):
          return

This block is part of a large if statement with custom logic to load shared libraries for different operating systems!
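shapely's "try the bundled copy first, then fall back to a name-based lookup" pattern could be pulled out into a small reusable helper. A minimal sketch: load_library, its arguments and the example file names are hypothetical. One caveat, as far as I know: on Windows with Python 3.8+, ctypes no longer searches PATH for a bare DLL name, so the fallback only finds DLLs in the default system locations or in directories added with os.add_dll_directory().

```python
import ctypes
import os
import sys


def load_library(package_dir, bundled_subdir, names):
    """Load a shared library, preferring a copy bundled inside the package.

    names -- dict mapping sys.platform prefixes ('win32', 'darwin', 'linux')
             to the library's file name on that platform (hypothetical values).
    """
    filename = next((n for prefix, n in names.items() if sys.platform.startswith(prefix)), None)
    if filename is None:
        raise OSError(f"no library name configured for {sys.platform}")
    bundled = os.path.join(package_dir, bundled_subdir, filename)
    if os.path.isfile(bundled):
        # an absolute path means no search happens at all
        return ctypes.CDLL(os.path.abspath(bundled))
    # fall back to a name-based lookup using the platform's default search rules
    return ctypes.CDLL(filename)
```

For shapely's layout, a call would look like load_library(os.path.dirname(__file__), "DLLs", {"win32": "geos_c.dll", ...}), with one entry per supported platform.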

pycairo


  $ unzip -l pycairo-1.20.0-cp39-cp39-win_amd64.whl
  Archive:  pycairo-1.20.0-cp39-cp39-win_amd64.whl
    Length      Date    Time    Name
  ---------  ---------- -----   ----
        660  2019-08-24 21:37   cairo/__init__.py
      33334  2020-01-19 09:57   cairo/__init__.pyi
     179712  2020-10-05 19:35   cairo/_cairo.cp39-win_amd64.pyd
    2199552  2020-10-05 19:35   cairo/cairo.dll
                                ...

The pycairo Windows wheel has just one DLL, and it isn't loaded using the ctypes module. Similar to pyarrow, it is kept in the same directory as the compiled PYD file, which has the DLL name hardcoded in its import table:


  $ objdump -x cairo/_cairo.cp39-win_amd64.pyd | grep dll
    DLL Name: cairo.dll
    DLL Name: python39.dll
    DLL Name: KERNEL32.dll
    DLL Name: VCRUNTIME140.dll
    DLL Name: api-ms-win-crt-runtime-l1-1-0.dll
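A quick sanity check for a layout like pycairo's (or my own) is to compare the DLL names objdump reports against the files sitting next to the PYD. A minimal sketch that parses `objdump -x` text; which DLLs count as "provided by the system" is my own assumption, captured in SYSTEM_PREFIXES and SYSTEM_NAMES, and missing_bundled_dlls is a hypothetical helper.

```python
import os
import re

# Assumption: DLLs that come from Windows, the VC++ runtime, or Python itself
SYSTEM_PREFIXES = ("api-ms-win-", "python")
SYSTEM_NAMES = {"kernel32.dll", "user32.dll", "advapi32.dll", "ws2_32.dll",
                "msvcp140.dll", "vcruntime140.dll", "vcruntime140_1.dll"}


def missing_bundled_dlls(objdump_output, pyd_dir):
    """Return non-system DLLs named in `objdump -x` output that are absent from pyd_dir."""
    names = re.findall(r"DLL Name:\s*(\S+)", objdump_output)
    present = {f.lower() for f in os.listdir(pyd_dir)}  # Windows names are case-insensitive
    missing = []
    for name in names:
        lower = name.lower()
        if lower.startswith(SYSTEM_PREFIXES) or lower in SYSTEM_NAMES:
            continue  # expected to be provided outside the wheel
        if lower not in present:
            missing.append(name)
    return missing
```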


I looked at a lot of other wheels (and their setup.py files) too, but I won't mention all of them to keep the post short. At this point, a pattern started to emerge. Each project followed one of these ways to bundle DLLs:

Some of them (like pyarrow) weren't doing any of the above. I suspect they have custom code somewhere (which I need to find) to unpack the built wheel, copy over the DLL, and then zip all the files again.
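For the unpack/copy/re-zip approach, the main detail to get right is that a wheel's *.dist-info/RECORD lists every file with its SHA-256 hash (urlsafe base64, no padding) and size, so an added DLL needs a RECORD row too. A minimal sketch under that assumption; add_file_to_wheel is hypothetical, it rewrites the wheel in place, and tools that also rename DLLs or patch import tables would do considerably more.

```python
import base64
import csv
import hashlib
import io
import os
import tempfile
import zipfile


def add_file_to_wheel(wheel_path, src_file, arcname):
    """Copy src_file into the wheel as `arcname` and add a matching RECORD row."""
    with open(src_file, "rb") as f:
        payload = f.read()
    # RECORD hashes are urlsafe base64 of the SHA-256 digest, without '=' padding
    digest = base64.urlsafe_b64encode(hashlib.sha256(payload).digest()).rstrip(b"=").decode()

    # write to a temp file in the same directory so os.replace can swap it in
    fd, tmp_path = tempfile.mkstemp(suffix=".whl", dir=os.path.dirname(os.path.abspath(wheel_path)))
    os.close(fd)
    try:
        with zipfile.ZipFile(wheel_path) as src, \
                zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
            names = src.namelist()
            if arcname in names:
                raise ValueError(f"{arcname} is already in the wheel")
            record_name = next(n for n in names if n.endswith(".dist-info/RECORD"))
            # copy every existing member except RECORD, which is rewritten last
            for item in src.infolist():
                if item.filename != record_name:
                    dst.writestr(item, src.read(item.filename))
            dst.writestr(arcname, payload)
            rows = [r for r in csv.reader(io.StringIO(src.read(record_name).decode("utf-8"))) if r]
            rows = [r for r in rows if r[0] != record_name]
            rows.append([arcname, f"sha256={digest}", str(len(payload))])
            rows.append([record_name, "", ""])  # RECORD lists itself without a hash
            out = io.StringIO()
            csv.writer(out, lineterminator="\n").writerows(rows)
            dst.writestr(record_name, out.getvalue())
        os.replace(tmp_path, wheel_path)
    except BaseException:
        os.remove(tmp_path)
        raise
```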

I went with the package_data way that Steve suggested because a lot of projects seemed to follow that.

How to bundle DLLs with Windows wheels (the package_data way)

I found that vcpkg (which I used on my local machine to install external dependencies for building poppler) is already installed on the windows-latest virtual environment on GitHub Actions!

I did a GitHub search to see if people are using vcpkg with cibuildwheel and found 3 results! While looking at the cibuildwheel config for this project, I learned about the VCPKG_INSTALLATION_ROOT environment variable, which meant that I could:

I created a build_win.bat to go along with the build scripts for Linux and macOS, in which I install all the external requirements using vcpkg, and then build poppler:


  @echo off

  vcpkg install freetype:x64-windows fontconfig:x64-windows libpng:x64-windows libjpeg-turbo:x64-windows
  Rem set PATH=%PATH%;.\vcpkg\installed\x64-windows\bin

  cd lib\poppler
  mkdir build && cd build
  cmake -DCMAKE_TOOLCHAIN_FILE=%VCPKG_INSTALLATION_ROOT%/scripts/buildsystems/vcpkg.cmake  -DENABLE_QT5=OFF -DENABLE_LIBOPENJPEG=none -DENABLE_CPP=OFF ..
  cmake --build . --config Release --target poppler

And then I added a copy_dlls function to my setup.py to copy over all the DLLs from the VCPKG_INSTALLATION_ROOT to src/pdftopng where the PYD is generated:


  import glob
  import os
  import shutil

  def copy_dlls():
      vcpkg_bin_dir = os.path.join(os.environ["VCPKG_INSTALLATION_ROOT"], "installed", "x64-windows", "bin")
      for file in glob.glob(os.path.join(vcpkg_bin_dir, "*.dll")):
          shutil.copy(file, os.path.join("src", "pdftopng"))  # next to the PYD

And finally called copy_dlls (while also updating the package_data to include all DLLs) when the setup.py is run on Windows:


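  package_data = {}
  if sys.platform == 'win32':
      copy_dlls()
      # ship the copied DLLs inside the package, next to the PYD
      package_data = {'pdftopng': ['*.dll']}

  # metadata is defined earlier in setup.py (not shown); it needs package_data too
  metadata['package_data'] = package_data
  setup(**metadata)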

The build ran successfully and generated a Windows wheel with all the DLLs bundled within!


  $ unzip -l pdftopng-0.1.0-cp38-cp38-win_amd64.whl
    Archive:  pdftopng-0.1.0-cp38-cp38-win_amd64.whl
      Length      Date    Time    Name
    ---------  ---------- -----   ----
           65  2020-10-20 01:24   pdftopng/__init__.py
          731  2020-10-20 01:24   pdftopng/__version__.py
       137216  2020-10-20 01:32   pdftopng/brotlicommon.dll
        47104  2020-10-20 01:32   pdftopng/brotlidec.dll
      3082240  2020-10-20 01:32   pdftopng/brotlienc.dll
        74752  2020-10-20 01:32   pdftopng/bz2.dll
       268800  2020-10-20 01:32   pdftopng/fontconfig.dll
       657408  2020-10-20 01:32   pdftopng/freetype.dll
       550912  2020-10-20 01:32   pdftopng/jpeg62.dll
        10752  2020-10-20 01:32   pdftopng/libcharset.dll
       138752  2020-10-20 01:32   pdftopng/libexpat.dll
       936960  2020-10-20 01:32   pdftopng/libiconv.dll
       195072  2020-10-20 01:32   pdftopng/libpng16.dll
      1990656  2020-10-20 01:32   pdftopng/pdftopng.cp38-win_amd64.pyd
        27943  2020-10-20 01:24   pdftopng/pdftopng.cpp
       614400  2020-10-20 01:32   pdftopng/turbojpeg.dll
        85504  2020-10-20 01:32   pdftopng/zlib1.dll
        18431  2020-10-20 01:32   pdftopng-0.1.0.dist-info/LICENSE
         1181  2020-10-20 01:32   pdftopng-0.1.0.dist-info/METADATA
          105  2020-10-20 01:32   pdftopng-0.1.0.dist-info/WHEEL
            9  2020-10-20 01:32   pdftopng-0.1.0.dist-info/top_level.txt
         1759  2020-10-20 01:32   pdftopng-0.1.0.dist-info/RECORD
    ---------                     -------
      8840752                     22 files
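The same check can be done from Python (for example in CI, right after cibuildwheel runs) with the standard library's zipfile module instead of `unzip -l`. A minimal sketch; dlls_in_wheel is a hypothetical helper, not something from cibuildwheel.

```python
import sys
import zipfile


def dlls_in_wheel(wheel_path):
    """Return the archive paths of all .dll files inside a wheel, like `unzip -l ... | grep dll`."""
    with zipfile.ZipFile(wheel_path) as whl:
        return sorted(n for n in whl.namelist() if n.lower().endswith(".dll"))


if __name__ == "__main__":
    # usage: pass one or more wheel paths as arguments
    for wheel in sys.argv[1:]:
        dlls = dlls_in_wheel(wheel)
        print(f"{wheel}: {len(dlls)} DLL(s)")
        for name in dlls:
            print("  " + name)
```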

Some other things I learned while trying to make the build succeed: