Day 52 — Bundling DLLs with Windows wheels (the DLL mangling way)

Yesterday, after reading about the DLL Search Order on Windows and DLL Hell, I got concerned about shipping DLLs as package_data, because they might clash with DLLs of the same name that are shipped by other Windows wheels.

This is the gist of the whole issue: if a DLL with the same name as the one you asked the system to load is already loaded in memory, the system uses the loaded DLL instead of searching for it. This means that going the package_data way while giving your DLLs generic names might be kinda bad: the DLLs you ship might not play well with the DLLs that other wheels ship, and a user's application that uses wheels shipping DLLs with the same names but different versions could crash!
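To make that rule concrete, here is a toy model of it. It is a deliberate simplification and not the real Windows loader: it only captures "a DLL requested by bare name (as in an import table) reuses an already-loaded module with that name, without searching". The directory names and the hash-suffixed openssl name are made up for illustration.

```python
class ToyLoader:
    """Toy model of one loader rule; NOT how Windows is implemented."""

    def __init__(self):
        self.loaded = {}  # lower-cased module name -> directory it came from

    def load(self, name, search_path):
        """search_path: list of (directory, set_of_dll_names) pairs."""
        key = name.lower()  # Windows module names are case-insensitive
        if key in self.loaded:
            return self.loaded[key]  # already loaded: no search happens at all
        for directory, files in search_path:
            if key in {f.lower() for f in files}:
                self.loaded[key] = directory
                return directory
        raise FileNotFoundError(name)


loader = ToyLoader()
# Wheel A's extension is imported first and pulls in its own openssl.dll.
first = loader.load("openssl.dll", [("site-packages/wheel_a", {"openssl.dll"})])
# Wheel B ships a different openssl.dll next to its own extension...
second = loader.load("openssl.dll", [("site-packages/wheel_b", {"openssl.dll"})])
# ...but gets wheel A's copy. A unique (mangled) name avoids the collision.
third = loader.load("openssl-1a2b3c4d.dll",
                    [("site-packages/wheel_b", {"openssl-1a2b3c4d.dll"})])
print(first, second, third)
```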

In this post, Nathaniel J. Smith gave an example of how that might happen if you ship a DLL named openssl.dll, and described some reliable ways to isolate DLLs within a wheel:

  • Manually give all your DLLs unique names. This probably requires manually hacking your build system to use the new name, maybe using black magic to generate some new .lib files, or else using a hacky tool built by some random person on github to patch your built binaries in-place. Then, hack your package’s __init__.py to either manually pre-load all these DLLs by absolute path, or else mutate the process’s PATH envvar to add a new directory you control, where you’ve placed all your DLLs.
  • Or something involving AddDllDirectory (but that’s Win8+ only, so you can’t rely on it). The details are extremely complicated.
  • Don’t ship DLLs; use static linking instead.

In another post, he outlined how one would go about writing an auditwheel-like tool for Windows:

  • Scan the wheel to find PE format files (DLLs and EXEs)
  • For each PE file, extract the list of DLLs that it links to – IIRC machomachomangler can do this, or probably lots of PE libraries can
  • Compare those to some list of which libraries are built into Windows and don’t need vendoring. This will probably need refinement over time, but you can probably get close by looking at a fresh Windows install. Or maybe Steve Dower can help.
  • For libraries which do need vendoring, move them into a directory in the wheel, and mangle their names, like how we do on Linux. AFAIK machomachomangler is the only existing tool for this part.
  • Possibly the most annoying part: windows doesn’t have an equivalent of RPATH. So we need to rewrite the wheel’s top-level __init__.py and inject some code to add our vendored libs dir to the DLL search path. There are a lot of ways to approach this, they’re all kinda gross, but I think we can hold our noses and make it work in practice.
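The first step of that outline, finding PE files inside a wheel, can be sketched with the standard library's zipfile module alone. This is a minimal sketch that assumes a file extension plus the "MZ" magic bytes is a good enough filter; a real tool would parse the headers properly. The file names in the usage part are made up.

```python
import io
import zipfile

PE_SUFFIXES = (".dll", ".pyd", ".exe")


def find_pe_files(wheel):
    """Return archive names in `wheel` (a path or file object) that look like PE files."""
    found = []
    with zipfile.ZipFile(wheel) as whl:
        for info in whl.infolist():
            if info.is_dir() or not info.filename.lower().endswith(PE_SUFFIXES):
                continue
            with whl.open(info) as member:
                # Every PE file starts with the DOS header's "MZ" magic.
                if member.read(2) == b"MZ":
                    found.append(info.filename)
    return found


# Usage with a tiny in-memory "wheel":
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as whl:
    whl.writestr("pdftopng/pdftopng.cp38-win_amd64.pyd", b"MZ" + bytes(62))
    whl.writestr("pdftopng/__init__.py", b"")
    whl.writestr("pdftopng/not_really.dll", b"just text")
pe_files = find_pe_files(buf)
print(pe_files)  # ['pdftopng/pdftopng.cp38-win_amd64.pyd']
```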

I also found this numpy issue comment by Steve Dower where he mentions that:

The correct way to reference dependencies is to put them in the same directory as the module that is loading them.

The second most reliable way to have the loader use dependencies from a separate directory is to explicitly load them first using LoadLibrary[Ex] with a full path before importing the module that needs them. (I know, I know, I keep providing exactly the same suggestions every time this problem comes up. It's not my fault you guys don't like them :) )
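Steve Dower's second suggestion could look roughly like the hypothetical preload_dlls helper below, meant for the top of a package's __init__.py. It is a sketch under assumptions: the vendored DLLs sit in a directory the package controls, their load order is known, and their names are already unique (preloading a generically named DLL would still clash with another wheel's copy). It uses ctypes.WinDLL, which goes through LoadLibrary[Ex] on Windows; on other platforms the helper does nothing.

```python
import ctypes
import os
import sys

_preloaded = []  # keep the handles alive for the life of the process


def preload_dlls(libs_dir, names):
    """Load each DLL in `names` from `libs_dir` by absolute path, in order.

    Hypothetical helper. List dependencies before the DLLs that need them.
    """
    if sys.platform != "win32":
        return []
    handles = []
    for name in names:
        # A full path skips the search order; afterwards, imports of this
        # module name resolve to the copy that is already loaded.
        handles.append(ctypes.WinDLL(os.path.join(libs_dir, name)))
    _preloaded.extend(handles)
    return handles


# Hypothetical usage at the top of pdftopng/__init__.py, before the extension
# module is imported (the ".libs" directory and the order are assumptions):
# preload_dlls(os.path.join(os.path.dirname(__file__), ".libs"),
#              ["libpng16.dll", "jpeg62.dll", "freetype.dll"])
```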

"...put them in the same directory as the module that is loading them." YES! My extension worked when all the required DLLs were in the same directory as its one PYD file. Unlike numpy, I don't have many modules with many PYD files that each need their own DLLs (and I'm not yet familiar enough with poppler's CMake setup to build a static library out of it), so I started looking for ways to mangle DLL names (and modify the import tables that refer to them) before copying the DLLs into the PYD file's directory.
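For the naming half of that, here is a minimal sketch of a hypothetical mangled_name helper that appends a short content hash to the DLL's stem, similar in spirit to what auditwheel does for shared libraries on Linux. SHA-256 and an 8-character suffix are my assumptions. The other half, rewriting the import tables that refer to the old names, is the harder part and isn't shown here.

```python
import hashlib
import os
import tempfile


def mangled_name(dll_path, hash_len=8):
    """Return e.g. 'libpng16-<hex>.dll' for the DLL at dll_path.

    Hypothetical scheme: the suffix is a prefix of the SHA-256 of the file's
    bytes, so different builds of the "same" DLL end up with different names.
    """
    stem, ext = os.path.splitext(os.path.basename(dll_path))
    with open(dll_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return f"{stem}-{digest[:hash_len]}{ext}"


# Usage with a throwaway file standing in for a real DLL:
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "libpng16.dll")
    with open(path, "wb") as f:
        f.write(b"pretend DLL bytes")
    new_name = mangled_name(path)
print(new_name)
```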

The PYD file (which is basically a DLL) is a Portable Executable (PE) file, the Windows counterpart of the ELF format on Linux.
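To see what that means at the byte level, here is a minimal stdlib-only sketch that checks the two signatures every PE file carries (the DOS header's "MZ" magic, whose 4-byte field at offset 0x3C points to the "PE\0\0" signature) and reads the COFF Machine field that follows it. The hand-made header below is a fake, not a real DLL; pefile handles all of this, and much more, for real files.

```python
import struct

MACHINE_NAMES = {0x14C: "i386", 0x8664: "x86-64", 0xAA64: "ARM64"}


def pe_machine(data):
    """Return the COFF Machine field of a PE image, or None if data isn't PE."""
    # DOS header: "MZ" magic; the little-endian uint32 at 0x3C is e_lfanew,
    # the file offset of the PE header.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # PE header: "PE\0\0" signature, then the 2-byte Machine field.
    if data[pe_offset:pe_offset + 4] != b"PE\0\0" or len(data) < pe_offset + 6:
        return None
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine


# A hand-made 72-byte header, just enough to pass the checks above.
fake = bytearray(0x48)
fake[:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\0\0"
struct.pack_into("<H", fake, 0x44, 0x8664)
print(MACHINE_NAMES.get(pe_machine(bytes(fake))))  # x86-64
print(pe_machine(b"\x7fELF" + bytes(60)))  # None
```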

I found this question that Nathaniel asked on StackOverflow four years ago and then answered himself after building machomachomangler in two days! I also found pefile, an awesome library that can be used to read and work with Portable Executable files. It lets you list all the DLLs a PE file depends on:


  >>> import pefile
  >>> pe = pefile.PE("pdftopng.cp38-win_amd64.pyd")
  >>> [i.dll for i in pe.DIRECTORY_ENTRY_IMPORT]
  [b'MSVCP140.dll',
   b'python38.dll',
   b'KERNEL32.dll',
   b'VCRUNTIME140_1.dll',
   b'VCRUNTIME140.dll',
   b'api-ms-win-crt-runtime-l1-1-0.dll',
   b'api-ms-win-crt-stdio-l1-1-0.dll',
   b'api-ms-win-crt-string-l1-1-0.dll',
   b'api-ms-win-crt-heap-l1-1-0.dll',
   b'api-ms-win-crt-convert-l1-1-0.dll',
   b'api-ms-win-crt-time-l1-1-0.dll',
   b'api-ms-win-crt-math-l1-1-0.dll',
   b'api-ms-win-crt-multibyte-l1-1-0.dll',
   b'api-ms-win-crt-locale-l1-1-0.dll',
   b'api-ms-win-crt-filesystem-l1-1-0.dll',
   b'freetype.dll',
   b'libpng16.dll',
   b'jpeg62.dll',
   b'ADVAPI32.dll']
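Nathaniel's third step (comparing the imports against a list of DLLs that Windows already provides) could start out like the sketch below. The ASSUMED_PRESENT set is hypothetical and deliberately incomplete, and treating MSVCP140.dll as present is an assumption that testing on a fresh Windows install would confirm or refute.

```python
import re

# Hypothetical and incomplete: DLLs assumed to already be on the target
# machine or to come with the Python interpreter.
ASSUMED_PRESENT = {
    "kernel32.dll", "advapi32.dll", "user32.dll", "gdi32.dll",
    "msvcp140.dll", "vcruntime140.dll", "vcruntime140_1.dll",
}


def needs_vendoring(dll_name):
    """Return True if an imported DLL name is not assumed to be present."""
    name = dll_name.decode("ascii") if isinstance(dll_name, bytes) else dll_name
    name = name.lower()  # Windows treats DLL names case-insensitively
    if name.startswith("api-ms-win-"):  # API set / UCRT names, assumed provided
        return False
    if re.fullmatch(r"python3\d*\.dll", name):  # python3.dll, python38.dll, ...
        return False
    return name not in ASSUMED_PRESENT


# With pefile this list would come from
# [i.dll for i in pefile.PE(path).DIRECTORY_ENTRY_IMPORT]; here it is a subset
# of the output above (the other api-ms-win-crt-* entries behave the same).
imports = [
    b"MSVCP140.dll", b"python38.dll", b"KERNEL32.dll", b"VCRUNTIME140_1.dll",
    b"VCRUNTIME140.dll", b"api-ms-win-crt-runtime-l1-1-0.dll",
    b"api-ms-win-crt-stdio-l1-1-0.dll", b"freetype.dll", b"libpng16.dll",
    b"jpeg62.dll", b"ADVAPI32.dll",
]
to_vendor = [dll for dll in imports if needs_vendoring(dll)]
print(to_vendor)  # [b'freetype.dll', b'libpng16.dll', b'jpeg62.dll']
```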

Apparently, pefile also lets you modify a PE file, but I wasn't able to modify my PYD file's import table with it (I need to look deeper and wrap my head around playing with bytes), so I decided to go ahead with machomachomangler:

A GitHub search for machomachomangler showed me these other tools that already use the library:

I wrote wheel_repair.py, which I run on the built Windows wheel just like auditwheel on a Linux wheel and delocate on a macOS wheel. This is a basic outline of what it does:

This means that I no longer need the copy_dlls function and package_data from yesterday! There are some limitations that I still need to handle, though:

The build succeeded and I was able to install the wheel in a fresh virtual environment, import pdftopng, and run the convert function!

However, when I looked at the DLL imports of this new PYD file using objdump, I couldn't see the mangled DLL names alongside the system DLLs!


  $ objdump -x pdftopng.cp38-win_amd64.pyd | grep dll
      DLL Name: MSVCP140.dll
      DLL Name: python38.dll
      DLL Name: KERNEL32.dll
      DLL Name: VCRUNTIME140_1.dll
      DLL Name: VCRUNTIME140.dll
      DLL Name: api-ms-win-crt-runtime-l1-1-0.dll
      DLL Name: api-ms-win-crt-stdio-l1-1-0.dll
      DLL Name: api-ms-win-crt-string-l1-1-0.dll
      DLL Name: api-ms-win-crt-heap-l1-1-0.dll
      DLL Name: api-ms-win-crt-convert-l1-1-0.dll
      DLL Name: api-ms-win-crt-time-l1-1-0.dll
      DLL Name: api-ms-win-crt-math-l1-1-0.dll
      DLL Name: api-ms-win-crt-multibyte-l1-1-0.dll
      DLL Name: api-ms-win-crt-locale-l1-1-0.dll
      DLL Name: api-ms-win-crt-filesystem-l1-1-0.dll

    6 redll         00000040  00000001801eb000  00000001801eb000  001e6200  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA

It looks like machomachomangler adds a new section called redll to the PE file instead of putting the new DLL names in the same section as the system DLL names:


  $ objdump -s -j redll -w pdftopng.cp38-win_amd64.pyd

  pdftopng.cp38-win_amd64.pyd:     file format pei-x86-64

  Contents of section redll:
   1801eb000 6a706567 36322d36 37333865 6463322e  jpeg62-6738edc2.
   1801eb010 646c6c00 66726565 74797065 2d626238  dll.freetype-bb8
   1801eb020 35313366 622e646c 6c006c69 62706e67  513fb.dll.libpng
   1801eb030 31362d61 30626636 6136642e 646c6c00  16-a0bf6a6d.dll.

I could see the mangled DLL names (jpeg62-6738edc2.dll, freetype-bb8513fb.dll, and libpng16-a0bf6a6d.dll) in the redll section! I need to look into how this works in practice though.
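The redll dump is just NUL-terminated ASCII strings stored back to back. Since pefile reports these names as the imported DLLs, the import descriptors presumably point at these strings. Decoding the 64 bytes from the objdump output confirms the layout. Note also that the objdump listing stops before ADVAPI32.dll, which pefile reports after the mangled names; my unverified guess is that objdump stops walking the import table once a name points outside the original import section.

```python
# The 64 bytes of the redll section, copied from the objdump -s output above.
redll = bytes.fromhex(
    "6a706567 36322d36 37333865 6463322e 646c6c00 66726565 74797065 2d626238 "
    "35313366 622e646c 6c006c69 62706e67 31362d61 30626636 6136642e 646c6c00"
)

# Split on the NUL terminators and drop the empty tail.
names = [s.decode("ascii") for s in redll.split(b"\0") if s]
print(names)  # ['jpeg62-6738edc2.dll', 'freetype-bb8513fb.dll', 'libpng16-a0bf6a6d.dll']
```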

pefile, on the other hand, does list the mangled DLL names alongside the system DLLs:


  >>> import pefile
  >>> pe = pefile.PE("pdftopng.cp38-win_amd64.pyd")
  >>> [i.dll for i in pe.DIRECTORY_ENTRY_IMPORT]
  [b'MSVCP140.dll',
   b'python38.dll',
   b'KERNEL32.dll',
   b'VCRUNTIME140_1.dll',
   b'VCRUNTIME140.dll',
   b'api-ms-win-crt-runtime-l1-1-0.dll',
   b'api-ms-win-crt-stdio-l1-1-0.dll',
   b'api-ms-win-crt-string-l1-1-0.dll',
   b'api-ms-win-crt-heap-l1-1-0.dll',
   b'api-ms-win-crt-convert-l1-1-0.dll',
   b'api-ms-win-crt-time-l1-1-0.dll',
   b'api-ms-win-crt-math-l1-1-0.dll',
   b'api-ms-win-crt-multibyte-l1-1-0.dll',
   b'api-ms-win-crt-locale-l1-1-0.dll',
   b'api-ms-win-crt-filesystem-l1-1-0.dll',
   b'freetype-bb8513fb.dll',
   b'libpng16-a0bf6a6d.dll',
   b'jpeg62-6738edc2.dll',
   b'ADVAPI32.dll']
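Comparing this listing with the one from before the repair suggests a quick sanity check: every vendored name should be swapped for its mangled counterpart, in place, with nothing else changed. Below is a minimal sketch of a hypothetical check_remap helper, with the mapping written out by hand from the two listings (in a real tool it would come from the renaming step) and the api-ms-win-crt-* entries left out for brevity.

```python
def check_remap(before, after, mapping):
    """Return a list of problems; an empty list means the remap looks right."""
    problems = []
    expected = [mapping.get(name, name) for name in before]
    if after != expected:
        problems.append(f"import list differs: expected {expected}, got {after}")
    for old in mapping:
        if old in after:
            problems.append(f"{old!r} is still imported")
    return problems


mapping = {
    b"freetype.dll": b"freetype-bb8513fb.dll",
    b"libpng16.dll": b"libpng16-a0bf6a6d.dll",
    b"jpeg62.dll": b"jpeg62-6738edc2.dll",
}
before = [b"MSVCP140.dll", b"python38.dll", b"KERNEL32.dll", b"freetype.dll",
          b"libpng16.dll", b"jpeg62.dll", b"ADVAPI32.dll"]
after = [b"MSVCP140.dll", b"python38.dll", b"KERNEL32.dll",
         b"freetype-bb8513fb.dll", b"libpng16-a0bf6a6d.dll",
         b"jpeg62-6738edc2.dll", b"ADVAPI32.dll"]
print(check_remap(before, after, mapping))  # []
```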

Now I need to test this wheel on a fresh Windows install to see whether I need to bundle any other DLLs.

Other ways of bundling DLLs with Windows wheels