Day 45b — How to (almost) build a C extension wheel on Windows (with external dependencies)
14 October 2020 · recurse-center
I looked into how to build C extension wheels on Windows over the weekend. Since there isn't a fastmac equivalent to get a Windows machine for debugging, I booted up Windows on my laptop after a really long time! I need to find a fastwin or winfast!
Installation
Visual Studio 2019 Community Edition
The Python packaging docs mentioned that I needed to install Visual Studio Community Edition, 2015 or later for Python 3.5+. All Visual Studio versions after 2015 are backwards compatible! I installed Visual Studio 2019 and selected the Python native development tools checkbox in the setup application.
The setup put cl.exe and link.exe (the Windows equivalents of cc and ld), along with some other tools, in my Program Files! Looking into the Program Files brought back very old memories of fiddling with a game's files inside this directory to make everything work :)
Git and Python
I also installed git (which installed a lot of Unix tools too!) and Python 3.8 using the setup exes from their websites. All of these setups seemed to modify my PATH variable automatically, because all of their executables were available in Powershell right after, and I could run cl.exe, link.exe, git, and python in the Powershell terminal.
Building a "hello world" C program
As a test, I tried to build the ncurses "hello world" program. But since ncurses is not supported on Windows (there's PDCurses though), I commented out all of the ncurses function calls and replaced the printw with a printf, which basically made it a "hello world" C program!
(venv) > python -m pip install .
💥
BOOM! I got my first error which said "fatal error LNK1112: module machine type 'x86' conflicts with target machine type 'x64'". I was using x86 tools to build something for my x64 system.
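A quick way to spot this kind of mismatch before building is to compare the interpreter's pointer size with the target directory of the MSVC tool that would be picked up. This is only a sketch: it assumes the Visual Studio 2019 layout that shows up in the linker error later in this post (...\bin\HostX86\x64\link.exe), where the directory containing the tool names the target architecture. The helper names are made up for illustration.

```python
import struct
from pathlib import PureWindowsPath


def interpreter_bits() -> int:
    """Return 32 or 64 depending on the pointer size of the running Python."""
    return struct.calcsize("P") * 8


def msvc_target_arch(tool_path: str) -> str:
    """Guess the target architecture from an MSVC tool path.

    Assumes the layout ...\\bin\\Host<arch>\\<target>\\<tool>.exe, so the
    directory that contains the tool names the target ("x86" or "x64").
    """
    return PureWindowsPath(tool_path).parent.name.lower()


def toolchain_matches_interpreter(tool_path: str) -> bool:
    """True when the tool's target matches the interpreter's bitness."""
    expected = "x64" if interpreter_bits() == 64 else "x86"
    return msvc_target_arch(tool_path) == expected
```

On Windows, something like shutil.which("cl") could supply tool_path. (Looking up "link" instead may find the Unix link tool that ships with git, so cl is probably the safer probe.)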
Somehow, Powershell was configured to only use the x86 toolchain, and I wasn't sure how to make it use the x64 one. At this point, I found the (x64) native tools command prompt which gets installed with the Visual Studio setup, and has everything configured correctly. So I jumped onto the "native tools command prompt" submarine from the Powershell ship!
After the switch, I was able to build and install the "hello world" C extension!
(venv) > python -m pip install .
Processing c:\users\vinayak mehta\desktop\development\onix
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Requirement already satisfied: Click>=7.0 in c:\users\vinayak mehta\appdata\local\programs\python\python38\lib\site-packages (from onix==0.1.0) (7.1.2)
Building wheels for collected packages: onix
Building wheel for onix (PEP 517) ... done
Created wheel for onix: filename=onix-0.1.0-cp38-cp38-win_amd64.whl size=47069 sha256=8681e069d73e567d865f601cb212429f0ef335a320d031c188576078ef3f1eba
Stored in directory: C:\Users\Vinayak Mehta\AppData\Local\Temp\pip-ephem-wheel-cache-jgv05vv3\wheels\33\5f\a8\63d76ba35c8c629936b3485a15ffe5ccb25fe1304159ebc9d8
Successfully built onix
Installing collected packages: onix
Successfully installed onix-0.1.0
WARNING: You are using pip version 20.2.1; however, version 20.2.3 is available.
You should consider upgrading via the 'C:\Users\Vinayak Mehta\AppData\Local\Programs\Python\Python38\python.exe -m pip install --upgrade pip' command.
And call the executable!
(venv) > onix.exe
Hello, snek!
(venv) >
vcpkg and external dependencies
After that I moved on to the slightly more complex C extension, which has some external dependencies. I wasn't sure if there was a way for cygwin to work with the native tools command prompt (I'm sure there is). I also wasn't sure if wheels built on cygwin with gcc/g++ would play well on Windows, so I started looking for a way to install external dependencies.
I'd heard of how choco is this new and shiny package manager for Windows, but couldn't find the packages I required to build poppler (freetype, fontconfig, libpng, and libjpeg) on their repository. But I found vcpkg! (A C/C++ library manager for Windows, Linux, and macOS released by Microsoft)
Installing vcpkg was easy: I just followed the quickstart from the README, and was able to install the dependencies I needed after that!
(venv) c:\dev>.\vcpkg\vcpkg.exe install freetype fontconfig libpng libjpeg-turbo
Building poppler
Once you install libraries using vcpkg, you can use them with cmake by adding -DCMAKE_TOOLCHAIN_FILE=C:/path/to/vcpkg.cmake to your cmake command.
> cmake -DCMAKE_TOOLCHAIN_FILE=C:/dev/vcpkg/scripts/buildsystems/vcpkg.cmake -D ENABLE_QT5=OFF -D ENABLE_LIBOPENJPEG=none -D ENABLE_CPP=OFF ..
But that didn't do the trick for me! cmake wasn't able to find the libraries I installed. I had to add the directory where vcpkg installed all the dependencies to my PATH:
> set PATH=%PATH%;C:\dev\vcpkg\installed\x86-windows\bin
After that, the previous cmake command succeeded! I thought the poppler build would succeed too:
> cmake -DCMAKE_TOOLCHAIN_FILE=C:/dev/vcpkg/scripts/buildsystems/vcpkg.cmake -D ENABLE_QT5=OFF -D ENABLE_LIBOPENJPEG=none -D ENABLE_CPP=OFF ..
> cmake --build . --target poppler --config Release
💥
BOOM! I got a lot of "unresolved external symbol" errors! Turns out vcpkg installs the x86 version of libraries by default, and I was building poppler for my x64 target on the x64 native tools command prompt! I needed the 64-bit version of each dependency:
c:\dev>.\vcpkg\vcpkg.exe install freetype:x64-windows fontconfig:x64-windows libpng:x64-windows libjpeg-turbo:x64-windows
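To avoid repeating the triplet by hand (and to keep it in step with the Python that will load the extension), the install command could be generated. The sketch below only builds the argument list using the package:triplet form shown above; the function names and the default executable path are assumptions for illustration.

```python
import struct


def default_triplet() -> str:
    """Pick the vcpkg triplet that matches the running interpreter."""
    return "x64-windows" if struct.calcsize("P") * 8 == 64 else "x86-windows"


def vcpkg_install_args(packages, triplet=None, vcpkg_exe=r".\vcpkg\vcpkg.exe"):
    """Build an argument list like: vcpkg.exe install freetype:x64-windows ..."""
    triplet = triplet or default_triplet()
    return [vcpkg_exe, "install"] + [f"{name}:{triplet}" for name in packages]


deps = ["freetype", "fontconfig", "libpng", "libjpeg-turbo"]
print(" ".join(vcpkg_install_args(deps, triplet="x64-windows")))
```

The resulting list can be handed to subprocess.run from the directory that contains the vcpkg checkout.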
I also removed the earlier x86 vcpkg directory from my PATH and added the new x64 one instead:
> set PATH=%PATH%;C:\dev\vcpkg\installed\x64-windows\bin
Finally, poppler was built successfully!
$ cmake -DCMAKE_TOOLCHAIN_FILE=C:/dev/vcpkg/scripts/buildsystems/vcpkg.cmake -D ENABLE_QT5=OFF -D ENABLE_LIBOPENJPEG=none -D ENABLE_CPP=OFF ..
$ cmake --build . --config Release --target poppler --verbose
I'm not sure if I even need the vcpkg toolchain file at all, since the required libraries are available on my PATH. I'll need to get back to Windows to find out.
Building the pdftopng C extension
After that, I moved to building and installing the C extension:
(venv) > python -m pip install -v -e .
Creating library build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.lib and object build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.exp
poppler.lib(GlobalParams.obj) : error LNK2001: unresolved external symbol __imp_RegCloseKey
...
poppler.lib(JpegWriter.obj) : error LNK2001: unresolved external symbol jpeg_std_error
...
poppler.lib(PNGWriter.obj) : error LNK2001: unresolved external symbol png_create_write_struct
...
poppler.lib(DCTStream.obj) : error LNK2001: unresolved external symbol jpeg_CreateDecompress
...
poppler.lib(SplashFTFontEngine.obj) : error LNK2001: unresolved external symbol FT_Init_FreeType
...
poppler.lib(SplashFTFont.obj) : error LNK2001: unresolved external symbol FT_Set_Pixel_Sizes
...
build\lib.win-amd64-3.8\poppler_utils\pdftopng.cp38-win_amd64.pyd : fatal error LNK1120: 51 unresolved externals
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.27.29110\\bin\\HostX86\\x64\\link.exe' failed with exit status 1120
BOOM! More "unresolved external symbol" errors! I was able to see the compiler and linker commands in the verbose -v output, so I copied them, tried to understand all the options, and then ran the commands manually.
The include directories were being passed to cl.exe correctly, as specified in setup.py, and the compile command exited successfully:
(venv) > cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DVERSION_INFO="0.1.0"
"-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler"
"-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\fofi"
"-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\goo"
"-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\poppler"
"-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build"
"-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\poppler"
"-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\utils"
"-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\utils"
"-IC:\Users\Vinayak Mehta\AppData\Local\Programs\Python\Python38\lib\site-packages\pybind11\include"
"-IC:\Users\Vinayak Mehta\AppData\Local\Programs\Python\Python38\include"
...
/EHsc /Tpsrc\poppler_utils\pdftopng.cpp /Fobuild\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.obj /EHsc /std:c++14
The errors were being raised at the linking stage with link.exe:
(venv) > link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO
...
/EXPORT:PyInit_pdftopng
build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.obj
/OUT:build\lib.win-amd64-3.8\poppler_utils\pdftopng.cp38-win_amd64.pyd
/IMPLIB:build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.lib
"c:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\Release\poppler.lib"
I'd made the assumption that link.exe would be able to find all the required libraries, since I'd put that directory on the PATH, right? (Just like LD_LIBRARY_PATH!) Turns out that assumption was incorrect, and I needed to specify all the libraries explicitly!
(venv) > link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO
...
/EXPORT:PyInit_pdftopng
build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.obj
/OUT:build\lib.win-amd64-3.8\poppler_utils\pdftopng.cp38-win_amd64.pyd
/IMPLIB:build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.lib
"c:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\Release\poppler.lib"
"c:\dev\vcpkg\installed\x64-windows\lib\freetype.lib"
"c:\dev\vcpkg\installed\x64-windows\lib\fontconfig.lib"
"c:\dev\vcpkg\installed\x64-windows\lib\libpng16.lib"
"c:\dev\vcpkg\installed\x64-windows\lib\jpeg.lib"
The "unresolved external symbol" errors started going away as I added those libraries one by one! There were still three unresolved symbols though:
poppler.lib(GlobalParams.obj) : error LNK2001: unresolved external symbol __imp_RegCloseKey
poppler.lib(GlobalParams.obj) : error LNK2001: unresolved external symbol __imp_RegEnumValueA
poppler.lib(GlobalParams.obj) : error LNK2001: unresolved external symbol __imp_RegOpenKeyExA
To resolve this, I had to add advapi32.lib, which is a Windows-specific library, to the link.exe arguments! The linker seemed to find it even though I didn't specify the full path.
There's also /LIBPATH, which lets you specify a directory where the linker should look for libraries, so that you don't have to specify the full path to each library:
(venv) > link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO
"/LIBPATH:C:\dev\vcpkg\installed\x64-windows\lib"
...
/EXPORT:PyInit_pdftopng
build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.obj
/OUT:build\lib.win-amd64-3.8\poppler_utils\pdftopng.cp38-win_amd64.pyd
/IMPLIB:build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.lib
"c:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\Release\poppler.lib"
freetype.lib fontconfig.lib libpng16.lib jpeg.lib advapi32.lib
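As a pre-flight check, the lookup can be modelled roughly in Python: try each bare library name in each search directory and report what is still missing. This is just an approximation for catching missing files early, not the linker's actual algorithm, and the function name is made up. (As far as I know, link.exe also searches the directories in the LIB environment variable, which is presumably how advapi32.lib was found without a full path in the native tools prompt.)

```python
import os
import tempfile


def resolve_libraries(names, search_dirs):
    """Return {name: full_path or None} for each library file name.

    Directories are tried in order and the first match wins.
    """
    resolved = {}
    for name in names:
        resolved[name] = None
        for directory in search_dirs:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                resolved[name] = candidate
                break
    return resolved


# Tiny demonstration: a throwaway directory stands in for vcpkg's lib dir.
with tempfile.TemporaryDirectory() as lib_dir:
    open(os.path.join(lib_dir, "freetype.lib"), "w").close()
    found = resolve_libraries(["freetype.lib", "jpeg.lib"], [lib_dir])
    missing = [name for name, path in found.items() if path is None]
    print(missing)  # ['jpeg.lib']
```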
But as it turns out, all of this can be done programmatically using the library_dirs and libraries keyword arguments for setuptools.Extension!
You can also specify the libraries to link against when building your extension, and the directories to search for those libraries. The libraries option is a list of libraries to link against, library_dirs is a list of directories to search for libraries at link-time, and runtime_library_dirs is a list of directories to search for shared (dynamically loaded) libraries at run-time. (Again, this sort of non-portable construct should be avoided if you intend to distribute your code.) — Distutils documentation
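To connect this with the manual link.exe command from earlier, here is a rough sketch of how those two keyword arguments map onto MSVC linker arguments: each directory becomes a /LIBPATH: option and each library name gets a .lib suffix. It illustrates the mapping only and is not the actual setuptools code; msvc_link_args is a hypothetical helper.

```python
def msvc_link_args(library_dirs, libraries):
    """Translate Extension-style keywords into link.exe-style arguments."""
    args = [f"/LIBPATH:{directory}" for directory in library_dirs]
    args += [f"{name}.lib" for name in libraries]
    return args


args = msvc_link_args(
    [r"C:\dev\vcpkg\installed\x64-windows\lib"],
    ["freetype", "fontconfig", "libpng16", "jpeg", "advapi32"],
)
print(args)
```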
So I modified the setup.py to use these keyword arguments (I'll have to remove the hard-coded path to vcpkg_dir):
import os
import sys

from setuptools import Extension

# ext_includes (the list of include directories) is defined earlier in setup.py

library_dirs = []
libraries = []

if sys.platform == "win32":
    # Import libraries installed by vcpkg, plus the freshly built poppler.lib
    vcpkg_dir = os.path.join("C:\\", "dev", "vcpkg", "installed", "x64-windows", "lib")
    build_dir = os.path.join(os.getcwd(), "lib", "poppler", "build", "Release")
    library_dirs.extend([vcpkg_dir, build_dir])
    libraries.extend(
        ["freetype", "fontconfig", "libpng16", "jpeg", "advapi32", "poppler"]
    )

ext_modules = [
    Extension(
        "poppler_utils.pdftopng",
        # Sort input source files to ensure bit-for-bit reproducible builds
        # (https://github.com/pybind/python_example/pull/53)
        sorted([os.path.join("src", "poppler_utils", "pdftopng.cpp")]),
        include_dirs=ext_includes,
        library_dirs=library_dirs,
        libraries=libraries,
        language="c++",
    ),
]
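One possible way to get rid of the hard-coded vcpkg path is to read the vcpkg location from an environment variable and fall back to the current default. In this sketch, VCPKG_ROOT is assumed to point at the vcpkg checkout, and vcpkg_lib_dir is a hypothetical helper. It uses ntpath so that it behaves the same on any OS; inside setup.py on Windows, os.path.join gives the same result.

```python
import ntpath


def vcpkg_lib_dir(environ, triplet="x64-windows", default_root=r"C:\dev\vcpkg"):
    """Return <root>\\installed\\<triplet>\\lib, taking the root from environ.

    VCPKG_ROOT is an assumption here; the default mirrors the hard-coded
    location used in the setup.py above.
    """
    root = environ.get("VCPKG_ROOT", default_root)
    return ntpath.join(root, "installed", triplet, "lib")


print(vcpkg_lib_dir({}))
print(vcpkg_lib_dir({"VCPKG_ROOT": r"D:\tools\vcpkg"}))
```

In setup.py this would be called as vcpkg_lib_dir(os.environ).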
And after that change, the extension was built and installed successfully! There was a pyd file in the src/poppler_utils directory (as I installed the extension in editable mode)!
(venv) > python
>>> from poppler_utils import pdftopng
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: DLL load failed while importing pdftopng: The specified module could not be found.
>>>
BOOM! ImportError though :(
The error “The specified module could not be found” is a bit misleading on Windows because it means either the DLL you are trying to load or any of its dependencies cannot be located. — SO answer
Dependencies cannot be located? I thought that putting the vcpkg bin directory on my PATH earlier was supposed to solve that, but it seems like it didn't. I found this nice tool which printed all the libraries that my pyd file wanted:
The usual suspects! After copying all the DLLs from vcpkg's bin directory to the pyd file's directory, everything worked! At last!
(venv) > cp C:\dev\vcpkg\installed\x64-windows\bin\*.dll src\poppler_utils
(venv) > python
>>> from poppler_utils import pdftopng
>>> pdftopng.convert(pdf_path="foo.pdf", png_path="foo")
>>>
I also found this doc about the dynamic link library search order, but I'll check that out later.
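One likely explanation for PATH not helping: starting with Python 3.8, Windows no longer consults PATH when resolving the DLL dependencies of extension modules. Only the system paths, the directory containing the pyd file, and directories registered with os.add_dll_directory() are searched. As an alternative to copying the DLLs, a minimal sketch of registering the vcpkg bin directory before the import could look like this (the helper name is made up, and I haven't tried this with the build above):

```python
import os


def add_dll_dir_if_supported(path):
    """Register `path` for DLL lookup on Windows with Python 3.8+.

    Returns the handle from os.add_dll_directory, or None when the function
    is unavailable (other platforms, older Pythons) or the directory is missing.
    """
    if not hasattr(os, "add_dll_directory") or not os.path.isdir(path):
        return None
    return os.add_dll_directory(os.path.abspath(path))


# Hypothetical usage, before `from poppler_utils import pdftopng`:
handle = add_dll_dir_if_supported(r"C:\dev\vcpkg\installed\x64-windows\bin")
```

This only helps on the build machine, though; for a distributable wheel the DLLs still have to ship alongside the pyd.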
Questions
So it looks like I just need to bundle all those DLLs into the wheel somehow. This is how these projects seem to do it:
numpy bundles the DLL with a hash attached to its name (possibly to tie it to a unique build). It uses the mingw-w64 toolchain on Appveyor to build Windows wheels. And also does some magic which I don't yet understand.
$ unzip -l numpy-1.19.2-cp38-cp38-win_amd64.whl | grep dll
32939993 2020-09-10 01:30 numpy/.libs/libopenblas.NOIJJG62EMASZI6NYURL6JBKM4EVBGM7.gfortran-win_amd64.dll
arrow bundles the DLLs by just copying them over from the build directory to the "final" directory. It doesn't seem to specify the DLLs in the package_data though. Does it unpack the built wheel, copy over the DLLs, and then zip the directory again?
$ unzip -l pyarrow-1.0.1-cp38-cp38-win_amd64.whl | grep dll
8459264 2020-08-17 19:35 pyarrow/arrow.dll
910336 2020-08-17 19:35 pyarrow/arrow_dataset.dll
2610176 2020-08-17 19:35 pyarrow/arrow_flight.dll
1264640 2020-08-17 19:35 pyarrow/arrow_python.dll
91648 2020-08-17 19:35 pyarrow/arrow_python_flight.dll
81920 2020-08-17 19:35 pyarrow/cares.dll
3249664 2020-08-17 19:35 pyarrow/libcrypto-1_1-x64.dll
2661888 2020-08-17 19:35 pyarrow/libprotobuf.dll
651264 2020-08-17 19:35 pyarrow/libssl-1_1-x64.dll
2204672 2020-08-17 19:35 pyarrow/parquet.dll
89600 2020-08-17 19:35 pyarrow/zlib.dll
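Since unzip and grep aren't always at hand on Windows, the same inspection can be done with Python's zipfile module (a wheel is a zip archive). A small sketch, with list_dlls as a made-up helper name:

```python
import zipfile


def list_dlls(wheel_path):
    """Return (size, name) pairs for every .dll stored in a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [
            (info.file_size, info.filename)
            for info in wheel.infolist()
            if info.filename.lower().endswith(".dll")
        ]
```

For example, list_dlls("pyarrow-1.0.1-cp38-cp38-win_amd64.whl") should report the same files as the listing above, assuming the wheel has been downloaded to the current directory.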
I also found some questions about an auditwheel / delocate-like tool for Windows, but there's nothing out there yet.
- (I think) I heard in this awesome PyCon 2019 manylinux wheel talk that auditwheel (1) finds the shared libraries your extension depends on, (2) copies them over into the wheel and gives them unique names, and (3) modifies the RPATH/ORIGIN in the shared library for your extension so that the copied libraries can be loaded correctly. Would a tool for Windows also have to do something like this? Or would copying over the DLLs into the same directory as your pyd work fine, because arrow seems to do this? (Maybe arrow does some other magic that I don't know about yet.)
- After the bundling bit is figured out, I think automating all the steps from above into a GitHub workflow should be possible theoretically. Would I need to choose a specific Windows configuration on GitHub Actions that makes the output wheels backwards compatible with Windows 7/8?
- Maybe I could go the winrt route that Steve Dower showed me at EuroPython. But that won't be backwards compatible with Windows 7/8, and I've seen that some camelot users still use those Windows versions. Maybe I need to put out some sort of a survey.