Day 45b: How to (almost) build a C extension wheel on Windows (with external dependencies)
14 October 2020 · recurse-center
I looked into how to build C extension wheels on Windows over the weekend. Since there isn't a fastmac equivalent to get a Windows machine for debugging, I booted up Windows on my laptop after a really long time! I need to find a
Visual Studio 2019 Community Edition
The Python packaging docs mentioned that I needed to install Visual Studio Community Edition, 2015 or later for Python 3.5+. All Visual Studio versions after 2015 are backwards compatible! I installed Visual Studio 2019 and selected the Python native development tools checkbox in the setup application.
The setup put link.exe (the Windows equivalent of ld), along with some other tools, in my Program Files! Looking into the Program Files brought back very old memories of fiddling with a game's files inside this directory to make everything work :)
Git and Python
I also installed git (which installed a lot of Unix tools too!) and Python 3.8 using the setup exes from their websites. All of these setups seemed to modify my PATH variable automatically, because all of their executables were available in PowerShell right after, and I could run python in the PowerShell terminal.
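A quick way to confirm what actually ended up on PATH is to ask Python where each tool resolves. This is only a minimal sketch: find_tools is a hypothetical helper, and the list of tool names is my own pick for illustration.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # The tool names below are just the ones mentioned in this post.
    for name, path in find_tools(["python", "git", "cl", "link"]).items():
        print(f"{name:8} {path or 'not found on PATH'}")
```

If cl or link show up as missing in a plain terminal, that is expected: they are normally only available inside a Visual Studio developer prompt.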
Building a "hello world" C program
As a test, I tried to build the ncurses "hello world" program. But since ncurses is not supported on Windows (there's PDCurses though), I commented out all of the ncurses function calls, replacing the printw with a printf, which basically made it a plain "hello world":
(venv) > python -m pip install . 💥
BOOM! I got my first error which said "fatal error LNK1112: module machine type 'x86' conflicts with target machine type 'x64'". I was using x86 tools to build something for my x64 target. Somehow, PowerShell was configured to only use the x86 toolchain, and I wasn't sure how to make it use the x64 one. At this point, I found the (x64) native tools command prompt which gets installed with the Visual Studio setup, and has everything configured correctly. So I jumped onto the "native tools command prompt" submarine from the PowerShell ship!
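LNK1112 means the object files and the link target disagree on architecture, and an extension has to match the architecture of the interpreter that will import it. A small sketch for checking what the running interpreter is (interpreter_bits is a name I made up for illustration):

```python
import platform
import struct
import sysconfig


def interpreter_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("interpreter: ", interpreter_bits(), "bit")
    print("machine:     ", platform.machine())
    # The platform tag is what ends up in the wheel filename.
    print("platform tag:", sysconfig.get_platform())
```

A 64-bit interpreter needs the x64 toolchain, which is what the x64 native tools command prompt sets up.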
After the switch, I was able to build and install the "hello world" C extension!
(venv) > python -m pip install .
Processing c:\users\vinayak mehta\desktop\development\onix
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing wheel metadata ... done
Requirement already satisfied: Click>=7.0 in c:\users\vinayak mehta\appdata\local\programs\python\python38\lib\site-packages (from onix==0.1.0) (7.1.2)
Building wheels for collected packages: onix
Building wheel for onix (PEP 517) ... done
Created wheel for onix: filename=onix-0.1.0-cp38-cp38-win_amd64.whl size=47069 sha256=8681e069d73e567d865f601cb212429f0ef335a320d031c188576078ef3f1eba
Stored in directory: C:\Users\Vinayak Mehta\AppData\Local\Temp\pip-ephem-wheel-cache-jgv05vv3\wheels\33\5f\a8\63d76ba35c8c629936b3485a15ffe5ccb25fe1304159ebc9d8
Successfully built onix
Installing collected packages: onix
Successfully installed onix-0.1.0
WARNING: You are using pip version 20.2.1; however, version 20.2.3 is available.
You should consider upgrading via the 'C:\Users\Vinayak Mehta\AppData\Local\Programs\Python\Python38\python.exe -m pip install --upgrade pip' command.
And call the executable!
(venv) > onix.exe
Hello, snek!
(venv) >
vcpkg and external dependencies
After that I moved on to the slightly more complex C extension, which has some external dependencies. I wasn't sure if there was a way for cygwin to work with the native tools command prompt (I'm sure there is). I also wasn't sure if the wheels built with g++ would play well on Windows, so I started looking for a way to install external dependencies.
I'd heard that choco is the new and shiny package manager for Windows, but I couldn't find the packages I required to build poppler (freetype, fontconfig, libpng and libjpeg) in their repository. But I found vcpkg! (A C/C++ library manager for Windows, Linux, and macOS released by Microsoft)
Installing vcpkg was easy, I just followed the quickstart from the README, and was able to install the dependencies I needed after that!
(venv) c:\dev>.\vcpkg\vcpkg.exe install freetype fontconfig libpng libjpeg-turbo
Once you install libraries using vcpkg, you can use them with cmake by adding -DCMAKE_TOOLCHAIN_FILE=C:/path/to/vcpkg.cmake to your cmake command:
> cmake -DCMAKE_TOOLCHAIN_FILE=C:/dev/vcpkg/scripts/buildsystems/vcpkg.cmake -D ENABLE_QT5=OFF -D ENABLE_LIBOPENJPEG=none -D ENABLE_CPP=OFF ..
But that didn't do the trick for me! cmake wasn't able to find the libraries I installed. I had to add the directory where vcpkg installed all the dependencies to my PATH:
> set PATH=%PATH%;C:\dev\vcpkg\installed\x86-windows\bin
After which the previous cmake command succeeded! I thought the poppler build would succeed after that:
> cmake -DCMAKE_TOOLCHAIN_FILE=C:/dev/vcpkg/scripts/buildsystems/vcpkg.cmake -D ENABLE_QT5=OFF -D ENABLE_LIBOPENJPEG=none -D ENABLE_CPP=OFF ..
> cmake --build . --target poppler --config Release 💥
BOOM! I got a lot of "unresolved external symbol" errors! Turns out vcpkg installs the x86 version of libraries by default, and I was building poppler for my x64 target on the x64 native tools command prompt! I needed the 64-bit version of each dependency:
c:\dev>.\vcpkg\vcpkg.exe install freetype:x64-windows fontconfig:x64-windows libpng:x64-windows libjpeg-turbo:x64-windows
I also removed the earlier vcpkg directory from my PATH and added the new x64 one instead:
> set PATH=%PATH%;C:\dev\vcpkg\installed\x64-windows\bin
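The x86/x64 mix-up comes down to picking the vcpkg triplet that matches the build target. Here is a minimal sketch of deriving it from the interpreter's pointer size instead of typing it by hand. Both function names are hypothetical, and it only knows about the two triplets that appear in this post.

```python
import struct
from pathlib import PureWindowsPath


def vcpkg_triplet(bits=None):
    """Return the vcpkg triplet matching a 32- or 64-bit Windows build."""
    if bits is None:
        # Default to the architecture of the running interpreter.
        bits = struct.calcsize("P") * 8
    return "x64-windows" if bits == 64 else "x86-windows"


def vcpkg_installed_dir(root, bits=None):
    """Directory under the vcpkg root that holds bin/, lib/ and include/ for the triplet."""
    return PureWindowsPath(root) / "installed" / vcpkg_triplet(bits)


if __name__ == "__main__":
    # C:/dev/vcpkg is the location used in this post.
    print(vcpkg_installed_dir("C:/dev/vcpkg") / "bin")
```

The same triplet can then be used both for the vcpkg install command (as in freetype:x64-windows) and for the PATH entry.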
poppler was built successfully!
> cmake -DCMAKE_TOOLCHAIN_FILE=C:/dev/vcpkg/scripts/buildsystems/vcpkg.cmake -D ENABLE_QT5=OFF -D ENABLE_LIBOPENJPEG=none -D ENABLE_CPP=OFF ..
> cmake --build . --config Release --target poppler --verbose
I'm not sure if I even need the vcpkg toolchain file at all, since the required libraries are available on my PATH. I'll need to get back to Windows to find out.
Building the pdftopng C extension
After that, I moved to building and installing the C extension:
(venv) > python -m pip install -v -e .
Creating library build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.lib and object build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.exp
poppler.lib(GlobalParams.obj) : error LNK2001: unresolved external symbol __imp_RegCloseKey
...
poppler.lib(JpegWriter.obj) : error LNK2001: unresolved external symbol jpeg_std_error
...
poppler.lib(PNGWriter.obj) : error LNK2001: unresolved external symbol png_create_write_struct
...
poppler.lib(DCTStream.obj) : error LNK2001: unresolved external symbol jpeg_CreateDecompress
...
poppler.lib(SplashFTFontEngine.obj) : error LNK2001: unresolved external symbol FT_Init_FreeType
...
poppler.lib(SplashFTFont.obj) : error LNK2001: unresolved external symbol FT_Set_Pixel_Sizes
...
build\lib.win-amd64-3.8\poppler_utils\pdftopng.cp38-win_amd64.pyd : fatal error LNK1120: 51 unresolved externals
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.27.29110\\bin\\HostX86\\x64\\link.exe' failed with exit status 1120
BOOM! More "unresolved external symbol" errors! I was able to see the compiler and linker commands in the verbose -v output, so I copied them, tried to understand all the options, and then ran the commands manually.
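With 51 unresolved externals, it can help to group the LNK2001 lines by object file before guessing which library is missing. This is a rough sketch that assumes the message format shown in the output above. group_unresolved is a hypothetical helper, not part of any build tool.

```python
import re
from collections import defaultdict

# Matches lines like:
#   poppler.lib(JpegWriter.obj) : error LNK2001: unresolved external symbol jpeg_std_error
LNK2001 = re.compile(
    r"(?P<lib>[^\s()]+)\((?P<obj>[^\s()]+)\) : error LNK2001: "
    r"unresolved external symbol (?P<symbol>\S+)"
)


def group_unresolved(linker_output):
    """Group unresolved symbol names by the object file that references them."""
    missing = defaultdict(list)
    for match in LNK2001.finditer(linker_output):
        missing[match.group("obj")].append(match.group("symbol"))
    return dict(missing)
```

Symbol prefixes such as jpeg_, png_ and FT_ then point fairly directly at the library that still needs to be linked.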
So my header files were going in correctly to cl.exe, as specified in the setup.py. And the command was exiting successfully.
(venv) > cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -DVERSION_INFO="0.1.0" "-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler" "-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\fofi" "-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\goo" "-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\poppler" "-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build" "-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\poppler" "-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\utils" "-Ic:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\utils" "-IC:\Users\Vinayak Mehta\AppData\Local\Programs\Python\Python38\lib\site-packages\pybind11\include" "-IC:\Users\Vinayak Mehta\AppData\Local\Programs\Python\Python38\include" ... /EHsc /Tpsrc\poppler_utils\pdftopng.cpp /Fobuild\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.obj /EHsc /std:c++14
The errors were being raised at the linking stage with link.exe:
(venv) > link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO ... /EXPORT:PyInit_pdftopng build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.obj /OUT:build\lib.win-amd64-3.8\poppler_utils\pdftopng.cp38-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.lib "c:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\Release\poppler.lib"
I'd made the assumption that link.exe would be able to find all the required libraries, since I put that directory on the PATH, right? (Just like LD_LIBRARY_PATH!) Turns out that assumption was incorrect, and I needed to specify all the libraries explicitly!
(venv) > link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO ... /EXPORT:PyInit_pdftopng build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.obj /OUT:build\lib.win-amd64-3.8\poppler_utils\pdftopng.cp38-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.lib "c:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\Release\poppler.lib" "c:\dev\vcpkg\installed\x64-windows\lib\freetype.lib" "c:\dev\vcpkg\installed\x64-windows\lib\fontconfig.lib" "c:\dev\vcpkg\installed\x64-windows\lib\libpng16.lib" "c:\dev\vcpkg\installed\x64-windows\lib\jpeg.lib"
The "unresolved external symbol" errors started going away as I added those libraries one by one! There were still three unresolved symbols though:
poppler.lib(GlobalParams.obj) : error LNK2001: unresolved external symbol __imp_RegCloseKey
poppler.lib(GlobalParams.obj) : error LNK2001: unresolved external symbol __imp_RegEnumValueA
poppler.lib(GlobalParams.obj) : error LNK2001: unresolved external symbol __imp_RegOpenKeyExA
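Those three are Windows registry functions, which live in advapi32.lib, so I added that to the list too. I also found /LIBPATH, using which you can specify the path where the linker should look for libraries, so that you don't have to specify the full path to each library: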
(venv) > link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO "/LIBPATH:C:\dev\vcpkg\installed\x64-windows\lib" ... /EXPORT:PyInit_pdftopng build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.obj /OUT:build\lib.win-amd64-3.8\poppler_utils\pdftopng.cp38-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.8\Release\src\poppler_utils\pdftopng.cp38-win_amd64.lib "c:\Users\Vinayak Mehta\Desktop\development\poppler-utils\lib\poppler\build\Release\poppler.lib" freetype.lib fontconfig.lib libpng16.lib jpeg.lib advapi32.lib
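The /LIBPATH form splits the problem into "where to look" and "what to link". A tiny sketch of composing those arguments follows. link_library_args is a hypothetical helper for illustration, not the code any build tool actually runs.

```python
def link_library_args(library_dirs, libraries):
    """Build link.exe arguments from search directories and bare library names."""
    # One /LIBPATH option per directory the linker should search.
    args = [f"/LIBPATH:{directory}" for directory in library_dirs]
    # Libraries are then named by file name only and resolved against those directories.
    args += [f"{name}.lib" for name in libraries]
    return args


if __name__ == "__main__":
    print(
        link_library_args(
            [r"C:\dev\vcpkg\installed\x64-windows\lib"],
            ["freetype", "fontconfig", "libpng16", "jpeg", "advapi32"],
        )
    )
```

Arguments containing spaces would still need quoting when pasted into a command prompt.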
But as it turns out, all of this can be done programmatically using the library_dirs and libraries keyword arguments for Extension:
You can also specify the libraries to link against when building your extension, and the directories to search for those libraries. The libraries option is a list of libraries to link against, library_dirs is a list of directories to search for libraries at link-time, and runtime_library_dirs is a list of directories to search for shared (dynamically loaded) libraries at run-time. (Again, this sort of non-portable construct should be avoided if you intend to distribute your code.) — Distutils documentation
So I modified the setup.py to use these keyword arguments (I'll have to remove the hard-coded path to vcpkg later):
# Excerpt from setup.py: os, sys, Extension and ext_includes come from earlier in the file.
library_dirs = []
libraries = []
if sys.platform == "win32":
    # Import libraries installed by vcpkg, plus the poppler.lib built with cmake
    vcpkg_dir = os.path.join("C:\\", "dev", "vcpkg", "installed", "x64-windows", "lib")
    build_dir = os.path.join(os.getcwd(), "lib", "poppler", "build", "Release")
    library_dirs.extend([vcpkg_dir, build_dir])
    libraries.extend(
        ["freetype", "fontconfig", "libpng16", "jpeg", "advapi32", "poppler"]
    )

ext_modules = [
    Extension(
        "poppler_utils.pdftopng",
        # Sort input source files to ensure bit-for-bit reproducible builds
        # (https://github.com/pybind/python_example/pull/53)
        sorted([os.path.join("src", "poppler_utils", "pdftopng.cpp")]),
        include_dirs=ext_includes,
        library_dirs=library_dirs,
        libraries=libraries,
        language="c++",
    ),
]
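One possible way to drop the hard-coded path is to read the vcpkg location from an environment variable. This is only a sketch under my own assumptions: vcpkg_library_dirs is a hypothetical helper, and using VCPKG_ROOT here (with C:\dev\vcpkg as the fallback) is a choice I made for illustration, not something the setup.py above does.

```python
import os
import sys
from pathlib import PureWindowsPath


def vcpkg_library_dirs(environ=None, platform=sys.platform, triplet="x64-windows"):
    """Return the vcpkg lib directory on Windows, or an empty list elsewhere."""
    if platform != "win32":
        return []
    environ = os.environ if environ is None else environ
    # Assumption: VCPKG_ROOT points at the vcpkg checkout; fall back to the path from this post.
    root = PureWindowsPath(environ.get("VCPKG_ROOT", r"C:\dev\vcpkg"))
    return [str(root / "installed" / triplet / "lib")]


if __name__ == "__main__":
    print(vcpkg_library_dirs())
```

The result could be passed to library_dirs.extend(...) in place of the literal vcpkg_dir.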
And after that change, the extension was built and installed successfully! There was a pyd file in the src/poppler_utils directory (as I installed the extension in editable mode)!
(venv) > python
>>> from poppler_utils import pdftopng
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: DLL load failed while importing pdftopng: The specified module could not be found.
>>>
ImportError though :(
The error “The specified module could not be found” is a bit misleading on Windows because it means either the DLL you are trying to load or any of its dependencies cannot be located. — SO answer
Dependencies cannot be located? I thought that putting the bin directory on my PATH earlier was supposed to solve that, but seems like it didn't. I found this nice tool which printed all the libraries that my pyd file wanted:
The usual suspects! After copying all the DLLs from the vcpkg bin directory to the pyd file's directory, everything worked! At last!
(venv) > cp C:\dev\vcpkg\installed\x64-windows\bin\*.dll src\poppler_utils
(venv) > python
>>> from poppler_utils import pdftopng
>>> pdftopng.convert(pdf_path="foo.pdf", png_path="foo")
>>>
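The same copy step can be scripted so that it doesn't depend on having cp around. A minimal sketch, where copy_dlls is a hypothetical helper and the paths in the usage comment are the ones from this post:

```python
import shutil
from pathlib import Path


def copy_dlls(src_dir, dst_dir):
    """Copy every .dll from src_dir into dst_dir and return the copied file names."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for dll in sorted(Path(src_dir).glob("*.dll")):
        shutil.copy2(dll, dst / dll.name)
        copied.append(dll.name)
    return copied


# Usage on my machine would look like:
# copy_dlls(r"C:\dev\vcpkg\installed\x64-windows\bin", r"src\poppler_utils")
```

The returned names could also serve as a starting point for listing the DLLs in package_data, though I haven't checked whether that is enough to get them into the wheel.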
I also found this doc about the dynamic link library search order, but I'll check that out later.
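One piece of that search order seems relevant here. According to the Python 3.8 release notes, PATH is no longer consulted when resolving the DLL dependencies of extension modules on Windows, and os.add_dll_directory was added as the way to register extra directories. That would explain why putting the bin directory on PATH didn't help. A minimal sketch, where register_dll_dir is a hypothetical wrapper and the path is the one from this post:

```python
import os
import sys


def register_dll_dir(path):
    """Add path to the DLL search path on Windows (Python 3.8+); do nothing elsewhere."""
    if sys.platform == "win32" and hasattr(os, "add_dll_directory") and os.path.isdir(path):
        # The returned handle has a close() method that removes the directory again.
        return os.add_dll_directory(path)
    return None


# Register the vcpkg bin directory before importing the extension.
handle = register_dll_dir(r"C:\dev\vcpkg\installed\x64-windows\bin")
# from poppler_utils import pdftopng  # would go here, after the directory is registered
```

This only helps on a machine that has the vcpkg directory, so it doesn't replace bundling the DLLs for distribution.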
So it looks like I just need to bundle all those DLLs into the wheel somehow. This is how these projects seem to do it:
- numpy bundles the DLL with a hash attached to its name (possibly to tie it to a unique build). It uses the mingw-w64 toolchain on Appveyor to build Windows wheels. And also does some magic which I don't yet understand.
$ unzip -l numpy-1.19.2-cp38-cp38-win_amd64.whl | grep dll
32939993 2020-09-10 01:30 numpy/.libs/libopenblas.NOIJJG62EMASZI6NYURL6JBKM4EVBGM7.gfortran-win_amd64.dll
- arrow bundles the DLLs by just copying them over from the build directory to the "final" directory. It doesn't seem to specify the DLLs in the package_data though. Does it unpack the built wheel, copy over the DLLs, and then zip the directory again?
$ unzip -l pyarrow-1.0.1-cp38-cp38-win_amd64.whl | grep dll
8459264 2020-08-17 19:35 pyarrow/arrow.dll
910336 2020-08-17 19:35 pyarrow/arrow_dataset.dll
2610176 2020-08-17 19:35 pyarrow/arrow_flight.dll
1264640 2020-08-17 19:35 pyarrow/arrow_python.dll
91648 2020-08-17 19:35 pyarrow/arrow_python_flight.dll
81920 2020-08-17 19:35 pyarrow/cares.dll
3249664 2020-08-17 19:35 pyarrow/libcrypto-1_1-x64.dll
2661888 2020-08-17 19:35 pyarrow/libprotobuf.dll
651264 2020-08-17 19:35 pyarrow/libssl-1_1-x64.dll
2204672 2020-08-17 19:35 pyarrow/parquet.dll
89600 2020-08-17 19:35 pyarrow/zlib.dll
- (I think) I heard in this awesome PyCon 2019 manylinux wheel talk that auditwheel (1) finds shared libraries your extension is dependent on, (2) copies them over into the wheel + gives them unique names, and (3) modifies the ORIGIN in the shared library for your extension so that the copied libraries can be loaded correctly. Would a tool for Windows also have to do something like this? Or would copying over the DLLs into the same directory as your pyd work fine, because arrow seems to do this? (Maybe arrow does some other magic that I don't know about yet.)
- After the bundling bit is figured out, I think automating all the steps from above into a GitHub workflow should be possible theoretically. Would I need to choose a specific Windows configuration on GitHub Actions that makes the output wheels backwards compatible with Windows 7/8?
- Maybe I could go the winrt route that Steve Dower showed me at EuroPython. But that won't be backwards compatible with Windows 7/8, and I've seen that some camelot users still use those Windows versions. Maybe I need to put out some sort of a survey.
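The unzip -l ... | grep dll checks above can also be done from Python, since a wheel is just a zip archive. This would be handy for verifying whatever bundling approach I end up with. A small sketch, where list_dlls is a hypothetical helper:

```python
import zipfile


def list_dlls(wheel_path):
    """Return (name, size in bytes) for every DLL packed inside a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [
            (info.filename, info.file_size)
            for info in wheel.infolist()
            if info.filename.lower().endswith(".dll")
        ]


# Usage, with a wheel file in the current directory:
# for name, size in list_dlls("pyarrow-1.0.1-cp38-cp38-win_amd64.whl"):
#     print(size, name)
```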