Day 4 — Hyrum's Law

Today I worked on some open issues and pull requests for camelot and excalibur. Last month, pdfminer.six (one of camelot's dependencies) broke backwards compatibility by renaming the PDFTextExtractionNotAllowed exception. camelot raises it while getting the page layout if the page is not extractable. I'd added it back in 2016 after looking at the basic usage section of the pdfminer docs (now unmaintained). The usage of this exception has since been abstracted away in the docs for pdfminer.six (the fork that is now maintained by pdfminer contributors).

Soon after this rename was published on PyPI, someone raised an issue on the camelot issue tracker. import camelot started breaking for a lot of users because camelot only pins the minimum version for all its dependencies (including pdfminer.six) with >=, so fresh installs picked up the new release. I reported it on the pdfminer.six Gitter room, and a contributor who was facing the same issue made a fix, which was merged and then released 6 days later. Meanwhile, I pinned the minimum version for pdfminer.six to >=20200726 on both PyPI and conda-forge.

This reminded me of an issue someone raised on the excalibur issue tracker in February, when Werkzeug broke backwards compatibility by changing how you import the secure_filename function.

Source: Twitter

Once you put out an interface, the more time it spends in the wild (accumulating users), the more difficult it becomes to change. This holds even when the change touches something that you now consider to be a private interface. Today I learned that this observation has a name: Hyrum's Law.

Source: xkcd

The sad fact of life is that no matter how careful you are, the more popular your library is the more likely it is that any change is going to break someone. — Versioning Software

You have a library with some incidental, undocumented and unspecified behavior that you consider to be obviously not part of the public interface. You change it to solve what seems like a bug to you, and make a patch release, only to find that you have angry hordes at the gate who, thanks to Hyrum's Law, depend on the old behavior. — Version numbers: how to use them?

How do you handle Hyrum's Law as creators and users?

As a creator, make sure you test all old interfaces so that your CI breaks when one of them changes unexpectedly. When you want to remove an old interface, raise a deprecation warning whenever it is called, and give your users enough time to migrate before you remove it. scikit-learn does this by raising a deprecation warning and waiting for two minor versions before removing an old interface. Armin Ronacher said that he used to give users well above a year to migrate. These are good starting points when thinking about deprecation windows.
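Here is a minimal sketch of what keeping an old name alive during a deprecation window could look like. All the names in it (deprecated_alias, sanitize_filename, clean_name, the "2.0" removal version) are made up for illustration and don't come from camelot, pdfminer.six or Werkzeug.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name, removal_version):
    """Return a wrapper that warns under the old name, then calls new_func."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated and will be removed in "
            f"{removal_version}; use {new_func.__name__}() instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        return new_func(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


def sanitize_filename(name):
    """The new, preferred interface (hypothetical)."""
    return name.strip().replace(" ", "_")


# The old name keeps working for the deprecation window, but warns on use.
clean_name = deprecated_alias(sanitize_filename, "clean_name", "2.0")
```

A test that calls clean_name and checks both the return value and the warning covers the "test all old interfaces" part, so CI fails if the alias disappears before the window is over.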

How do you find out if an old interface is used widely? Use GitHub global search.

Source: Twitter

As a user,

Rely on CI, potentially on a cron job, to detect when a project breaks for you instead of leaving it up to the project to try and make that call. — Why I don't like SemVer anymore

You can run your test suite as a cron job (on GitHub Actions!) to detect breakages. By default, pytest will display DeprecationWarning and PendingDeprecationWarning warnings from user code and third-party libraries, as recommended by PEP 565. There should be a way for pytest to instead raise an error whenever it catches a deprecation warning. I'll update it here when I find it.
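In the meantime, the standard library's warnings module can already turn a DeprecationWarning into an exception, which is the behaviour I'd want from the test run. A small sketch, where legacy_call and breaks_under_strict_warnings are hypothetical names standing in for a deprecated third-party function and a helper of my own:

```python
import warnings


def legacy_call():
    """Stand-in for a deprecated third-party function (hypothetical)."""
    warnings.warn("legacy_call() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def breaks_under_strict_warnings(func):
    """Return True if func emits a DeprecationWarning once those are errors."""
    with warnings.catch_warnings():
        # Escalate DeprecationWarning to an exception inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return True
    return False
```

As far as I can tell, pytest accepts the same kind of filter through its -W command-line option (for example -W error::DeprecationWarning) and through the filterwarnings ini setting, which should apply this to the whole suite.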

Can we agree on calling this "VerOps"?