Day 36 — Moar Python C extension talks!
01 October 2020 · recurse-center
Today I spent most of my time watching some talks on writing Python C extensions. Every talk mentioned that the main reasons for writing an extension are:
- Interfacing with C/C++ (I'm in this camp!)
- Improving performance of Python code by rewriting it in C/C++
I need to try and write some code to talk to one of the large C/C++ projects I mentioned in an earlier post instead of procrastinating by watching talks :(
I learned a lot from these talks by Paul Ross:
Here be Dragons - Writing Safe C Extensions
He talks about various gotchas we should keep in mind while using Python's C API. There are three types of references when we interact with PyObjects:
- New: These occur when a new PyObject is created, for example: when we create a new list.
- Stolen: These occur when a PyObject is created and assigned to a container that takes over ownership, for example: setting an item in a tuple with PyTuple_SetItem. (PyList_Append, by contrast, does not steal the reference.)
- Borrowed: These occur when "getting" a PyObject, for example: accessing an item in a list.
New references
When we create a new PyObject, it's our job to deallocate it by decrementing its reference count!
    static PyObject *subtract_long(long a, long b) {
        PyObject *pA, *pB, *r;
        pA = PyLong_FromLong(a);        // New ref
        pB = PyLong_FromLong(b);        // New ref
        r = PyNumber_Subtract(pA, pB);  // New ref
        Py_DECREF(pA);                  // You must decref
        Py_DECREF(pB);                  // You must decref
        return r;                       // Caller must decref
    }
He shows the function above, which subtracts two long integers, and talks about using Py_DECREF to decrement the reference counts of the new PyObjects (pA and pB) we create in the function, after we've used them.
    static PyObject *subtract_long(long a, long b) {
        return PyNumber_Subtract(
            PyLong_FromLong(a),  // a leak
            PyLong_FromLong(b)   // another leak
        );
    }
We should not try to be "Pythonic" and do something like the above (without the appropriate Py_DECREF calls) because that's how we get memory leaks!
Stolen references
When we create a new PyObject, but it gets stolen, it's the "thief"'s job to deallocate it.
    static PyObject *subtract_long(long a, long b) {
        PyObject *r, *v;
        r = PyTuple_New(3);          // New ref
        v = PyLong_FromLong(1L);     // New ref
        PyTuple_SetItem(r, 0, v);
        v = PyLong_FromLong(2L);     // New ref
        PyTuple_SetItem(r, 1, v);
        // More common pattern
        PyTuple_SetItem(r, 2, PyLong_FromLong(3L));
        return r;                    // Callers must decref
    }
In the example above, we create a new PyObject called v, which gets stolen by r (a tuple) when we assign v as its first element. From that point on r owns the reference, so we must not use Py_DECREF to decrement v's reference count.
    PyObject *r, *v;
    r = PyTuple_New(3);          // New ref
    v = PyLong_FromLong(1L);     // New ref
    PyTuple_SetItem(r, 0, v);    // r steals v
    Py_DECREF(v);                // NO! v belongs to r
This is a big no-no! The extra Py_DECREF releases a reference we no longer own, so v can be deallocated while r still points to it, leaving a dangling pointer inside the tuple.
Borrowed references
When we borrow a reference to a PyObject, the real owner can deallocate it at any time, unless we prevent them by registering our interest!
    static PyObject *borrow_bad(PyObject *pList) {
        PyObject *pFirst;
        pFirst = PyList_GetItem(pList, 0);
        function(pList);  // Dragons ahoy!
        PyObject_Print(pFirst, stdout, 0);
        Py_RETURN_NONE;
    }
In the above example, no new PyObject is created: pFirst is just a borrowed reference to the first element of pList. But what happens when we pass pList into a function that removes the first element of the list? What is pFirst going to point to?
Paul Ross shows that this function actually causes a segfault when we try to call it from the Python REPL!
    static PyObject *borrow_bad(PyObject *pList) {
        PyObject *pFirst;
        pFirst = PyList_GetItem(pList, 0);
        Py_INCREF(pFirst);  // Register your interest
        function(pList);
        PyObject_Print(pFirst, stdout, 0);
        Py_DECREF(pFirst);  // Let go
        pFirst = NULL;
        Py_RETURN_NONE;
    }
The fix is to increase the reference count for pFirst as soon as we obtain it, decrease it after we've used it, and then set it to NULL so we don't accidentally use it later (if there's a lot of other code below PyObject_Print).
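A toy model in plain C can make the three rules concrete. This is only a minimal sketch and not the real CPython API: Obj, obj_new, obj_incref, obj_decref, Slot, slot_set and slot_get are made-up names that loosely stand in for PyObject, PyLong_FromLong, Py_INCREF, Py_DECREF, a one-item container, PyTuple_SetItem and PyList_GetItem.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for PyObject: a reference count plus a payload. */
typedef struct {
    int refcnt;
    long value;
} Obj;

static int live_objects = 0;  /* number of toy objects not yet freed */

/* Returns a NEW reference (refcnt == 1), or NULL if allocation fails. */
static Obj *obj_new(long value) {
    Obj *o = malloc(sizeof *o);
    if (o == NULL) {
        return NULL;
    }
    o->refcnt = 1;
    o->value = value;
    live_objects++;
    return o;
}

/* Registers one more owner. */
static void obj_incref(Obj *o) {
    o->refcnt++;
}

/* Drops one owner and frees the object when the count reaches zero. */
static void obj_decref(Obj *o) {
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        live_objects--;
        free(o);
    }
}

/* A one-item container. */
typedef struct {
    Obj *item;
} Slot;

/* STEALS the reference to o: the slot, not the caller, now owns it. */
static void slot_set(Slot *s, Obj *o) {
    if (s->item != NULL) {
        obj_decref(s->item);  /* release whatever was stored before */
    }
    s->item = o;
}

/* Returns a BORROWED reference: the count is not touched. */
static Obj *slot_get(const Slot *s) {
    return s->item;
}

/* Walks through new -> stolen -> borrowed, holding on to the item safely. */
static long borrow_safely(void) {
    Slot s = { NULL };
    Obj *o = obj_new(42);   /* new reference, refcnt 1 */
    Obj *first;
    long value;

    if (o == NULL) {
        return -1;
    }
    slot_set(&s, o);        /* stolen: we must not decref o ourselves */
    first = slot_get(&s);   /* borrowed: refcnt is still 1 */
    obj_incref(first);      /* register our interest: refcnt 2 */
    slot_set(&s, NULL);     /* the owner drops the item: refcnt 1 */
    value = first->value;   /* still valid, we hold our own reference */
    obj_decref(first);      /* let go: refcnt 0, object is freed */
    return value;
}
```

In this model, leaving out the obj_incref call would let slot_set(&s, NULL) free the object, and reading first->value afterwards would touch freed memory, which mirrors the bug in borrow_bad.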
A pattern for reliable C
He also describes a pattern to write reliable C when we're interacting with Python's C API, where he shows how we can write Pythonic C (similar to Python's try-except-finally!) using goto statements:
    static PyObject *function(PyObject *arg1) {
        PyObject *ret = NULL;
        goto try;
    try:
        // Do stuff here
        // On error, "goto except;"
        goto finally;
    except:
        // Handle exception
    finally:
        // Finish
        return ret;
    }
We should also:
- Correctly increment and decrement counts for borrowed references
- Have consistent exceptions!
  - Either we set the exception string and return NULL
  - Or we don't set the exception string and return a non-NULL value
He has written about this in a lot more detail at Coding Patterns for Python Extensions.
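To see the try/except/finally shape with real statements filled in, here is a minimal sketch in plain C with no Python API involved. The join_strings helper and its scratch buffer are hypothetical and deliberately contrived; they exist only so that there is a temporary to clean up on every path and a result to discard on error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of a followed by b,
 * or NULL on error. The caller must free() the result. */
static char *join_strings(const char *a, const char *b) {
    char *ret = NULL;      /* result; stays NULL on failure */
    char *scratch = NULL;  /* temporary that must be freed on every path */
    size_t la = 0, lb = 0;

    goto try;
try:
    if (a == NULL || b == NULL) {
        goto except;
    }
    la = strlen(a);
    lb = strlen(b);
    scratch = malloc(la + lb + 1);
    if (scratch == NULL) {
        goto except;
    }
    memcpy(scratch, a, la);
    memcpy(scratch + la, b, lb + 1);  /* also copies b's terminating '\0' */
    ret = malloc(la + lb + 1);
    if (ret == NULL) {
        goto except;
    }
    memcpy(ret, scratch, la + lb + 1);
    goto finally;
except:
    /* Undo partial work so the caller never sees a half-built result. */
    free(ret);
    ret = NULL;
finally:
    /* Runs on both the success path and the error path. */
    free(scratch);
    return ret;
}

/* Small usage example covering the success path and the error path. */
static void join_strings_demo(void) {
    char *s = join_strings("foo", "bar");
    assert(s != NULL && strcmp(s, "foobar") == 0);
    free(s);
    assert(join_strings(NULL, "bar") == NULL);
}
```

The function has a single exit point, so every resource is released in one place, and failure is always signalled the same way (a NULL return), in the spirit of the "consistent exceptions" advice above. Note that try is a keyword in C++, so these label names only work when the file is compiled as C.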
A faster Python? You Have These Choices
This talk is a survey of various tools that can help us write Python C extensions. Paul Ross shows some tradeoffs that we should keep in mind while choosing a tool. He also suggests structuring our code like this, so that the Python and C code can be tested separately, because testing glue code can be cumbersome:
    Python code --> glue.c --> C/C++ code
         ^                         ^
         |                         |
         |                         |
    Python tests              C/C++ tests