Day 36 — Moar Python C extension talks!

Today I spent most of my time watching some talks on writing Python C extensions. Every talk mentioned that the main reasons for writing an extension are:

I need to try and write some code to talk to one of the large C/C++ projects I mentioned in an earlier post instead of procrastinating by watching talks :(

I learned a lot from these talks by Paul Ross:

Here be Dragons - Writing Safe C Extensions

He talks about various gotchas we should keep in mind while using Python's C API. There are three kinds of references we deal with when we interact with PyObjects:

New references

When we create a new PyObject, we own the reference to it, and it's our job to release it by decrementing its reference count once we're done with it!


  static PyObject *subtract_long(long a, long b) {
      PyObject *pA, *pB, *r;

      pA = PyLong_FromLong(a);  // New ref
      pB = PyLong_FromLong(b);  // New ref
      r = PyNumber_Subtract(pA, pB);  // New ref

      Py_DECREF(pA);  // You must decref
      Py_DECREF(pB);  // You must decref

      return r;  // Caller must decref
  }

He shows the function above, which subtracts two long integers, and talks about using Py_DECREF to decrement the reference counts of the new PyObjects (pA and pB) we create in the function, once we've used them. The result r is also a new reference, but we return it, so releasing it becomes the caller's job.


  static PyObject *subtract_long(long a, long b) {
      return PyNumber_Subtract(
          PyLong_FromLong(a),  // a leak
          PyLong_FromLong(b)  // another leak
      );
  }

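We should not try to be "Pythonic" and nest the calls like above, because that's how we get memory leaks! The two objects returned by PyLong_FromLong are new references that never get a name, so there is nothing we can pass to Py_DECREF afterwards.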

Stolen references

When we create a new PyObject but pass it to a function that steals the reference, the "thief" becomes the owner, and deallocating it is the thief's job.


  static PyObject *make_tuple(void) {
      PyObject *r, *v;

      r = PyTuple_New(3);  // New ref

      v = PyLong_FromLong(1L);  // New ref
      PyTuple_SetItem(r, 0, v);

      v = PyLong_FromLong(2L);  // New ref
      PyTuple_SetItem(r, 1, v);

      // More common pattern
      PyTuple_SetItem(r, 2, PyLong_FromLong(3L));

      return r;  // Callers must decref
  }

In the example above, we create a new PyObject called v, and PyTuple_SetItem steals our reference to it when we store v as the first element of r (a tuple). Since r now owns that reference, we don't need to (and must not) use Py_DECREF to decrement v's reference count.


  PyObject *r, *v;

  r = PyTuple_New(3);  // New ref

  v = PyLong_FromLong(1L);  // New ref
  PyTuple_SetItem(r, 0, v);  // r steals v

  Py_DECREF(v);  // NO! v belongs to r

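This is a big no-no! After PyTuple_SetItem, the reference belongs to r, so the extra Py_DECREF gives away a reference we no longer own. The object can then be deallocated while r still holds a pointer to it, and any later access to that element (including the cleanup when r itself is deallocated) touches freed memory.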

Borrowed references

When we borrow a reference to a PyObject, the real owner can deallocate it at any time, unless we prevent them by registering our interest!


  static PyObject *borrow_bad(PyObject *pList) {
      PyObject *pFirst;
      pFirst = PyList_GetItem(pList, 0);

      function(pList);  // Dragons ahoy!
      PyObject_Print(pFirst, stdout, 0);

      Py_RETURN_NONE;
  }

In the above example, pFirst is not a new PyObject: PyList_GetItem returns a borrowed reference, so pFirst just points at the first element of pList, which the list still owns. But what happens when we pass pList into a function that removes the first element of the list? If the list held the only reference, the element is deallocated and pFirst is left pointing at freed memory.

Paul Ross shows that this function actually causes a segfault when we call it from the Python REPL!


  static PyObject *borrow_good(PyObject *pList) {
      PyObject *pFirst;
      pFirst = PyList_GetItem(pList, 0);

      Py_INCREF(pFirst);  // Register your interest

      function(pList);
      PyObject_Print(pFirst, stdout, 0);

      Py_DECREF(pFirst);  // Let go

      pFirst = NULL;

      Py_RETURN_NONE;
  }

The fix is to increment pFirst's reference count as soon as we obtain it, decrement it after we've used it, and then set the pointer to NULL so we don't accidentally use it later (if there's a lot of other code below PyObject_Print).
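The three ownership rules can also be tried out without Python at all. Below is a toy sketch in plain C, not CPython's real implementation: Obj, Box, live_objects and every function name here are made up for illustration. box_set steals a reference the way PyTuple_SetItem does, box_get hands out a borrowed reference the way PyList_GetItem does, and read_after_clear follows the same incref/use/decref steps as the fixed function above.

```c
#include <stdlib.h>

/* Toy stand-in for PyObject (hypothetical, NOT CPython's real layout). */
typedef struct {
    int refcount;
    long value;
} Obj;

int live_objects = 0;  /* how many Obj are allocated and not yet freed */

/* Returns a NEW reference (refcount == 1); the caller owns it. */
Obj *obj_new(long value) {
    Obj *o = malloc(sizeof *o);
    if (o == NULL) {
        return NULL;
    }
    o->refcount = 1;
    o->value = value;
    live_objects++;
    return o;
}

void obj_incref(Obj *o) { o->refcount++; }

/* Frees the object when the last reference goes away. */
void obj_decref(Obj *o) {
    if (--o->refcount == 0) {
        live_objects--;
        free(o);
    }
}

/* A one-slot container that owns one reference to its item. */
typedef struct {
    Obj *item;
} Box;

/* STEALS the caller's reference to item and releases the old item. */
void box_set(Box *b, Obj *item) {
    if (b->item != NULL) {
        obj_decref(b->item);
    }
    b->item = item;
}

/* Returns a BORROWED reference: the box still owns it. */
Obj *box_get(const Box *b) { return b->item; }

/* Empties the box, dropping the box's reference to its item. */
void box_clear(Box *b) { box_set(b, NULL); }

/* Safe use of a borrowed reference across a call that empties the box. */
long read_after_clear(Box *b) {
    Obj *first = box_get(b);  /* borrowed */
    long value;

    obj_incref(first);        /* register our interest */
    box_clear(b);             /* the box lets go of its reference */
    value = first->value;     /* still valid, because we hold a reference */
    obj_decref(first);        /* let go; this frees the object */
    return value;
}
```

Without the obj_incref call, box_clear would free the object and the read of first->value would be a use-after-free, which is the same kind of bug as the segfault in the talk.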

A pattern for reliable C

He also describes a pattern to write reliable C when we're interacting with Python's C API, where he shows how we can write Pythonic C (similar to Python's try-except-finally!) using goto statements:


  static PyObject *function(PyObject *arg1) {
      PyObject *ret = NULL;

      goto try;
  try:
      // Do stuff here
      // On error, "goto except;"
      goto finally;
  except:
      // Handle exception
  finally:
      // Finish
      return ret;
  }

We should also:

He has written about this in a lot more detail at Coding Patterns for Python Extensions.
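To see how the try/except/finally skeleton fills out, here is a small sketch in plain C with no Python involved. The function upper_copy and its scratch buffer are made up for illustration (and a bit contrived): free plays the role that Py_XDECREF would play in a real extension, ret is what we hand back, and scratch is a temporary that has to be released on every path.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated, upper-cased copy of src,
 * or NULL on error. The caller owns the result and must free() it. */
char *upper_copy(const char *src) {
    char *ret = NULL;      /* return value; stays NULL on failure */
    char *scratch = NULL;  /* temporary; always released in "finally" */
    size_t len, i;

    goto try;
try:
    if (src == NULL) {
        goto except;
    }
    len = strlen(src);
    scratch = malloc(len + 1);
    if (scratch == NULL) {
        goto except;
    }
    for (i = 0; i < len; i++) {
        scratch[i] = (char)toupper((unsigned char)src[i]);
    }
    scratch[len] = '\0';

    ret = malloc(len + 1);
    if (ret == NULL) {
        goto except;
    }
    memcpy(ret, scratch, len + 1);
    goto finally;
except:
    /* Error path: make sure we never return a half-built result. */
    free(ret);
    ret = NULL;
finally:
    /* Runs on both paths: release temporaries, then return. */
    free(scratch);
    return ret;
}
```

The point of the layout is that there is exactly one return statement, every error jumps to the same place, and the cleanup of temporaries is written once instead of before each early return.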

A faster Python? You Have These Choices

This talk is a survey of various tools that can help us write Python C extensions. Paul Ross goes over some tradeoffs that we should keep in mind while choosing a tool. He also recommends structuring our code as in the diagram below, so that the Python code and the C/C++ code can each be tested separately, because testing the glue code itself can be cumbersome.


      Python code - - > glue.c - - > C/C++ code
           ^                             ^
           |                             |
           |                             |
      Python tests                   C/C++ tests
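A minimal sketch of that layering, squeezed into one file for brevity. All the names are my own assumptions, not from the talk: subtract_long_core is the pure C code, test_subtract_long_core is a C test that never touches Python, and py_subtract_long is the thin glue, which only compiles when the made-up BUILD_PYTHON_GLUE flag is defined. The method table and module init are left out.

```c
#ifdef BUILD_PYTHON_GLUE   /* hypothetical build flag */
#define PY_SSIZE_T_CLEAN
#include <Python.h>        /* should come before the standard headers */
#endif
#include <assert.h>

/* ---- C/C++ code: no Python.h needed, so it can be tested on its own ---- */
long subtract_long_core(long a, long b) {
    return a - b;
}

/* ---- C/C++ tests: exercise the core directly ---- */
void test_subtract_long_core(void) {
    assert(subtract_long_core(5, 3) == 2);
    assert(subtract_long_core(3, 5) == -2);
    assert(subtract_long_core(0, 0) == 0);
}

#ifdef BUILD_PYTHON_GLUE
/* ---- glue.c: only converts between Python objects and C values ---- */
static PyObject *py_subtract_long(PyObject *self, PyObject *args) {
    long a, b;
    (void)self;  /* unused */

    if (!PyArg_ParseTuple(args, "ll", &a, &b)) {
        return NULL;  /* exception already set by PyArg_ParseTuple */
    }
    return PyLong_FromLong(subtract_long_core(a, b));  /* new reference */
}
#endif
```

With this split, the C tests run without an interpreter, the Python tests only need to check that the wrapper passes arguments and results through, and all the reference counting stays confined to the few lines of glue.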