Day 54 — Rust ownership and Python garbage collection
25 October 2020 · recurse-center
Today I read the chapter on ownership in the Rust book. In this post, I'll try to summarize what I learned for future me!
Ownership in Rust
Ownership is Rust's central feature and the chapter explains how it allows Rust to make memory safety guarantees without needing a garbage collector.
In some languages like Python, memory is managed by a garbage collector, which looks at run time for values that are no longer in use and frees them. In other languages like C, memory is managed by the programmer with the malloc and free functions. In Rust, memory is managed through a set of ownership rules that the compiler checks at compile time. To me this looks like a fine balance between a garbage collector (which "sounds" external to the code actually being run) and manual memory management by the programmer (which can be error-prone).
Data types
To explain ownership, the chapter goes into how values of different data types are stored in memory. The distinction that matters is whether the size of a type is known at compile time.
Types with a known, fixed size include the scalar types: all the integer and floating point types, such as u32 and f64, the character type char, and the boolean type bool. Tuples made only of such types, for example (i32, i32), also have a fixed size, although strictly speaking they are compound types rather than scalar ones. The other group, which this post loosely calls non-scalar, includes things that can grow and shrink, like Strings and vectors.
Stack and Heap
Scalar values are stored on the stack, while non-scalar values keep their data on the heap, where the allocator first has to find a space that is big enough. It's easy to manage values on the stack, because we only ever push and pop at the top, and copying a small fixed-size value is cheap. Both things are more work on the heap: to access a value we first have to follow a pointer to its memory location, and to make a copy we have to ask the allocator for more space.
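To make the stack/heap split a bit more concrete, here is a minimal sketch. The helper name handle_and_capacity is made up for illustration and is not from the Rust book. It suggests that the part of a String that lives on the stack has the same size no matter how long the text is, while the heap buffer it points to differs.

```rust
use std::mem::size_of_val;

/// Returns (bytes occupied by the String handle itself, bytes reserved on the heap).
/// `handle_and_capacity` is a hypothetical helper used only for this sketch.
fn handle_and_capacity(s: &String) -> (usize, usize) {
    // `size_of_val` measures the fixed-size handle, not the text it points to.
    // `capacity` reports how many bytes the heap buffer can hold.
    (size_of_val(s), s.capacity())
}
```

Calling it on String::from("hi") and on a much longer string should give the same first number for both, while the second number is at least the length of each text.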
Ownership rules
Managing data on the heap is the reason why ownership exists in Rust. These are Rust's ownership rules:
- Each value in Rust has a variable that's called its owner
- There can only be one owner at a time
- When the owner goes out of scope, the value is dropped
Based on these rules (enforced by the Rust compiler at compile time), Rust can automatically do the equivalent of malloc when a value is created and the equivalent of free (a call to drop) when its owner goes out of scope. And that's why we don't need a garbage collector, and don't need to call malloc and free by hand!
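The third rule can be observed with a small sketch. The type Noisy and the function drop_order are names I made up for this example. Noisy writes its name into a shared log when it is dropped, so we can see when each value is cleaned up.

```rust
use std::cell::RefCell;

/// A toy value that records its name in a shared log when it is dropped.
/// `Noisy` and `drop_order` are hypothetical names used only for this sketch.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        // Runs automatically when the owner goes out of scope.
        self.log.borrow_mut().push(self.name);
    }
}

/// Creates values in nested scopes and returns the order in which they were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
        } // `_inner` goes out of scope here and is dropped
    } // `_outer` goes out of scope here and is dropped
    log.into_inner()
}
```

Calling drop_order() returns ["inner", "outer"]: each value is dropped exactly where its owner's scope ends, without any explicit free.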
Copy and Move
In Rust, when we assign an existing scalar variable to a new variable:
let x = 5;
let y = x;
The value is copied from the old variable to the new one, because a value of known size lives entirely on the stack and copying it is cheap, like we discussed above. Both x and y remain usable.
But when we assign an existing non-scalar variable to a new variable:
let x = String::from("Hello");
let y = x;
Rust copies the pointer, length, and capacity (which are on the stack) to the new variable, but not the data that the pointer points to. That's because copying data on the heap could be an expensive operation if the data were large.
But wait! If that were the whole story, Rust's ownership rules would make both x and y try to free the same data when they go out of scope (a double free error!).
To prevent that from happening, Rust moves the ownership of the data from x to y when we do let y = x, thus making x invalid. If we try to use x later, the Rust compiler will report an error!
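A minimal sketch of the difference between a move and an explicit deep copy. The function name move_vs_clone is an assumption of mine for illustration. It compares the address of the heap data before and after each operation.

```rust
/// Returns (did the move keep the same heap buffer?, did the clone get a new one?).
/// `move_vs_clone` is a hypothetical name used only for this sketch.
fn move_vs_clone() -> (bool, bool) {
    let x = String::from("Hello");
    let heap_addr = x.as_ptr(); // address of the heap data

    let y = x; // move: ownership passes to `y`, `x` can no longer be used
    // println!("{}", x); // uncommenting this fails to compile (use of moved value)
    let moved_same_buffer = y.as_ptr() == heap_addr;

    let z = y.clone(); // clone: an explicit deep copy with its own heap buffer
    let clone_new_buffer = z.as_ptr() != y.as_ptr();

    (moved_same_buffer, clone_new_buffer)
}
```

Calling move_vs_clone() returns (true, true): the move only hands over the existing heap buffer, while clone is the way to ask for a real copy when we want both variables to stay usable.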
References and Borrows
Other ways to move ownership are passing a variable to a function and returning a variable from a function. Since passing and then returning ownership with every function call can be tedious, Rust lets us pass references to variables into functions instead. In this case, the variable is borrowed by the function.
References allow us to refer to values without taking ownership of them. They are immutable by default, so we're not allowed to modify something we only have a plain reference to. References can also be mutable, but Rust doesn't let us have more than one mutable reference to the same value at a time, which prevents data races!
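Here is a short sketch of borrowing through function arguments. The function names are illustrative. One function takes an immutable reference and can only read, the other takes a mutable reference and can change the value in place, and in both cases the caller keeps ownership.

```rust
/// Borrows the string immutably: it can read the value but not modify it.
fn calculate_length(s: &String) -> usize {
    s.len()
}

/// Borrows the string mutably: it can modify the value in place.
fn append_world(s: &mut String) {
    s.push_str(", world");
}
```

With let mut s = String::from("hello"), calculate_length(&s) returns 5, and after append_world(&mut s) the variable s still belongs to the caller and now holds "hello, world".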
The code below will fail because r1 and r2 are both mutable references to s:
let mut s = String::from("hello");
let r1 = &mut s;
let r2 = &mut s;
println!("{} {}", r1, r2);
We also cannot have a mutable reference while we have an immutable one, because users of an immutable reference don't expect the value to suddenly change underneath them!
And because of that, the code below will also fail:
let mut s = String::from("hello");
let r1 = &s;
let r2 = &mut s;
println!("{} {}", r1, r2);
A reference's scope starts from where it is introduced, and continues through the last time that reference is used.
So this is valid code:
let mut s = String::from("hello");
let r1 = &mut s;
println!("{}", r1);
let r2 = &mut s;
println!("{}", r2);
We can have multiple immutable references to a variable because they don't change the value they refer to:
let mut s = String::from("hello");
let r1 = &s;
let r2 = &s;
println!("{} {}", r1, r2);
Garbage collection in Python
When I was learning about Python C extensions some moons ago, I came across Python's C-API and how it also works with references (reference counts!). Python frees a value when the number of references pointing to it drops to 0.
New references
When we create a new reference to a PyObject, we must call Py_DECREF on it once we are done with it, so that it can be garbage collected. If we fail to call Py_DECREF, we get a memory leak!
PyObject *pA = PyLong_FromLong(a); // New ref
PyObject *pB = PyLong_FromLong(b); // New ref
PyObject *r = PyNumber_Subtract(pA, pB); // New ref
Py_DECREF(pA); // You must decref
Py_DECREF(pB); // You must decref
return r; // Caller must decref
What if we could remove the need to call Py_DECREF on pA and pB by automatically calling an associated free function when both of those references go out of scope?
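Rust's standard library has a type that behaves roughly like this: std::rc::Rc, a reference-counted pointer whose count is decremented automatically when a handle goes out of scope. The sketch below is only an analogy for Py_INCREF and Py_DECREF, not a binding to the Python C-API, and the function name counts_around_scope is made up.

```rust
use std::rc::Rc;

/// Returns the reference counts observed (inside the inner scope, after it ends).
/// `counts_around_scope` is a hypothetical name used only for this sketch.
fn counts_around_scope() -> (usize, usize) {
    let a = Rc::new(42); // new reference, the count is 1
    let inside = {
        let _b = Rc::clone(&a); // roughly Py_INCREF: the count is 2
        Rc::strong_count(&a)
    }; // `_b` goes out of scope: roughly an automatic Py_DECREF
    let after = Rc::strong_count(&a);
    (inside, after)
} // `a` goes out of scope, the count reaches 0 and the value is freed
```

Calling counts_around_scope() returns (2, 1). Nobody had to remember the decrement, because it is tied to the end of the scope.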
Moved references
When we move a reference to a PyObject into something (in this case, a tuple), it is kinda owned by that tuple, because now it's the tuple's responsibility to call Py_DECREF on it (PyTuple_SetItem "steals" the reference). If we call Py_DECREF ourselves after moving it into the tuple, the reference count can reach 0 too early, and the value might be freed while the tuple still holds a pointer to it.
PyObject *r = PyTuple_New(2); // New ref
PyObject *v1 = PyLong_FromLong(1L); // New ref
PyTuple_SetItem(r, 0, v1);
// We shouldn't Py_DECREF(v1) because it belongs to r now
PyObject *v2 = PyLong_FromLong(2L); // New ref
PyTuple_SetItem(r, 1, v2);
return r; // Callers must decref
What if a compiler could enforce ownership rules and prevent us from calling Py_DECREF on v1 after it has moved into r?
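In Rust terms this is just a move into a container. The sketch below is again an analogy using std::rc::Rc rather than real PyObject pointers, and build_pair is a name I made up. Once v1 and v2 are moved into r, the compiler rejects any further use of them, so there is no way to decrement them by accident.

```rust
use std::rc::Rc;

/// Builds a two-element container that takes ownership of both values.
/// `build_pair` is a hypothetical name used only for this sketch.
fn build_pair() -> Vec<Rc<i64>> {
    let v1 = Rc::new(1); // new reference
    let v2 = Rc::new(2); // new reference
    let r = vec![v1, v2]; // `v1` and `v2` are moved into `r`
    // drop(v1); // uncommenting this fails to compile: `v1` was moved into `r`
    r // the caller now owns the container and everything in it
}
```

After let r = build_pair(), Rc::strong_count(&r[0]) is 1: moving the handle into the container did not change the count, it only changed who is responsible for it.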
Borrowed references
When we borrow a reference to something (PyList_GetItem returns a borrowed reference), we need to explicitly call Py_INCREF to register our interest, so that the garbage collector doesn't drop the value before we've had a chance to make use of it.
PyObject *pFirst;
pFirst = PyList_GetItem(pList, 0);
Py_INCREF(pFirst); // Register our interest
modify(pList);
PyObject_Print(pFirst, stdout, 0);
Py_DECREF(pFirst); // Let go
What if we didn't have to register our interest for using the reference pFirst by calling Py_INCREF? The compiler could instead throw an error if we borrow pFirst but then try to change pList in modify.
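Both options can be sketched in Rust. This is an analogy with std::rc::Rc, not the Python C-API. The modify function here is a hypothetical stand-in that empties the list, and first_survives_modify is a made-up name. A plain borrow of the first item that is still used after modify is rejected at compile time, while cloning the Rc plays the role of Py_INCREF.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for the `modify` call in the C snippet: it empties the list.
fn modify(list: &mut Vec<Rc<String>>) {
    list.clear();
}

/// Keeps the first item alive across `modify` and returns its text.
/// `first_survives_modify` is a hypothetical name used only for this sketch.
fn first_survives_modify() -> String {
    let mut list = vec![
        Rc::new(String::from("first")),
        Rc::new(String::from("second")),
    ];

    // let first = &list[0]; // with a plain borrow, the `modify` call below fails to compile
    let first = Rc::clone(&list[0]); // register our interest, like Py_INCREF
    modify(&mut list); // the list drops its own handles
    first.as_str().to_owned() // still valid: our handle keeps the value alive
} // `first` is dropped here, like an automatic Py_DECREF
```

Calling first_survives_modify() returns "first", even though the list was emptied in between.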
Could ownership rules be built into Python's C-API and enforced by a compiler, thus removing the need for manual reference counting and perhaps even for a garbage collector?