Finalize, Sweep and Rooting - Understanding Rust GC

Finalize Trait

Previously we explored Trace in Trace - Understanding Rust GC - higepon blog. Let's look into Finalize trait. This is really simple. Every GC-ed object should implement this finalize.

pub trait Finalize {
    fn finalize(&self) {}
}

Let's look at finalize implementation for array and Option<T>. They are empty. Hmm. I'm not quite sure why. Let me start from collect_garbage and come back here later.

;; Array
impl<T: Trace, const N: usize> Finalize for [T; N] {}

;; Option<T>
impl<T: Trace> Finalize for Option<T> {}

collect_garbage

In collect_garbage after the mark returns unmarked objects (= not used objects) and it will call finalize_glue() for each GC<T> object.

        let unmarked = mark(&mut st.boxes_start);
        if unmarked.is_empty() {
            return;
        }
        for node in &unmarked {
            Trace::finalize_glue(&(*node.this.as_ptr()).data);
        }

finalize_glue calls finalize in the Trait. So say an array is unmarked, the gc eventually calls empty trace of the array. Now I remember that finalize give the object a chance to any necessary work before it's deallocated. For example an object may want to close a file associated with it.

Sweep

Now we know that the gc collects unmarked objects and calls finalize for the unmarked objects. It's time to sweep them.

    unsafe fn sweep(finalized: Vec<Unmarked>, bytes_allocated: &mut usize) {
        let _guard = DropGuard::new();
        for node in finalized.into_iter().rev() {
            if (*node.this.as_ptr()).header.marked.get() {
                continue;
            }
            let incoming = node.incoming;
            let mut node = Box::from_raw(node.this.as_ptr());
            *bytes_allocated -= mem::size_of_val::<GcBox<_>>(&*node);
            *incoming = node.header.next.take();
        }
    }

This is actually very interesting. It recreates Box from returning raw pointer using from_raw which makes sense!

Rooting

Let's get back to root and unroot. A Tour of Safe Tracing GC Designs in Rust - In Pursuit of Laziness has a good summary and examples of what is rooting and why we need it.

In one word: Rooting. In a garbage collector, the objects “directly” in use on the stack are the “roots”, and you need to be able to identify them.

struct Foo {
    bar: Option<Gc<Bar>>
}
// this is a root
let bar = Gc::new(Bar::new());
// this is also a root
let foo = Gc::new(Foo::new());
// bar should no longer be a root (but we can't detect that!)
foo.bar = Some(bar);
// but foo should still be a root here since it's not inside
// another GC'd object
let v = vec![foo];

To track root objects, the GC maintains root counts in GC_BOX. In short GC_BOX with root count > 0 is a root object. Rooting - Designing a GC in Rust explains it very well. Note that the count is incremented or decremented when GC<T> object is moved.

Summary

  • Finalize gives an object an opportunity to do some clean up work before it's free-ed.
  • Sweep returns unmarked objects using Box.
  • To track objects directly use in stack the gc maintains root count.