Sam Sartor
Git( Hub| Lab)
# 2022-3-4
Hey! This page is a rough draft.

Self referential data structures have been a long time wart of the Rust programming language. The Pin type works well enough to support async/await, but doesn’t make it any easier to manually create such structures safely.

Self-References Today

Despite what I said above, Rust today does actually support safe self-referential structs in a limited capacity. Go to play.rust-lang.org right now and try running:

#[derive(Debug)]
struct OwnAndRef<'this> {
    owned: i32,
    refed: &'this i32,
}

// allocate some space on the stack
let mut owner = OwnAndRef {
    owned: 42,
    refed: &-1,
};

// and now borrow your own field!
owner.refed = &owner.owned;

println!("{:?}", owner);

This code is totally absent of Pin<&mut T> schenenagins but the Rust compiler is aware that owner cannot be moved elsewhere on the stack.

owner.refed = &owner.owned;
              ------------ borrow of `owner.owned` occurs here

let new_owner = owner;
                ^^^^^
                |
                move out of `owner` occurs here
                borrow later used here

How about we take this seemingly overlooked ability out for a spin? Below is an example of a simple* arena-allocated tree structure. I admit it a bit artificial, since we can do the same thing with indices into a Vec (or even better, with generational_arena). But we have to start somewhere.

#[derive(Copy, Clone)]
enum Node<'this> {
    Branch(NodeRef<'this>, NodeRef<'this>),
    Leaf(i32),
}
use Node::*;

type NodeRef<'this> = &'this Cell<Node<'this>>;

struct Tree<'this> {
    arena: Arena<Cell<Node<'this>>>,
    root: NodeRef<'this>,
}

impl<'this> Tree<'this> {
    pub fn append(&self, num: i32) {
        let mut cell = self.root;
        while let Branch(_, right) = cell.get() {
            cell = right;
        }

        let old = self.arena.alloc(Cell::new(cell.get()));
        let new = self.arena.alloc(Cell::new(Leaf(num)));
        cell.set(Branch(old, new));
    }

    pub fn rotate(&self) {
        let top = self.root;
        let Branch(a, mid) = top.get() else { return };
        let Branch(b, c) = mid.get() else { return };
        mid.set(Branch(a, b));
        top.set(Branch(mid, c));
    }
}

Looks pretty good, except there’s a problem. How do we construct an empty tree object? The tree root has to be initialized with a reference to the arena, thus making the arena immovable. But the arena also has to be moved into the new struct:h

let arena = Arena::new();
let root = arena.alloc(Cell::new(Leaf));
           ---------------------------- borrow of `arena` occurs here
let mut tree = Tree { arena, root };
                      ^^^^^  ---- borrow later used here
                      |
                      move out of `arena` occurs here

Reference Transplanting

Forget about self-references for a minute and consider the following code:

let a = Box::new((42, 69));
let r = &a.0;
let b = a;
dbg!(r);

In Rust today, this is totally illegal. We obviously deserve to be chastised with “cannot move out of a because it is borrowed” for attempting such blasphemy. But does this code really need to throw an error? I’d argue that it doesn’t, since the pointer underlying r is at no risk of invalidation. The memory r points to is still intact, just bound to a differently named variable.

If our goal is a simpler feeling Rust where “more of the programs that should work actually do work”, why not ask for this to work as well?

If you haven’t read “An alias-based formulation of the borrow checker”, I would recommend skimming it now, because the terminology will be helpful. In the example above, the the origin of reference r contains a shared loan of the box a.deref().0. “Origin” is the name used by polonius instead of “lifetime”, since it refers to some set of loaned variables and not a region of the program. Still, I’ll notate the origin of r as a lifetime parameter '1.

The move out of binding a and into a new binding b on line 3 then invalidates the loan of a inside origin '1. This results in an error because the origin is still alive: it is used by dbg!(r) on line 4.

To accept this program, we’d need polonius to “transplant” the loan from the path a.deref().0 to the path b.deref().0 at line 3. The exact details need to be worked out in far greater detail, but I believe any path <prefix_a>.deref().<postfix> can be transplanted to <prefix_b>.deref().<postfix> so long as the point where the transplant occurs dominates all points with live origins containing the path, and the deref implementation in question is known to return a stable memory addresses. Box::deref is already treated magically as part of loan paths so it is easy to support, but some kind of unsafe trait StableDeref {} would be needed to generalize to other smart pointer types.

With reference transplanting in place, the compiler will also allow us to move self-referential boxes:

let mut a = Box::new((42, &0));
a.1 = &a.0;
let b = a;
dbg!(&b);

In this case, the variable a initially contains a reference with empty origin, as the borrow &0 has static lifetime. In a sense, the static lifetime can be defined as the empty set of loans. That origin is replaced with once containing a shared loan of a.deref().0 on line 2, and the origin is kept alive until after the move on line 3.

Now for the important bit: that move is legal because Box implements StableDeref and the loan can be legally transplanted from path a.deref().0 to path b.deref().0. In all other respects, the loan is totally unchanged. Prior to the move, the loan is seen by the compiler as having the first path. After the move, it is simply seen as having the later path.

Variable b is then given a new origin containing the same (now-transplanted) loan. The old origin is dead, since it is never again used, while the new origin is kept alive by the dbg! macro on line 4. This logic is totally unaware of the transplant, and follows the same rules as any assignment operation.

With that feature out of the way, we are now able to construct an instance of our Tree struct:

let arena = Arena::new();
let root = arena.alloc(Cell::new(Leaf(42)));
// Transplant the loan inside root from &arena to &tree.arena
let mut tree = Tree { arena, root };

This requires one key modification to the arena type, so that it can be moved freely after allocating the root. Calling arena.alloc(...) method should force a dereference onto some kind of ArenaInterior which has a stable memory address:

pub struct Arena<T> {
    interior: Box<ArenaInterior>,
}

impl Deref for Arena<T> {
    type Target = ArenaInterior<T>;

    fn deref(&self) -> &ArenaInterior<T> { &*self.interior }
}

unsafe impl StableDeref for Arena {}

pub struct ArenaInterior<T> { .. }

impl ArenaInterior<T> {
    pub fn alloc(&self, value: T) -> &mut T { .. }
}

Transplanting and Functions

It would be nice to have a Tree::new() function. But for that to work, we need to be capable of transplanting references into and out of functions.

This should be possible because function callees and callers can share the variables that argument values are bound to, as well a kind of pretend variable which binds the return value. A caller can transplant any passed references onto new arguments before jumping into the callee. And a callee can transplant any returned references onto the pretend return variable before popping the stack.

This all works until the callee starts mucking with loaned fields, resulting in nasty unsoundness:

let mut a = Box::new((42, &0));
a.1 = &b.0;
take(a); // transplanted the reference from `a.deref().0` to `b.deref().0`

fn take(b: Box<(i32, &i32)>) {
    b.0 += 1; // illegal!
    dbg!(b.1);
}

It is tempting to think that the view types proposal could be our savior in this case. The function can simply agree not to touch b.deref().0 and everything will be fine, right? Not quite. We can still touch b.deref().0 by dropping b. Whatever fields we agree not to touch individually, dropping is always an option.

fn take(b: Box<{1} (i32, &i32)>) {
    let c = b.1;
    drop(b); // illegal!
    dbg!(c);
}

So a transplant into a function can only be legal if the function is also aware of the terms of transplanted loan: that it is dependent on some other argument not being messed with. Our take function must actually declare the source of passed reference. For now we’ll do this with a where clause explicitly documenting that that the origin 'r contains a loan from path b.0.

fn take<'r>(b: Box<(i32, &'r i32)>)
    where &b.0 in 'r
{
    let c = b.1;
    drop(b); // ERROR: c is known to borrow from b
    dbg!(c);
}

Explicit Origins

With this bikeshed-avoidance syntax for “explicit origins” in place, we can get our Tree data structure will working perfectly! In fact, they allow us to ditch all the interior mutability and complicated Cell juggling. Now nodes can store non-copy data. Even better, we get type-level verification that each node has a a most a single parent (the holder of the exclusive reference).

#[derive(Clone)]
enum Node<'mem> {
    Branch(NodeMut<'mem>, NodeMut<'mem>),
    Leaf(String),
}
use Node::*;

type NodeMut<'mem> = &'mem mut Node<'mem>;

struct Tree<'mem> {
    arena: Arena<Node<'mem>>,
    root: NodeMut<'mem>,
}

impl<'mem> Tree<'mem> {
    pub fn new(root: String) -> Self
        where &*return.arena in 'mem
    {
        let arena = Arena::new();
        let root = arena.alloc(Leaf(root));
        Tree { arena, root }
    }

    pub fn append(&mut self, text: String)
        where &*self.arena in 'mem
    {
        let mut edge = self.root;
        while let Branch(_, right) = edge {
            edge = right;
        }

        let old = self.arena.alloc(edge.clone());
        let new = self.arena.alloc(Leaf(num));
        *edge = Branch(old, new);
    }

    pub fn rotate(&mut self)
        where &*self.arena in 'mem
    {
        let Branch(ref mut a, ref mut mid) = self else { return };
        let Branch(ref mut b, ref mut c) = mid else { return };
        swap(a, c);
        swap(a, mid);
        swap(b, c);
    }
}

I’m using return as an identifier for “whatever variable the returned object is moved to”. We can work on making it less confusing later.

Notice that the elided lifetime on &mut self (call it 'a) is only related to 'mem by an implied bound 'mem: 'a. This “outlives” relationship should be fairly obvious, since we could later invalidate 'a by moving the tree object, yet the references with origin 'mem would be transplanted and so continue to live on. In the words of the aliasing formulation, we would say that the origin 'a includes the &self.arena loan also in the origin 'mem, but that it may depend on many other loans besides.

To get a more idea of how self-references would work with explicit origins, let’s work on another example. The owning_ref crate is good inspiration here because it is a primative sort of self-reference that we should obviously be able to handle, but is also hard to write soundly.

pub struct OwningMut<'r, O, T> {
    owned: Box<O>,
    reference: &'r mut T,
}

impl<'a, T> OwningMut<'a, T, T> {
    pub fn new(value: T) -> Self
    where
        &mut *return.owned in 'a
    {
        let owned = Box::new(value);

        // creates origin '0 requiring exclusive loan L0 with path `owned.deref_mut()`
        // inputs fact '0: '1
        let reference: &mut '1 T = &mut '0 *owned;

        let new = OwningRef::<'2, _, _> {
            // transplant L0 from `owned.deref_mut()` to `new.deref_mut()`
            owned: owned,
            // inputs fact '1: '2
            reference: reference,
        };

        // transplant L0 from `new.deref_mut()` to the caller (a.k.a. `return.deref_mut()`)
        // inputs fact 'a: '2
        // -> L0 escapes into placeholder origin 'a
        // -> OK because return.deref_mut() was declared as required by 'a
        new
    }
}

impl<'a, O, T> OwningRef<O, T> {
    pub fn map<'b, V>(self, f: impl FnOnce(&'a, T) -> &'b V) -> OwningRef<'b, O, V>
    where
        'b: 'a,
        &mut *self.owned in 'a,
        &mut *return.owned in 'b,
    {
        // create placeholder loan L0 with path `self.owned.deref_mut()`
        // inputs fact that L0 is required by 'a

        let new = OwningRef<'0, _, _> {
            // transplant L0 from `self.owned.deref_mut()` to `new.owned.deref_mut()`
            owned: self.owned,
            // inputs fact '0: 'a
            reference: f(self.reference),
        };

        // transplant L0 from `new.owned.deref_mut()` to `return.owned.deref_mut()`
        // inputs fact 'b: '0
        // -> 'b: 'a is OK because it was declared
        // -> L0 in 'b is OK because it was declared
        new
    }

    pub fn as_owner_mut(&mut self) -> &mut O {
        // Ok because we have exclusive access to self
        &mut *self.owned
    }

    fn into_owner(self) -> Box<T>
        &mut *self.owned in 'a,
    {
        // create placeholder loan L0 with path `self.owned.deref_mut()`
        // inputs fact that L0 is required by 'a

        // invalidates L0
        // -> is OK because 'a is unused
        self.owner
    }
}

Notice that the as_owner_mut function above would be unsound except for the protections afforded by explicit origins. In fact, it is completely impossible to call because it fails to declare &mut *self.owned in 'a:

ERROR: cannot borrow `int` as mutable more than once at a time

let mut int = OwningMut::new(42);
        ___   ______________
        |     |
        |     first mutable borrow declared here
        `int.owned` flows into `int.reference`
*int.as_owner_mut() += 1;
 ^^^^^^^^^^^^^^^^^^ second mutable borrow occurs here

All instances of OwningMut are known to have a loan of self.owned live in self.reference, and so any function interacting with an OwningMut must have a corresponding declaration to be useful. If we try to add the declaration, we get then get an error inside the method instead:

ERROR: cannot borrow `self.owned` as mutable more than once at a time

pub fn as_owner_mut(&mut self) -> &mut O
where
    &mut *self.owned in 'a
    ______________________
    |
    first mutable borrow declared here
{
    &mut *self.owned
    ^^^^^^^^^^^^^^^^ second mutable borrow occurs here
}

Explicit origins also work for self-references on the stack, although the API for a sort of stack-allocated owning ref is not very pretty. We need to make sure it is movable until it reaches its final location on the stack, requiring some generator-esque statefulness.

struct StackOwningMut<'a, T> {
    owned: T,
    reference: Option<&'a mut T>,
}

impl<T> StackOwningMut<'static, T> {
    fn new(value: T) -> Self {
        Self {
            owned: value,
            reference: None,
        }
    }
}

impl<'a, T> StackOwningMut<'a, T> {
    fn init(&mut self) where &mut self.owned in 'a {
        self.reference = Some(&mut self.owned);
    }

    fn unwrap(self) -> T {
        self.owner
    }
}

// inputs fact '0: 'static
let a: StackOwningMut<'0, i32> = StackOwningMut::new(42);

// inputs fact '0: '1
// OK because there are no loans in '0 to invalidate
let mut b: StackOwningMut<'1, i32> = a;

// create exclusive loan L0 with path `a.owned`
// input fact L0 in '1
b.init();

// Use the reference!
dbg!(b.as_mut().unwrap());

// Kill '1
b.reference = None;

// OK because loan L0 is unused by any live origins
let mut c: StackOwningMut<'2, i32> = b;

In a world of reference transplanting and explicit origins, pinning is temporary! If we provably kill all the self-references, then no live loans will be invalidated by moving our objects. This is in contrast to the unsafe Pin API. Once it is used, it requires such objects to be permanently pinned in place.

Generics

So we have a working system for self-references completely free of Pin, right? Before we declare victory, lets see what happens when a generic function like mem::swap gets involved:

let mut a = StackOwningMut::new(42);
a.init();
let mut b = StackOwningMut::new(43);
b.init();

// ERROR: can not borrow a.owner as mutable more than once at a time
mem::swap(&mut a, &mut b);

drop(a); // make b.reference dangling
*b.reference.unwrap(); // Trigger UB!

Thankfully, are forbidden from passing a self-reference to any function which does not acknowledge it. As mem::sway does not explicitly declare any origins, all such skulduggery is safely illegal.

But this safety comes at a cost. No generic function or trait method can declare an explicit origin involving the fields of the generic type, since the fields of that type are totally unknown. So if for example, we want StackOwningMut to implement the Deref trait, we are stuck.

ERROR: can not borrow `a.owner` as immutable when it is borrowed as mutable

fn num_bytes(a: &StackOwningMut<String>) -> usize {
    a.len()
    __
    |
    `<StackOwningMut as Deref>::deref` does not declare the origin of `a.reference`
}

There is no way for impl Deref for StackOwningMut to add the origin declarations, since the Deref trait does not declare them itself.

Thankfully, it is possible to get around this with a wrapper type and a bit of unsafe code. Since the wrapper type will hide the internals of StackOwningMut and protect them from screw ups, it does not need any explicit declarations of its own. But any code which knows the concrete types involved can use the unwrap_ref method to peek at the real type complete with all self-references:

struct StackOwningMutWrapper<P> {
    pointer: P
}

impl<'a, T, P> StackOwningMutWrapper<P>
where
    P: Deref<Target=StackOwningMut<'a, T>> + DerefMut,
{
    fn wrap(pointer: P) -> Self
    where
        &mut pointer.owned in 'a
    { ... }

    fn unwrap_ref(&self) -> &StackOwningMut<'a, T>
    where
        &mut return.owned in 'a
    { ... }
}

impl<'a, T, P> Deref for StackOwningMutWrapper<P>
where
    P: Deref<Target=StackOwningMut<'a, T>> + DerefMut,
{
    type Target = T;

    fn deref(&self) -> &T {
        self.unwrap_ref().reference
    }
}

A similar wrapper is also needed for our heap-allocating tree type, if we want it to implement a trait.

pub trait Treelike<T> {
    fn new(root: T) -> Self;
    fn append(&mut self, value: T);
    fn rotate(&mut self);
}

pub struct TreeWrapper {
    inner: Tree<'unsafe>,
}

impl TreeWrapper {
    pub fn wrap<'mem>(inner: Tree<'mem>)
        where &mut inner.arena in 'mem
    { ... }

    pub fn unwrap_mut<'mem>(&mut self) -> &mut Tree<'mem>
        where &mut return.arena in 'r
    { ... }
}

impl Treelike<String> for TreeWrapper {
    fn new(root: String) -> Self {
        TreeWrapper::wrap(Tree:new(root))
    }

    pub fn append(&mut self, value: String) {
        self.unwrap_mut().append(value);
    }

    pub fn rotate(&mut self) {
        self.unwrap_mut().rotate(value);
    }
}

The primary role of either wrapper is to hide some explicit origin, so that the self-referential type can be passed to functions which do not declare any such origin.

To view or not to view?

Ergonomics

Conclusion

TODO