Sam Sartor
# Statefulness in GUIs 2023-2-6

Around 4 years ago while working on SketchUp Web (an interesting mix of legacy C++ and shiny VueJS) I got the idea to try making a Vue/MobX-style reactivity system in Rust. I wasn’t so concerned with rendering or layout. All I wanted was a way to mutate random stuff in closures, and then have everything else update automatically. That turned out to be hard.

This is the first post in a series, starting with my observations on GUI stuff and hopefully ending with a complete introduction to my unnamed reactivity crate. Stay tuned!



The way I see it, all software applications fall on a spectrum, from those that are very stateful to those that are completely stateless. Compilers and command-line utilities are traditionally stateless. So are most school assignments! Statelessness is usually described as a property of code, often code that is side-effect-free or purely functional. But it also has something to do with the type of task being performed.

Stateless code is a good fit for tasks which allow all input to be received before requiring any output to be produced. Stateful code is required to produce output before all the inputs are known.

This is more directly the distinction between batch processing and stream processing. But the code examples below are clearest if you think of “state” as whatever stuff crosses from one input-output cycle to the next. In the simplest cases, stateless Rust applications with only one input-output cycle have this sort of structure:

fn main() {
    let input = read(...);
    let output = work(&input);
    write(..., output);
}

The program receives a complete input, loans it to some functions that produce output, and then finishes the task. Rust is very good at this! Although the input needs to be shared throughout the application, it can be made immutable. Output often needs gradual assembly via mutation, but it can also be owned by the functions doing the assembly.
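
To make that ownership story concrete, here is a small sketch of one such cycle. The word-counting task and the names `work` and `run` are made up for illustration; `run` takes the already-read input as an argument instead of calling a real `read(...)`.

```rust
use std::collections::BTreeMap;

/// Borrows the complete input immutably and builds an owned output.
fn work(input: &str) -> BTreeMap<&str, usize> {
    // The output is assembled gradually by mutation...
    let mut counts = BTreeMap::new();
    for word in input.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    // ...but it is owned by the function doing the assembly.
    counts
}

/// One input-output cycle: complete input in, complete output out.
fn run(input: &str) -> String {
    let output = work(input);
    output
        .iter()
        .map(|(word, n)| format!("{word}: {n}"))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Nothing here outlives the call to `run`, so there is no state to carry from one cycle to the next.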

On the other hand, stateful applications can be more difficult. Our example is a basic GUI application consisting of a list of counters. The user should be able to increment counters, decrement counters, add new counters, and remove existing counters (optional). Such a GUI is still pretty simple, but I ask you to keep in mind other stateful applications like industrial control systems, operating systems, embedded devices, and video games.

[Figure: a row of three counters showing 42, 12, and 7]

This application has to produce output (to the screen) before all the inputs are available (from the mouse, keyboard, or pedals). That is what forces it to maintain and update the obvious “model state”:

[[counter]]
value = 42
color = "#6fa8dc"

[[counter]]
value = 12
color = "#f6b26b"

[[counter]]
value = 7
color = "#93c47d"

To some extent the actual elements of the GUI are also stateful. The GUI framework needs to keep track of the location of each button, the position of the cursor in a text box, the presence of event listeners on each, and so on. Although the application may or may not care, this sort of data must exist somewhere or the renderer/window manager/input subsystem could not determine the colors of pixels or routing of events. From this point onward, all the miscellaneous information that is internal to elements of the GUI will be called the “view state”, but know that the model/view dichotomy is more a useful conceptualization than a hard technical fact.

However we actually organize the state of our GUI, we know each counter will require a number to be in memory somewhere. One side of the application will access the number in order to drive all kinds of font rendering and GPU computing and whatnot, so that it eventually appears as little lights on the screen. Over on the other side of the application there will need to be some sort of pointer wizardry, such that the correct location in memory gets incremented when a circuit inside the mouse is closed. No matter the exact details, those two sides of the application have to share. The whole problem of stateful applications is limiting the amount of pain caused by such shared mutability.

[Figure: the three counters (42, 12, 7) shown next to the model entry for the first counter: value = 42, color = "#6fa8dc"]

## Immediate-mode

The simplest stateful applications have this fundamental sort of structure, where the state is recreated from scratch on every input-output cycle:

fn main() {
    let mut state = empty();
    loop {
        let event = read(...);
        state = work(&event, state);
        write(&state);
    }
}

I’m going to abuse graphics terminology a bit and refer to the above pattern as “immediate mode”. Historically that term is used to describe graphics APIs which require users to recreate the entire command buffer on every frame, but the same concept should apply even when no pixels are present.

Immediate-mode still shares the state between the input and output sides of the application. But by completely separating the new state from all previous states, we make the application seem almost stateless. The event and previous state are a complete input, while the new state is a complete output.
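
As a rough sketch of that idea, the `work` step can be written as an ordinary function from (event, previous state) to new state. The `Event` enum, the simplified `Counter`, and the `replay` helper are assumptions made up for this example, not part of any real framework.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Counter {
    value: i32,
}

/// Hypothetical events for the counter list.
enum Event {
    Increment(usize),
    Decrement(usize),
    AddCounter,
}

/// Consumes the previous state and returns a brand new one.
fn work(event: &Event, state: Vec<Counter>) -> Vec<Counter> {
    let mut next: Vec<Counter> = state
        .into_iter()
        .enumerate()
        .map(|(i, c)| match event {
            Event::Increment(j) if *j == i => Counter { value: c.value + 1 },
            Event::Decrement(j) if *j == i => Counter { value: c.value - 1 },
            _ => c,
        })
        .collect();
    if let Event::AddCounter = event {
        next.push(Counter { value: 0 });
    }
    next
}

/// Runs one input-output cycle per event, starting from an empty state.
fn replay(events: &[Event]) -> Vec<Counter> {
    events.iter().fold(Vec::new(), |state, event| work(event, state))
}
```

Because `work` never touches anything except its arguments, the whole application reduces to a fold over the event stream.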

Immediate-mode GUIs also maintain separation between view state and model state. Without live references from the model to the view, the view can be trivially thrown out and recreated on each frame. The imgui, egui, and Elm frameworks all operate in this way.

I hear you cry out “idiot! Since when is Elm immediate-mode?” But structurally speaking, Elm applications look exactly like immediate-mode applications. Every Elm GUI consists of a single main loop which receives each event as a complete input and provides a whole new view as a complete output. The only difference is Elm’s power of memoization, available to all high-born purely functional programming languages. In Elm, there is no such thing as a component. That is why I resist comparing Elm to Rust frameworks like Relm and Iced. They need to use components in place of trivial memoization and garbage collection, so they get their own section of this post.

Getting back to Rust, an immediate-mode implementation of our counter application could look something like this:

fn render(event: &Event, old_model: Vec<Counter>) -> (Vec<Counter>, Vec<Widget>) {
    // Init the new state.
    let mut new_view = Vec::new();
    let mut new_model = Vec::new();

    for Counter { mut value, color } in old_model {
        // Add the current counter text to the view state.
        new_view.push(Text::new(value));

        // Add a + button to the state and check if it would have
        // been pressed by the last click event.
        let add = Button::new("+");
        if add.is_clicked_by(event) {
            value += 1;
        }
        new_view.push(add);

        // Add a - button to the state and check if it would have
        // been pressed by the last click event.
        let sub = Button::new("-");
        if sub.is_clicked_by(event) {
            value -= 1;
        }
        new_view.push(sub);
        new_model.push(Counter { value, color });
    }

    // The button to create a new counter.
    let add = Button::new("+");
    if add.is_clicked_by(event) {
        new_model.push(Counter {
            value: 0,
            color: new_random_color(),
        });
    }
    new_view.push(add);

    // Return the updated state.
    (new_model, new_view)
}

Again, this looks a lot like an Elm application where the update and view functions got squashed together. But it isn’t fundamentally different from more mutation-happy frameworks like imgui and egui. They simply bundle the last event and the new view state together into a single mutable object, in order to reduce boilerplate:

fn render(ui: &mut Ui, model: &mut Vec<Counter>) {
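    // Iterate with iter_mut() so `model` is reborrowed rather than moved,
    // which keeps it usable for the push below.
    for Counter { value, color } in model.iter_mut() {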
        ui.add(Text::new(value).with_color(color));
        if ui.add(Button::new("+")).clicked() {
            *value += 1;
        }
        if ui.add(Button::new("-")).clicked() {
            *value -= 1;
        }
    }
    if ui.add(Button::new("+")).clicked() {
        model.push(Counter {
            value: 0,
            color: new_random_color(),
        });
    }
}

Also notice that we are not throwing out the entire model state on each frame, and are instead providing a mutable reference. But because mutable references are exclusive, it is equivalent to moving ownership of the state into the render function and then swapping it back to the GUI framework on return.
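
Here is a minimal sketch of that equivalence, using a plain `Vec<i32>` as the model. The `Framework` struct and both `render_*` functions are hypothetical stand-ins and do not correspond to any real framework’s API.

```rust
/// Takes exclusive access through a mutable reference.
fn render_by_ref(model: &mut Vec<i32>) {
    for value in model.iter_mut() {
        *value += 1;
    }
    model.push(0);
}

/// Takes ownership and hands the state back on return.
fn render_by_move(mut model: Vec<i32>) -> Vec<i32> {
    for value in model.iter_mut() {
        *value += 1;
    }
    model.push(0);
    model
}

/// A hypothetical framework that keeps the model in a field.
struct Framework {
    model: Vec<i32>,
}

impl Framework {
    /// Calls the owning version by swapping the state out and back in.
    fn frame(&mut self) {
        // `take` leaves an empty Vec behind while the render function owns the model.
        let model = std::mem::take(&mut self.model);
        self.model = render_by_move(model);
    }
}
```

Both versions leave the caller with the same updated model; the mutable reference just skips the swap.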

## Retained-mode

There are 3 sorts of limitations faced by purely immediate-mode GUIs which I can easily identify:

  1. Performance - recreating the whole view state every iteration can require a lot of computation, even if the actual changes are very small. Increment 1 counter out of 1000 and often font shaping runs for all of them.
  2. Structure - the programmer is responsible for maintaining all the relationships between objects in the model state and corresponding elements in the view state. When a “+” is clicked, how do we know which counter to increment?
  3. Statefulness - sometimes view state is unavoidable. Input elements should receive or lose focus, maintain text selections, remember if they are expanded or collapsed. Where do you put that information, especially if you are authoring custom GUI components?

In search of solutions to these problems, consider the far opposite corner of the design space: retained mode. In retained mode, changes to state are made at will, with the overall state being “retained” from one update to the next.

This is really hard to show in Rust because in retained mode all data is simultaneously shared and mutable, but it is very common in traditional garbage-collected OOP languages. Each object receives events and updates independently, while holding arbitrary references to other objects. I think imperative languages such as C also lend themselves to retained mode, just with more explicit walking through pointers. Certainly the Linux kernel should be considered retained-mode more so than immediate-mode.

Classic GUI-ish examples are GTK, Qt, and the HTML DOM. For each element of a retained-mode GUI, the framework retains a specific object across frames. This is very flexible and has good performance! But as an unfortunate consequence, the programmer has to track down and mutate the properties of that object every time the GUI needs to change. This sucks. Old-school jQuery makes retained-mode just bearable for very very simple GUIs like our counter list:

<script>
    counts = [0, 0, 0];

    function add(index, offset) {
        counts[index] += offset;
        $(`#value${index}`).html(counts[index]);
    }

    function newCounter() {
        const i = counts.length;
        counts[i] = 0;
        $('#counters').append(`
            <div id="value${i}">0</div>
            <button onclick="add(${i}, 1)">+</button>
            <button onclick="add(${i}, -1)">-</button>
        `);
    }
</script>

<div id="counters">
    <div id="value0">0</div>
    <button onclick="add(0, 1)">+</button>
    <button onclick="add(0, -1)">-</button>
    <div id="value1">0</div>
    <button onclick="add(1, 1)">+</button>
    <button onclick="add(1, -1)">-</button>
    <div id="value2">0</div>
    <button onclick="add(2, 1)">+</button>
    <button onclick="add(2, -1)">-</button>
</div>
<button onclick="newCounter()">+</button>

But as the GUI scales up it becomes impossible to keep track of the spaghetti pile of inter-object relationships. And since Rust is fundamentally opposed to any code that needs to arbitrarily mutate objects owned by someone else, even this simple example looks incredibly bad when translated.
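
For a taste of what such a translation involves, here is one possible sketch built on `Rc<RefCell<...>>`. The `Label` and `Button` types are made-up stand-ins for real toolkit widgets, and a click is simulated by calling `on_click` directly.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A hypothetical retained label: it lives across frames and is mutated in place.
struct Label {
    text: String,
}

/// A hypothetical retained button holding an arbitrary callback.
struct Button {
    on_click: Box<dyn Fn()>,
}

/// Builds one counter: a label plus "+" and "-" buttons that share it.
fn new_counter(counts: &Rc<RefCell<Vec<i32>>>) -> (Rc<RefCell<Label>>, Button, Button) {
    let index = counts.borrow().len();
    counts.borrow_mut().push(0);
    let label = Rc::new(RefCell::new(Label { text: "0".to_string() }));

    // Every callback needs its own shared, mutable handle to the model and the label.
    let make_button = |offset: i32| {
        let counts = Rc::clone(counts);
        let label = Rc::clone(&label);
        Button {
            on_click: Box::new(move || {
                // borrow_mut() panics at runtime if the model is already borrowed.
                let mut counts = counts.borrow_mut();
                counts[index] += offset;
                label.borrow_mut().text = counts[index].to_string();
            }),
        }
    };
    let plus = make_button(1);
    let minus = make_button(-1);
    (label, plus, minus)
}
```

The reference counting and runtime borrow checks are exactly the bookkeeping that a garbage-collected language hides, and every new relationship between objects adds another clone and another `borrow_mut()` that might panic.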

Still there are advantages. A retained-mode GUI does not fundamentally separate view state and model state. Although it is usually a good practice to have obvious view classes and model classes, the ability to reference view from model and vice-versa is what helps to combat the three limitations of immediate-mode. Performance is better because each part of the model can link to and reuse its corresponding view. Structure is maintained by ordinary object reference. And UI element classes are free to retain whatever state they need to offer the best UX.

## Hybrid-mode

Given the ergonomic advantages of immediate-mode and flexibility of retained-mode, most modern GUI frameworks use a widget tree to offer a hybrid of both. I’m certain this category of framework already has multiple names, but we are going to call it “hybrid mode”. It includes React, Relm, Dioxus, Iced, Yew, KAS, and many more. Each asks the programmer to break their interface into many retained-state widgets, where each widget acts like an immediate-mode GUI in miniature! When the model state of a particular widget changes, it recreates the view state in its entirety. But the resulting view state gets diffed to identify child widgets, so that their retained view and model state can automatically drop into place. Reusing state obviously improves performance, but the real win over immediate-mode is the ability to embed model state inside view state. In some sense, every widget’s model is part of some parent widget’s view.

In some simplified universe, a hybrid-mode counter app might look like:

#[widget]
fn counter() -> Element {
    let mut color = use_state(new_random_color);
    let mut count = use_state(|| 0);
    view! {
        format!("{count}"),
        style: format!("border: 2px solid {color};"),
        button { text: "+", onclick: count += 1 },
        button { text: "-", onclick: count -= 1 },
    }
}

#[widget]
fn app() -> Element {
    let mut counter_ids = use_state(Vec::new);
    view! {
        for &id in counter_ids {
            counter { key: id }
        },
        button { text: "+", onclick: counter_ids.push(Uuid::new_v4()) },
    }
}

At a very high level, widget trees in Rust still move ownership of the state back and forth between the input and output sides of the application on every update, although the details are interesting. First the programmer’s own code gets complete access to the state of each widget, and makes whatever updates they like. This has the power to invalidate any reference into the state, so a big map is used to diff widgets and reconnect them with their children afterwards. Then exclusive access to the state is passed to the layout and painting subsystems in order to actually get stuff out onto the screen. Once a new event comes in, ownership returns to the input subsystem and the cycle repeats.

[Figure: a cycle diagram of state ownership with the stages Wait for Event, Update State, Recover References, Pass to Output, Layout and Paint, and Pass to Input; the input and output sides each own the state in turn]
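
The “big map” step can be sketched in a few lines. This is only an illustration under simplifying assumptions: children are identified by a `u32` key, `CounterState` is a made-up stand-in for whatever a widget retains, and real frameworks diff whole trees rather than one flat list.

```rust
use std::collections::HashMap;

/// State that a child widget retains between updates (hypothetical).
#[derive(Debug, PartialEq)]
struct CounterState {
    count: i32,
}

/// Rebuilds the child map for `keys`, reusing retained state where a key
/// already existed and creating fresh state otherwise. State whose key is
/// no longer listed is dropped along with the old map.
fn reconcile(mut old: HashMap<u32, CounterState>, keys: &[u32]) -> HashMap<u32, CounterState> {
    let mut new = HashMap::with_capacity(keys.len());
    for &key in keys {
        let state = old.remove(&key).unwrap_or(CounterState { count: 0 });
        new.insert(key, state);
    }
    new
}
```

Note that `reconcile` takes the old map by value: the update step is free to invalidate every reference into the state, and the keys are the only thing used to reconnect children afterwards.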

This is pretty clever! And we can theoretically upgrade it with additional stages in order to better share the state across different subsystems. For example, we could add a step to the cycle where each object in the model is diffed and serialized into a stack, and then regurgitated when the user presses Ctrl+Z.

[Figure: the same ownership cycle extended with an undo/redo side, with the stages Wait for Event, Update State, Recover References, Pass to Undo/Redo, Time Travel, Pass to Output, Layout and Paint, and Pass to Input; each of the three sides owns the state in turn]
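
A deliberately naive sketch of such an undo stage follows. Instead of diffing and serializing each object, it swaps in the simpler technique of cloning the whole state as a snapshot; the `History` type and its methods are hypothetical.

```rust
/// A snapshot-based undo history (stores whole clones, not diffs).
struct History<T: Clone + PartialEq> {
    past: Vec<T>,
}

impl<T: Clone + PartialEq> History<T> {
    fn new() -> Self {
        History { past: Vec::new() }
    }

    /// The extra stage in the cycle: record the state if it changed
    /// since the last recorded snapshot.
    fn record(&mut self, state: &T) {
        if self.past.last() != Some(state) {
            self.past.push(state.clone());
        }
    }

    /// Ctrl+Z: discard the current snapshot and restore the previous one.
    /// Returns false when there is nothing left to undo.
    fn undo(&mut self, state: &mut T) -> bool {
        if self.past.len() < 2 {
            return false;
        }
        self.past.pop();
        if let Some(previous) = self.past.last() {
            *state = previous.clone();
        }
        true
    }
}
```

The history only ever needs shared access to record and exclusive access to restore, so it slots into the cycle as one more temporary owner of the state.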

If you take this to the logical extreme, I think you’ll kinda get a diffed version of Bevy’s entity component system. The programmer’s own code could create bits of state and deposit them with some kind of scheduler, which would then provide different systems with mutable or immutable access as needed to update or react to units of state. During my undergraduate studies I worked on an immediate-mode GUI system for virtual reality, and wound up needing exactly this in order to run simulations and resolve hit tests. It worked great for a prototype, but I suspect that as more and more systems have opportunities to modify state, it will require more and more custom bookkeeping to recover references from one update to the next. If you prove me wrong, please let me know!

In my mind, this bookkeeping is really where we can make improvements. Plenty of people are working on laying out and painting widget trees. But however you move state around in those trees, you need ways to know when it is modified, to maintain references in the face of that modification, and to schedule modifications in such a way that they do not conflict.

Join me in part 2 to watch as the tree structure of hybrid mode starts to break down!