Interactive Interpreter Undo

13 Sep 2016

Undo is my favorite feature of the fancy python shell I help maintain, and I wish more interactive interpreters had it. I’ve previously written about the mechanics of implementing a command line interface that can rewrite its history, but not about how to teach a system to undo. I’d like to tell you why I like undo, and I hope afterward you’ll consider adding undo to your favorite interactive interpreter. I’m not going to cover how to write a program or language implementation from scratch with efficient undo as a design goal, although this is also topic I’m learning about. (If you have reading suggestions on this topic, please let me know!) Instead I’ll go over ways to add undo to existing programs and suggest some hacks you can use to try out undo with many command line interfaces today.


When I mistakenly delete half of a blog post I’m working on, I’m sure glad I can hit undo to bring it back. Although leaning on backspace and retyping the word works well enough for fixing typos, undo allows me to revert arbitrary changes: if the document suddenly looks funny and I’m not sure why, I can probably fix it with undo. I’m free to experiment with functionality I don’t understand, clicking buttons and trying menu items just to see what they do. I can try out changes without worrying about the overhead of reverting them when I find I don’t like them.

Having undo means taking an action is only a first attempt at a change, not an indelible mark. A digital artist friend of mine, Jillian Hazlett, says:

I typically use undo a LOT when I’m doing digital art. I hardly ever get things right on the first try! I typically color a bit, then undo a bit, color a bit, undo a bit more. …

I also “undo” when I make physical art. When I draw in pencil, I tend to erase a lot. When I paint, I wipe away mistakes or add another layer on top of them. But the undo button is kind of a perfect eraser! No smudge marks! It allows for easy experimentation with zero risk of “ruining” the piece you’re working on.

Undo amounts to time travel in software where document evolution is tied to the passage of time. Video game console emulators construct the ability to undo gameplay actions like jumping or shooting out of the ability to save snapshots of the game state dozens of times per second. Tool-assisted speedrunning is the process of constructing an optimal playthrough of a game by playing it until the first mistake, then rewinding to that point and trying again and again. With a video game, the end product is not the final state of the game but the recording of the playthrough; the evolution in time of the document instead of the document itself.

Once walking forwards and backwards in time becomes second nature, it can be useful to consider alternate timelines, a branching structure known as an undo tree, so I can undo a lot, work from there, then change my mind and go back to the document before the undos. For applications without an undo tree, saving different versions of a document can serve a similar purpose. Version control software organizes the process of saving backups over time to deliver the power of undo and the higher level of tracking documents over long periods of time.

Undo frees users to experiment, helps them catch mistakes, and allows them to craft an optimal path from one document state to another.

Command line interfaces

Command line interfaces are call-and-response, question-and-answer style programs that take place in the terminal. Their heritage is of electronic typewriters communicating with faraway computers. The user types something, and the computer types back. As with many computing technologies, it has been used to create games:

Colossal Cave Adventure, by William Crowther and Don Woods. This version of the game was ported to Python by Brandon Rhodes

While playing this game, the player explores a system of caves, gathers treasure, and dies a lot. Like dying in a Choose Your Own Adventure book, it would be nice to pretend it didn’t happen and resume the adventure at the last decision point. What the eleven year old might accomplish with dexterous fingers lodged into a paperback, Colossal Cave Adventure makes possible through saved games. At any point on this journey, typing save before-the-dragon saves this game to a file called “before-the-dragon,” which the player can later load to continue playing at this point if the dragon encounter goes awry.

This is basically undo, with the added inconvenience that the player must remember to save. The document being restored here consists of the player’s inventory, the player’s location in the world, and the current status of the rest of simulated game world. So adding undo doesn’t significantly change the game beyond making experimentation a bit quicker, and eliminating the need for constant saving. Let’s imagine we added an undo feature to get around this inconvenience.

You can see that the cumulative narrative of the story gets muddled by this undo command: although the player has no longer eaten the food, the message that it was delicious is still visible. Similar to restoring a saved game, undo does not restore text displayed on the screen: the cumulative narration of the character’s story is only in the player’s head, not on the screen.

This is similar to the limited scope of undo in a word processor: if I accidentally get stuck in overwrite mode, where typing replaces the content after the cursor, mashing undo won’t help me to get out of it. Similarly undo ignores the zoom level, the position of the cursor, and whether print layout is active. Undo restores the content of the document, not the surrounding chrome.

It would be nice to increase the scope of undo in this text adventure game to include the displayed story, restoring not only the character’s position, inventory, and deadness but also the text on the screen.

Expanding the scope of undo to include the typed command and its output makes it more useful: it can now be used to clear typos from the screen and clear commands that only give the user information instead modifying the environment, like inventory above. Due to some video terminal emulator details, only the text still on the screen can be overwritten, not text in the scrollback buffer. For now it’s good enough to modify the text onscreen.

Undo makes exploring more convenient because actions can always be taken back. While this might not be desirable in terms of game design for a text adventure game where exploration is supposed to come along with some risk, it dramatically improves the user interface. Undo makes typos less painful, status checks less noisy, experimentation cheaper, and crafting an optimal path less tedious.

Undo Implementation Strategies

Now that we’ve decided on a preferred interface, we have to worry about how to implement undo. To restore a previous state, we’ll either need to reconstruct that state from the current state, rebuild that state from an initial state plus a history of actions taken since that state, or restore it directly from a cache of all previous states that have constantly been being saved to be available for future restores. As in a text adventure game, a user must wait for undo to finish before continuing to use an interactive interpreter, so undo should happen quickly enough not to delay the user. As in a text adventure game, undo ought to accurately recreate the previous state so as not to mislead the user.

Reconstructing previous states from the current state

In our first method, actions might be stored along with enough information about the effect they had in order to later undo that effect. Text editor undo often works this way. Recording that the user “deleted line 10, which contained the words ‘hello there’” is enough to later reverse this effect. In Colossal Cave Adventure, this might mean mapping commands to their opposites and taking note of anything else that happened that turn: if “go west” resulted in the player going west, falling, dropping all their treasure, and dying, that action could be recorded and later undone by moving the player east, reanimating the player, and giving them all their treasure back.

The wide variety of possible commands might make manually implementing the reverse of each untenable. A more cohesive technique would be to instead record the state of the document before and after the change and compute the difference between the two states, then record this difference so it might be applied later to recreate the old state.

Reconstructing previous state from the initial state

Alternately, stored actions could be replayed from some initial state to recreate previous states. Replaying every action but the most recent might take more time than the previous method, but no additional logic for undoing every possible command needs to be written. This approach works well when the same initial state can be recreated and all actions are deterministic. If all actions aren’t deterministic, then the nondeterministic parts need to be recorded: the reverse debugger rr records nondeterministic things like the timing and return values of system calls to replay them later, implementing most of what is needed for a very granular undo system. Redux dev-tools and the Elm debugger similarly record user inputs to recreate make execution deterministic enough to recreate states.

In Colossal Cave adventure, implementing undo like this would mean saving a log of all run commands and replaying all but the most recent. This technique also requires reconstructing or preserving a known initial state of the program, and being able to quickly reissue each of a history of commands.

Recording each state

Finally, snapshots of the state of the program might be saved to be restored later. Rewind in video game console emulators works by saving a snapshot of the emulator state every few seconds. Virtual machines that emulate general use computers can save their state as well, but rarely do so automatically every few seconds to provide undo.

In Colossal Cave Adventure, this would mean saving the game after every turn and restoring a previous saved game on undo. The Python 3 port of Colossal Cave Adventure already implements saving the game and reloading it, although it does not alter the terminal to replace its contents with the output of previous commands from the restored game.

Although this method is very flexible, it might take a significant amount of time to create these snapshots and a significant amount of space to store them. Both of these costs can be decreased by structural sharing, the technique of allowing these snapshots to refer each other, sharing some of their data. Structural sharing is most effective when changes to parts of the program state are carefully controlled, which is easiest when a program is built with immutable data structures.

Interactive Interpreters

An interactive interpreter is a program that waits for the user to type a command in a programming language, then prints the output of that command. Interactive interpreters are used to check the syntax or semantics of commands, to explore libraries, and to accomplish useful work like data munging in an environment that provides immediate feedback. I use interactive interpreters to learn and teach because using them keeps my Tested Hypotheses per Minute high: every sequential step toward a goal is accompanied by a check of my mental model of the system, and additional side checks are cheap to make.

>>> print(42 + 7)
>>> colors = ["purple", "orange"]
>>> print(colors)
["purple", "orange"]
>>> colors.append("red")
>>> print(colors)
["purple", "orange", "red"]

A Python interactive interpreter session. Commands like colors = [“purple”, “orange”] and colors.append(“red”) modify the state of the interpreter. Other commands like print(42 + 7) and print(colors) do not significantly modify the state of the interpreter and only inform the user about the current state.

I have found undo immensely useful in interactive interpreters for the same reasons we found it useful in Colossal Cave Adventure. Making exploration more risk-free and status check commands easier to sweep away makes me more comfortable exploring. Undo also makes more precise management of screen real estate possible, making it easier for an audience to follow along with a live coding session. The document which we modify and to which we undo changes to is the interpreter: what names, like “color”, are defined, and to which objects they refer. However, most interactive interpreters are capable of a wider variety of actions, some of which affect the world beyond the interpreter state. This will make implementing undo more difficult than it would have been for Colossal Cave Adventure.

The wide variety of commands issuable to an interactive interpreter makes recording each action and the effects of these actions in order to reverse them later, the first of the three undo strategies, more difficult: it would be a lot of work to write reverse versions of every command. Finding the differences between two interpreter states might be possible for interpreters written from scratch with these inspection capabilities in mind, but typically this is not easily accomplished. Modifying an existing interpreter to record these changes might be possible, similar to the way some of Philip Guo’s modified Python interpreters are instrumented to record additional information about the state of the interpreter.

The third undo approach is a natural fit for interactive interpreters that have serialization of session state built in. Matlab and R are languages that, like Colossal Cave Adventure, have built-in commands to save their current state to a file which can be used to restore that state later. Implementing undo for an interpreter of one of these languages might involve saving snapshots after each run command and restoring the previous one on undo, along with a bit of extra work to save and restore command outputs.

Some actions taken in an interactive interpreter cause changes of too large a scope for any of these undo techniques. In this video a programmer is confused when undo fails for cases involving the filesystem or online shopping.

If the scope of undo does not extend beyond the state of the interpreter, we’ll have to give up on truly undoing commands like deleting a file or communicating with other computers. A virtual machine or some other system-level monitor might make changes to the filesystem undoable, but figuring out a generic way to undo network communication is difficult. Gmail implements a single undo step for sent emails by recording the user’s intent to send the email but not acting on it for 30 seconds. Recording intent to interact with the outside world instead of actually doing so makes sense for actions that are asynchronous from the users’s perspective, like sending an email, but generally in order to provide the result of a command as feedback that command needs to take place immediately. Another approach might be to record intent and predict the result of a command. Our undo will simply remove such a command and its associated output from the screen.

Undo in bpython

The first place I encountered this style of undo in a command line interface was with bpython. To try out undo, run pip install bpython, run bpython, run a command, and then hit ctrl-r. (reverse-i-search is bound to meta-r by default, but you can swap these back with a config file)

bpython is an interactive interpreter interface for Python that has had undo for a long time, and this undo has always erased the undone command and its output. Initially bpython used an alternate screen display that did not integrate with other commands in a shell session. I wrote a new frontend to make bpython history integrate with the parent shell process history, which required some interesting terminal techniques. Through making these changes and subsequent work on bpython, I became familiar with and made some changes to its undo functionality.

bpython implements undo using the second of the three previously outlined strategies: rerunning every command except the most recent in order to recreate the previous state. When the user requests an undo, a new object representing an interpreter is created with a clean slate of global variables. Then the previous state is progressively rebuilt by rerunning all previous commands except the most recent.

There are some serious issues with this approach. Sometimes side effecting actions being run again causes unwanted effects. Sometimes rerunning a long history of commands takes considerable time. Sometimes running a command doesn’t produce the same output when it is run a second time: read files might now contain different bytes, a server might return a different response, or a pseudorandom number generator might produce a different result. Because bpython runs the new interpreter in the same process, some state shared between all interpreter objects in a process fails to be reset before rerunning the command history. This shared state includes the cache of imported modules (stored in sys.modules) and per-process values like the current working directory and environmental variables.

To mitigate the delay caused by re-executing commands, I added a cumulative calculation clock to bpython to be used as an estimate for re-execution time. Once this clock reaches 1 second, the user is given the opportunity to undo more than a single command at once, to avoid rebuilding states several times for consecutive undos. Some commands, like calling up interactive help and waiting for user input, are patched to be less disruptive (not bring up a pager) and more reproducible (returning the same value they got the first time) during undo-triggered re-execution. Dealing with commands that interact with the outside world, also known as IO for Input/Output, is one of the most interesting topics in implementing undo, but bpython doesn’t do anything special in the majority of cases.

bpython takes advantage of this otherwise limiting approach to rebuilding state to implement related features. The module cache not being cleared during undo allows module reload to be triggered separately by the user with F6, allowing quickly checking the effect of modifications made to a module. Because each undo reruns the entire session, it’s possible for the user to edit that session before it is rerun with F7.

Before I added it, bpython didn’t implement the first half of the “rebuild from a known initial state” strategy: it didn’t use a new interpreter object with new global variables. Interestingly this often worked: rerunning all but the most recent Python command from the current state often approximates the previous state well enough be useful.

Because rebinding of variables is allowed in Python, including redefinition of functions and classes, running previous commands on top of current state indeed creates a state similar to the previous one. But with this technique variables would end up defined that shouldn’t have been, which in combination with Python’s inspection abilities to do things like ask whether a variable name is defined resulted in inconsistencies in recreated states that could affect code.

    >>> a = 1
    >>> 'b' in locals()
    >>> b = 2
    >>> <undo>
    >>> 'b' in locals()

The older strategy is still used in cases where the initial state cannot easily be reproduced, as when it is executed from a debugger with existing initial state. In this case bpython could either disable undo or provide it in this more limited way.

This is an important decision. Providing a tool that doesn’t behave as expected can be worse than not providing it at all.

bpython has a lot of bugs generally; I’m too set in my ways to to notice many of them, but I’d guess the Dan Luu time-to-first-bug would be in the double digit seconds. I find the payoff high enough and the cost of checking behavior in regular Python low enough that using it is worth it to me.

Similarly, I appreciate undo so much that I prefer this weaker undo to not providing the feature at all when the initial state can’t be recreated. However, we’ll see that we can use better undo techniques to avoid some of these problems.

Rerunning history in a new process

Although creating a new InteractiveInterpreter object improves the reproducibility of the initial state from which undo states are constructed, global state like the module cache, the current recursion limit, and other general process state like the current directory and environmental variables all persist across the creation of a new interpreter object. The cleaner solution would be to start a new Python process for each reproduced state for undo. Logging commands and running them in a new process is a relatively general way to reproduce a process state, and could be used for many command line interfaces. Here’s a rough general wrapper for this with very primitive terminal rewriting, and a Haskell interactive interpreter wrapper that works similarly.

Notebook view of a long-lived REPL

You can try out a similar undo manually in a Jupyter notebook. Jupyter notebooks act as frontends to interactive interpreters in many languages that run as separate processes called kernels. These kernels can be restarted and commands, stored in the Notebook interface, can be rerun. The user could delete the most recent command, restart the kernel and re-execute to implement a manual undo. Current versions of the notebook even have a shortcut to restart the kernel and rerun all commands, making this a quick process.

However it’s common to make changes to previous commands in Jupyter notebooks without restarting the kernel, or to delete commands and their corresponding output if they’re purely informative or not useful. The notebook interface is strictly more powerful than undo: undo can be simulated as described above, but the fine-grained control of command output and ability to edit and rerun individual commands provide many of the same benefits as undo. The more limited undo paradigm might be preferred because modifying and rerunning commands from history out of order can obscure the history of execution, even removing evidence of the commands that were used to create the current environment.

Jupyter notebooks trade the display of a log of exactly the commands run so far for the ability to selectively edit and run commands from history. This makes it easier to redo a calculation later in a data pipeline without rerunning earlier steps in the pipeline. In contrast, systems with only undo always preserve a single path by which the current environment can be reproduced.

I think Jupyter notebooks are amazing. I’ve used them with Haskell, R, Ruby, and JavaScript in addition to Python. If you’re not so far convinced by my exhortations to use undo, consider ignoring the rest of this post and proceeding directly to using and contributing to the Jupyter kernel for your favorite language. I would only consider arguing that undo is preferable in terminal interfaces to the Jupyter kernels, or for situations in which it’s difficult to keep a notebook reproducible.

Restoring state with fork(2)

Using a separate process for each undo allows a cleaner initial state, but rebuilding state from there can still cause unwanted side effects, and we’re still left with nondeterminism in interactions with the world beyond our process: changes to the contents of files, servers that might return different responses, etc. If only we had snapshots of the process, it wouldn’t be necessary to risk causing effects outside of the process again or risk requests getting different responses each time undo is used. It would be awfully nice to make copies of entire processes, and restore those copies as needed.

Fortunately making a snapshot of a process is possible on unix systems with the fork system call!

Each time this Python interpreter asks for input, it makes a copy of itself with fork. One of these copies goes to sleep for a bit while the other executes the entered command. If the user types “undo,” then that process dies and the other wakes up and resumes execution at that earlier point.

import code, os, sys
def read_line(prompt=""):
    while True:
        s = input(prompt)
        if s == 'undo':
            sys.exit()  # restores prev. state in parent
        if os.fork() == 0:  # this is the child
            return s
        else:  # this is the parent
            os.wait()  # wait for child process to die
console = code.InteractiveConsole()
while True:
    src = read_line('>>> ')

Fork works well enough for simple programs, but fork is dangerous. Once the process being forked has multiple threads all bets are off. And a good new bet is that things won’t work.

One of the difficulties of saving snapshots of an interpreter is the storage space required to keep track of them. Since these copies of processes are stored in RAM instead of being serialized to disk, there’s even less space available to store them. The copies of the process are quite similar to each other, and fork takes advantage of this similarity: portions of a process are only copied when one copy or the other makes a change to a portion of memory, called a page. This “copy-on-write” strategy means waiting to copy over of a block of memory to another block until one of the processes writes to that block. This means the amount of additional space required per copy depends on how many pages of memory are modified to execute a single command. Unfortunately, simple commands may result in changes to a large number of pages in interpreters not optimized for the fork system call. See this article on how this behavior was improved in version 2.0 of Matz’s Ruby Interpreter. It suggests this technique will work much better with interpreters that go out of their way to make forking efficient.

On my machine each fork of Python 2.7 takes between 200 to 1000 KB for “typical” interactive use, that is, not creating or loading large data structures. A better system might be more careful about when to fork, perhaps only maintaining enough copies of the process to support a few undos.

Hacking fork-on-command into interpreters

What if we wanted to use forking but not mess with interpreter internals? Command line interfaces often follow a pattern of calling a function from the GNU Readline library to get their next line of input. It’s possible to intercept these calls to readline and substitute a patched version that uses fork to copy the process, in a manner similar to the Python code above. This hijacking is accomplished by encouraging the executable to load a different shared library. On Linux this is done with LD_PRELOAD, on OS X with DYLD_LIBRARY_PATH. This hack requires that the interpreter executable loads the readline library dynamically, or else recompilation of the executable is required.

Since readline knows when undo is happening, it knows when the terminal display should be saved or restored by erasing some rows of output. The interpreter is run in a pty to cajole it into allowing its terminal output to be recorded and our patched readline library can send a message to the process controlling the pty in which the interpreter is running announcing when to record or restore terminal states. Now we have the undo we set out to obtain for many interactive interpreters without modifying them each individually.

This project is called rlundo after the useful wrapper program rlwrap. It’s a fun hack (and if you want to play with it, it could use some love). But it’s probably never going to be a robust solution. rlundo is a way to try out undo in your favorite interpreter so if you like it you can look at adding it in a less hacky way :)

Thanks to Ezekiel Smithburg, Leah Hanson, Julia Evans, and Kamal Murhubi for feedback on this post. Thanks to Erin Murray, Vaibhav Sagar, Sarah Ransohoff, and Katherine Erickson for edits. Thanks to Mary Rose Cook for feedback on an early version of this post.