Isolate interactive code from the global environment

As you do exploratory work in an interactive Python session (e.g. IPython in the terminal, or JupyterLab or a similar web notebook interface), you inevitably accumulate a big hairy blob of global state. Suddenly, a function you’ve written starts misbehaving. You suspect it has inadvertently become entangled in all that global state, accessing global variables it shouldn’t, and you’d like to disentangle it. Where to begin?

corpy.util.clean_env() to the rescue! It allows you to run a block of code in a sanitized global environment (where the exact meaning of sanitized is fairly customizable). When using an IPython kernel, load the corpy extension, so that you can use the cell/line magic command it provides:

In [1]: %load_ext corpy

In [2]: foo = 1

In [3]: print(foo)
1

In [4]: %%clean_env
   ...: print(foo)
   ...: 
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[4], line 2
      1 with clean_env(blacklist=None, whitelist=None, strict=True, restore_builtins=True, modules=False, callables=False, upper=False, dunder=False, sunder=True):
----> 2  print(foo)

NameError: name 'foo' is not defined

In [5]: %clean_env print(foo)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[5], line 2
      1 with clean_env(blacklist=None, whitelist=None, strict=True, restore_builtins=True, modules=False, callables=False, upper=False, dunder=False, sunder=True):
----> 2  print(foo)

NameError: name 'foo' is not defined

As you can see, clean_env() temporarily hides the global variable foo. Why is this useful? When working interactively, you often end up creating a lot of global variables while experimenting. Some of them might even disappear from the written record as you edit and delete cells. This (partially) invisible global state accumulates and can lead to hard-to-debug problems, where typos pass silently, code mysteriously fails because built-in functions have been overwritten, etc. See the examples below.

Note

So as not to be restricted to IPython interactive sessions, the examples below primarily use clean_env() as a context manager, which works everywhere, including the vanilla Python REPL and scripts. In IPython, though, the magic command shown above is much more convenient and offers all of the same features. Run %clean_env? in IPython for details on how to use them.

One option you should definitely know about is %clean_env -X, which is equivalent to with clean_env(strict=False): ... (see the end of the next section for details on what that does).

Global variables can hide typos

For instance, say you’re trying to sort numbers. You define a list of numbers called numbers, try the sorted function, which seems to work, so you proceed to write your own wrapper function, sort_numbers. (In real life, the functionality would obviously be something more involved, to justify writing a wrapper.)

>>> numbers = [0, 3, 1, 2, 4]
>>> sorted(numbers)
[0, 1, 2, 3, 4]
>>> #                ↓ typo!
>>> def sort_numbers(numbrs):
...     return sorted(numbers)
...

But in doing so, whoops! You make a typo. You name the function’s argument numbrs without an e, but the variable name you access in the function’s body is numbers with an e. Since there’s no local variable called numbers in the function, it would normally fail with a NameError. But remember that we’ve previously defined a global with that same exact name as part of our interactive experimentation prior to writing the function. So instead of the typo leading to an error, the name will be resolved in the global scope.

The tricky thing is, if you only test your function with your previously defined numbers variable, everything will seem to work fine – by accident:

>>> sort_numbers(numbers)
[0, 1, 2, 3, 4]

The problem only reveals itself when using another list as input – you get back the sorted version of numbers again:

>>> sort_numbers([0, 2, 1])
[0, 1, 2, 3, 4]
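One way to see which names a function pulls from outside its own scope is the standard library's inspect.getclosurevars(). The snippet below is a minimal sketch, independent of corpy, that re-creates the buggy function at module level and lists the global and built-in names its body resolves:

```python
import inspect

numbers = [0, 3, 1, 2, 4]


def sort_numbers(numbrs):  # same typo as above: the parameter lacks the "e"
    return sorted(numbers)


# Classify every non-local name used in the function body by the
# namespace in which it currently resolves.
refs = inspect.getclosurevars(sort_numbers)
global_names = sorted(refs.globals)
builtin_names = sorted(refs.builtins)

print(global_names)   # names found in the module's global namespace
print(builtin_names)  # names found in the builtins module
```

If a name you expected to be a parameter shows up among the globals, as numbers does here, that is a sign the function accidentally depends on session state.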

corpy.util.clean_env() provides a context manager that runs a block of code in a sanitized global environment, as a way to temporarily pretend that (most of) your interactive experimentation (a.k.a. polluting the global environment) didn’t happen. Running the same code under the context manager yields the expected NameError, which helpfully points to a problem with our code:

>>> from corpy.util import clean_env
>>> with clean_env():
...     sort_numbers([0, 2, 1])
...
Traceback (most recent call last):
  File ..., line 2, in <module>
    sort_numbers([0, 2, 1])
  File ..., line 2, in sort_numbers
    return sorted(numbers)
NameError: name 'numbers' is not defined

The traceback gives you a good hint as to what the problem might be, so you can now fix your function and try again:

>>> #                ↓ typo fixed
>>> def sort_numbers(numbers):
...     return sorted(numbers)
...
>>> with clean_env():
...     sort_numbers([0, 2, 1])
...
[0, 1, 2]

By default, clean_env tries to be “smart” about which globals to remove and which to keep, e.g. it leaves functions alone, as you’ve probably noticed, since we were able to call sort_numbers within the with block. If the defaults don’t suit you though, you can tweak its behavior by using blacklists or whitelists and other options. Check out the documentation for corpy.util.clean_env() for further details.
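To make the idea of a sanitized global environment more concrete, here is a hand-rolled sketch of one possible approach: copying a function so that it sees only the built-ins. This is not a description of how corpy implements clean_env, and the helper name isolate is hypothetical. It only illustrates why a stray global reference turns into a NameError once the globals are gone.

```python
import builtins
import types

numbers = [0, 3, 1, 2, 4]


def sort_numbers(numbrs):  # buggy version: the body reads the global
    return sorted(numbers)


def isolate(func):
    """Return a copy of func that sees only built-ins as globals.

    Hypothetical helper for illustration; not part of corpy.
    """
    clean_globals = {"__builtins__": builtins}
    return types.FunctionType(
        func.__code__,
        clean_globals,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )


try:
    isolate(sort_numbers)([0, 2, 1])
except NameError as err:
    message = str(err)  # the global lookup failed, exposing the typo
else:
    message = None

print(message)
```

Unlike clean_env, which leaves functions alone by default, this sketch also hides every other function defined in the session, so it only suits functions that call nothing but built-ins.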

One common case where you might want to change the defaults is to make clean_env a little bit more lenient, so that it allows all global variables within the with block itself, and only starts pruning them inside function calls. Typically, you’ll want to use previously defined (global) variables to test your functions under clean_env, but by default, you can’t, obviously, because clean_env hides them:

>>> with clean_env():
...     sort_numbers(numbers)
...
Traceback (most recent call last):
  File ..., line 2, in <module>
    sort_numbers(numbers)
NameError: name 'numbers' is not defined

That’s where the strict=False option comes in. In the code below, it allows referring to the numbers global variable as part of the with block, and only hides it during the function call.

>>> with clean_env(strict=False):
...     sort_numbers(numbers)
...
[0, 1, 2, 3, 4]

While the non-strict approach is convenient, it requires a slightly different and more complicated strategy, which makes it somewhat slower. That’s why it’s opt-in, even though it’s very often what you want.

Breaking code by re-assigning built-in functions

Another type of problem that beginners tend to run into is that they accidentally overwrite a built-in function. For instance, if you’re learning about sorting, what do you call a list you’ve just sorted? Well, sorted of course!

>>> sorted = sorted(numbers)

Unfortunately, now you can’t sort anymore – you’ve pointed sorted to your list, instead of the sorting function it points to by default.

>>> sorted(numbers)
Traceback (most recent call last):
  File ..., line 1, in <module>
    sorted(numbers)
TypeError: 'list' object is not callable

If this happens in a student’s own code, they might realize what they broke and how to fix it. However, if it ends up breaking example code provided by the teacher, the student might not realize it’s their fault – after all, how could they break code they didn’t write?

This is why by default, clean_env restores any overwritten builtins, because it assumes reassigning builtins is a mistake:

>>> with clean_env():
...     sorted
...
<built-in function sorted>
>>> sorted
[0, 1, 2, 3, 4]

Note

If you accidentally overwrite a built-in function, you can get it back by importing it from the builtins module, e.g. from builtins import sorted.
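To check a namespace for this kind of shadowing by hand, one could compare it against the builtins module. The following is a minimal sketch; the helper name shadowed_builtins is hypothetical and not part of corpy, and a plain dictionary stands in for the session's globals so that the example is self-contained.

```python
import builtins


def shadowed_builtins(namespace):
    """List names in namespace that hide a different object in builtins.

    Hypothetical helper for illustration; not part of corpy.
    """
    return sorted(
        name
        for name, value in namespace.items()
        if not name.startswith("__")
        and hasattr(builtins, name)
        and value is not getattr(builtins, name)
    )


# A stand-in for a session's globals() after `sorted = sorted(numbers)`.
session = {"numbers": [0, 3, 1, 2, 4], "sorted": [0, 1, 2, 3, 4]}
shadowed = shadowed_builtins(session)
print(shadowed)
```

In a live session you would pass globals() instead of the stand-in dictionary. Deleting the offending global, e.g. del sorted, also makes the built-in visible again.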