corpy.util

Small utility functions.

corpy.util.no_globals(*, blacklist: Iterable[str] | None = None, whitelist: Iterable[str] | None = None, strict: bool = True, restore_builtins: bool = True, modules: bool = False, callables: bool = False, upper: bool = False, dunder: bool = False, sunder: bool = True)

Run a block of code in a sanitized global environment.

A context manager which temporarily removes global variables from scope:

>>> foo = 42
>>> with no_globals():
...     foo
...
Traceback (most recent call last):
  ...
NameError: global 'foo' exists but hidden by corpy.util.no_globals. Trying to access it may be a mistake? See: https://corpy.readthedocs.io/en/stable/guides/no_globals.html

The original environment is restored at the end of the block:

>>> foo
42

Also works as a decorator, which is like wrapping the entire function body with the context manager:

>>> @no_globals()
... def return_foo():
...     return foo
...
>>> return_foo()
Traceback (most recent call last):
  ...
NameError: global 'foo' exists but hidden by corpy.util.no_globals. Trying to access it may be a mistake? See: https://corpy.readthedocs.io/en/stable/guides/no_globals.html

By default, no_globals tries to be clever and leave e.g. functions alone, as well as other objects which are likely to be “legitimate” globals. It also restores overwritten builtins.

This is useful e.g. for testing answers in student assignments, because it ensures that functions which accidentally rely on global variables instead of their arguments fail.
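To illustrate the kind of mistake this catches, and the general hide-and-restore idea, here is a minimal self-contained sketch. It does not use corpy and is not how no_globals is implemented; hide_names, the namespace dict and the student code are hypothetical names made up for this example.

```python
from contextlib import contextmanager


@contextmanager
def hide_names(namespace, names):
    """Temporarily remove the given names from a namespace dict (hypothetical helper)."""
    hidden = {name: namespace.pop(name) for name in names if name in namespace}
    try:
        yield
    finally:
        # Put everything back, even if the block raised.
        namespace.update(hidden)


# A stand-in for a student's module: `mean` ignores its argument
# and silently reads the global `numbers` instead.
student_code = """
numbers = [1, 2, 3]

def mean(values):
    return sum(numbers) / len(numbers)
"""
namespace = {}
exec(student_code, namespace)
mean = namespace["mean"]

# With the global in place, the bug goes unnoticed: the result
# is computed from `numbers`, not from the argument.
unnoticed = mean([10, 20])

# Hiding the global makes the function fail loudly instead.
with hide_names(namespace, ["numbers"]):
    try:
        mean([10, 20])
        failed = False
    except NameError:
        failed = True

# After the block, the global is back.
restored = "numbers" in namespace
```

The sketch needs to be told which names to hide; according to the description above, no_globals chooses them itself based on the options listed under Parameters.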

Parameters:
  • blacklist – A list of global variable names to always remove, irrespective of the other options.

  • whitelist – A list of global variable names to always keep, irrespective of the other options.

  • strict – In non-strict mode, allow global variables in the current scope, i.e. only start pruning within function calls. NOTE: This is slower because it requires tracing the function calls. Also, when using no_globals as a function decorator, non-strict probably doesn’t make sense.

  • restore_builtins – Make sure that the conventional names for built-in objects point to those objects (beginners often use list or sorted as variable names).

  • modules – Prune variables which refer to modules.

  • callables – Prune variables which refer to callables.

  • upper – Prune variables with all-uppercase identifiers (underscores allowed), which are likely to be intentional global variables (constants and the like).

  • dunder – Prune variables whose name starts with a double underscore.

  • sunder – Prune variables whose name starts with a single underscore.
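One way to read the options above is as a per-variable decision. The sketch below is illustrative only and is not taken from corpy: should_prune is a hypothetical function, and the order in which the checks are applied (including blacklist taking precedence over whitelist, and dunder being checked before sunder) is an assumption. Only the option names and their defaults come from the signature above.

```python
import types


def should_prune(name, value, *, blacklist=(), whitelist=(),
                 modules=False, callables=False, upper=False,
                 dunder=False, sunder=True):
    """Decide whether a global would be hidden (hypothetical, illustrative only)."""
    # Assumption: the explicit lists win over every other option,
    # and the blacklist is consulted first.
    if name in blacklist:
        return True
    if name in whitelist:
        return False
    # Assumption: the remaining checks are tried in this order.
    if isinstance(value, types.ModuleType):
        return modules
    if callable(value):
        return callables
    if name.startswith("__"):
        return dunder
    if name.startswith("_"):
        return sunder
    if name.isupper():  # e.g. MAX_SIZE; underscores do not count as lowercase
        return upper
    # Anything else is an ordinary global variable and gets hidden.
    return True


# With the default options, ordinary variables are hidden, while
# modules, callables and all-uppercase constants are left alone.
decisions = {
    "foo": should_prune("foo", 42),
    "types": should_prune("types", types),
    "len": should_prune("len", len),
    "MAX_SIZE": should_prune("MAX_SIZE", 10),
    "_tmp": should_prune("_tmp", 1),
}
```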

class corpy.util.LongestCommonSubstring(start1: int, start2: int, length: int)

Describes the longest common substring of two strings.

Returned by longest_common_substring().

start1: int

Alias for field number 0; substring start index in the first string

start2: int

Alias for field number 1; substring start index in the second string

length: int

Alias for field number 2; substring length

corpy.util.longest_common_substring(str1: str, str2: str) → LongestCommonSubstring | None

Find the longest common substring of str1 and str2, if one exists.

Note

Uses an efficient dynamic programming algorithm which runs in \(O(len(str1) \times len(str2))\) time. Still, it computes the full table describing all common substrings, which I’m sure could be avoided. For instance, we could keep track of the longest streak and zero in on it, or exit early as soon as too little of the strings remains to yield any competitors. But since this function is meant to be used on words, which tend to be fairly short, the added overhead is probably not worth it, not to mention the potential headaches caused by a more complicated implementation.
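For readers unfamiliar with the approach, the full-table dynamic programming idea mentioned in the note can be sketched as follows. This is a minimal illustration and not corpy's actual implementation; Match and lcs_table are hypothetical names, and keeping the first match on ties is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    """Hypothetical stand-in with the same three fields as LongestCommonSubstring."""
    start1: int
    start2: int
    length: int


def lcs_table(str1: str, str2: str) -> Optional[Match]:
    # table[i][j] holds the length of the common run that ends
    # at str1[i - 1] and str2[j - 1] (0 if the characters differ).
    table = [[0] * (len(str2) + 1) for _ in range(len(str1) + 1)]
    best = None
    for i, ch1 in enumerate(str1, start=1):
        for j, ch2 in enumerate(str2, start=1):
            if ch1 == ch2:
                run = table[i - 1][j - 1] + 1
                table[i][j] = run
                # Strict comparison: on ties, the first match found is kept.
                if best is None or run > best.length:
                    best = Match(i - run, j - run, run)
    return best


match = lcs_table("breakfast", "fasten")
# The start indices and length can be used to slice either input string.
common = "breakfast"[match.start1:match.start1 + match.length]
```

Every cell of the table is filled in, which is where the quadratic running time comes from. The early-exit refinements mentioned in the note are not attempted here.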