I've spent a lot of time living with Python's module system, both in my own
work and in helping people on Freenode's #python channel. A lot of Python's
power comes from its module system; however, it could be better. It can be
hard to think about how modules could be done differently, since it's so
central to the design of Python software, but it's worth the effort. Here's
some stuff I've been thinking about.
The Good
- No global namespace for objects
- Python modules are easy to write and to import
- In particular, I'm comparing this to Scheme 48 and ML. Both have very well-designed and powerful module systems, but those systems are rather confusing to newcomers, because building a module that's useful to anyone else takes a good bit of up-front knowledge. In Python, you just stick some code in a file, and then all the names in it are importable. My earliest Python memory is of joining #python and asking, "How do I import some code I wrote in one file into another?" I was told, 'For a file named foo.py, use "import foo".' My reaction was "Really? That's all?" Providing a low barrier to entry for creating and using modules is an extremely powerful advantage of Python.
The Bad
- Modules are in a global namespace
- Although modules contain classes, functions, etc., there's no containment hierarchy for module names themselves. Different modules can define functions and classes with the same name, but nothing can contain multiple modules with the same name. This becomes a problem when, for example, you want to write unit tests that use fake versions of some modules. To fake a function or a class, one creates a new version. A module, on the other hand, generally has to be modified rather than replaced, since import looks modules up by name in the single, process-wide module namespace (sys.modules).
- PYTHONPATH is a rather inflexible way to organize modules
- Organizing modules by location in the filesystem is a great way to get started, but it's not the only organization one might want. Recent Pythons address this deficiency via the PEP 302 import hooks. However...
- PEP 302 hooks help, but aren't enough by themselves
- The canonical example of an alternate module organization is putting modules in a zip file, which Python now supports via the standard import hooks. Now you have extra problems, though. Python packages are a good way to organize modules, but they don't provide a way to enumerate their contents. To work around this, everybody looks at the filesystem layout to determine what's in a package. But if your modules aren't being loaded directly from the filesystem, this approach won't work.
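To make the "import foo" anecdote concrete, here's a minimal sketch that writes a throwaway foo.py into a temporary directory and imports it. The module's contents (a greet function) are hypothetical, and putting the directory on sys.path stands in for setting PYTHONPATH.

```python
import pathlib
import sys
import tempfile

# Write a tiny module to a scratch directory. "greet" is a hypothetical
# function used only for this illustration.
workdir = tempfile.mkdtemp()
pathlib.Path(workdir, "foo.py").write_text(
    "def greet(name):\n"
    "    return 'hello, ' + name\n"
)

# Adding the directory to sys.path is the in-process equivalent of
# listing it in PYTHONPATH.
sys.path.insert(0, workdir)

import foo  # every top-level name in foo.py is now importable

print(foo.greet("world"))  # hello, world
```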
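The global module namespace mentioned above is visible as the sys.modules dictionary, which import consults before searching anywhere else. The sketch below fakes a module by swapping an entry in that dictionary, assuming a hypothetical module name, "weather", that doesn't need to exist on disk. Note that the swap is all-or-nothing: every importer in the process sees the fake until the entry is put back.

```python
import sys
import types


def install_fake(name, **attrs):
    """Register a stand-in module under `name` in the process-wide registry."""
    fake = types.ModuleType(name)
    fake.__dict__.update(attrs)
    previous = sys.modules.get(name)
    sys.modules[name] = fake
    return fake, previous


# "weather" and "forecast" are hypothetical names for this illustration.
fake, previous = install_fake("weather", forecast=lambda: "sunny")

import weather  # import finds the entry in sys.modules and returns it

print(weather is fake)     # True
print(weather.forecast())  # sunny

# There is no way to scope the fake to one caller; the only option is to
# put the old registry entry back once the test is done.
if previous is None:
    del sys.modules["weather"]
else:
    sys.modules["weather"] = previous
```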
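For a sense of what an import hook looks like, here's a sketch of a finder that serves modules from an in-memory dictionary of source strings. It uses the spec-based form of the hooks introduced by PEP 451 (find_spec and exec_module), which later superseded PEP 302's original find_module/load_module methods. DictFinder and memmod are hypothetical names.

```python
import importlib.abc
import importlib.util
import sys


class DictFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Find and load top-level modules from a {name: source} dict."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # defer to the normal filesystem-based finders
        return importlib.util.spec_from_loader(fullname, self, is_package=False)

    def create_module(self, spec):
        return None  # ask the import system for a default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, "<dict:" + module.__name__ + ">", "exec")
        exec(code, module.__dict__)


# Put the hook ahead of the standard finders.
sys.meta_path.insert(0, DictFinder({"memmod": "ANSWER = 42\n"}))

import memmod

print(memmod.ANSWER)  # 42
```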
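The zip-file problem can be reproduced in a few lines. This sketch builds an archive containing a package, imports it through Python's built-in zipimport support, and then tries the usual trick of listing the package's directory. The archive and package names are hypothetical. Because the package's __path__ entry points inside the archive rather than at a real directory, the listing fails.

```python
import os
import sys
import tempfile
import zipfile

# Build an archive holding a package "zpkg" with one submodule.
archive = os.path.join(tempfile.mkdtemp(), "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zpkg/__init__.py", "")
    zf.writestr("zpkg/helpers.py", "VALUE = 1\n")

sys.path.insert(0, archive)  # zip archives on sys.path are handled by zipimport

import zpkg
import zpkg.helpers  # importing by name works fine

print(zpkg.__path__)  # a path inside bundle.zip, not a real directory

# The filesystem-walking idiom for discovering a package's contents breaks.
listing_failed = False
try:
    os.listdir(zpkg.__path__[0])
except OSError as exc:
    listing_failed = True
    print("cannot list package contents:", type(exc).__name__)
```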
The Really Bad
- Modules are singletons (i.e., global mutable state)
- This is the dark secret at the heart of any large-scale Python project. One can be very careful about organizing one's state into instances and so forth, but all modules are still visible and modifiable by any code at any time.
- Still easy to write unreadable code via monkey-patching
- It's easy and convenient to assign to module attributes any time you feel like it. The result is that any time you see "from foo import someObject", you can't ever be sure where that object was defined unless you read all the source code in the application. Even when changing module contents is desirable (such as in tests), it's easy to do so in a way that introduces dependencies or conflicts between tests. The classic example is calling some function that initializes module globals from a config file; if one test does it, tests run after it can fail or incorrectly succeed.
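A quick way to see the singleton behaviour: two separate imports of the same module return one shared object, so state set through either name is visible through both. The sketch uses the standard string module and a made-up attribute, shared_flag, purely as an illustration, and removes the attribute again afterwards.

```python
import importlib
import string as first_import

# A second, independent-looking import hands back the very same object.
second_import = importlib.import_module("string")
print(first_import is second_import)  # True

# Any code can change module state, and every other user sees the change.
# "shared_flag" is a hypothetical attribute added only for this demo.
first_import.shared_flag = "set by one component"
print(second_import.shared_flag)  # set by one component

del first_import.shared_flag  # undo the process-wide change
```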
The reload function is a symptom of all the above problems. Its inspiration is obvious: loading code that has changed since the current Python process started is an entirely sensible idea. However, Python's assumptions about how modules work make this rather difficult to do sensibly. When a new version of some data is wanted, it's common to create a new list rather than modify the old one, a convention reinforced by the ease with which list comprehensions do this job. The convention encouraged by the existence of reload is exactly the opposite, though: instead of a new module object being created, the old one is re-executed in place and refilled with fresh objects (and any names the new version no longer defines linger from the old one). As a result, instances of classes defined in that module are orphaned; the class they were instantiated from can no longer be reached by its name. Also, reload only handles a single module; it provides no help in updating the modules that depend on it, or in updating its own dependencies. Figuring out which modules to reload at any given time is often very tricky. Plenty of other corner cases exist, such as the reinitialization of function default arguments. Because of all this, the standard advice on #python is that "reload will not make you happy".
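The orphaned-instance problem is easy to reproduce. This sketch writes a hypothetical shapes.py, creates an instance, edits the file, and calls importlib.reload (where reload lives in Python 3). Bytecode writing is turned off so the edited source is recompiled rather than read from a cached .pyc.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # make reload recompile from source

# "shapes" and "Circle" are hypothetical names for this illustration.
workdir = tempfile.mkdtemp()
source = pathlib.Path(workdir, "shapes.py")
source.write_text("class Circle:\n    radius = 1\n")
sys.path.insert(0, workdir)

import shapes

old_module = shapes
old_circle = shapes.Circle()

# Edit the file, then reload the module in place.
source.write_text("class Circle:\n    radius = 2  # changed\n")
importlib.reload(shapes)

print(shapes is old_module)                   # True: same module object, refilled
print(isinstance(old_circle, shapes.Circle))  # False: the instance is orphaned
print(type(old_circle).radius, shapes.Circle.radius)  # 1 2
```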
So, with these problems in how Python handles modules identified, can anything be done?
Well, that's why I wrote Exocet. More about that next time.