Python modules and packages
Notes
Modules
A module is a file that contains definitions intended for reuse in a script or an interactive session.
Calling
import module
for the first time does three things: 1) create a new namespace that acts as the global namespace for all objects defined in module, 2) execute the entire module, 3) create a name – identical to the module name – within the caller namespace that references to the module. This can be used to access module objects in the caller namespace asmodule.object
.Calling
from module import symbol
importssymbol
into the current namespace. However, the global namespace forsymbol
(if it’s a function) always remains the namespace in which it was defined, not the caller’s namespace.Regardless of what variant of the import statement is being used to import contents from a module, all of the module’s statements will be initialised the first time (and only the first time) the module name is encountered in an import statement (more details here).
One reason
from module import *
is generally discouraged is that it directly imports all the module’s objects into the caller’s namespace, which is often said to cluter it up. Especially when importing large modules this makes sense, as it’s much cleaner to keep objects defined in imported modules in eponymous namespaces and accessing them viamodule.object
, which immediately makes clear where object comes from and can help greatly with debugging.One implication of all the above is that as a developer, you don’t have to worry about clashing variable names between modules, as they are each stored in their own namespace, and accessed via
moduleA.foo
andmoduleB.foo
in the caller namespace.When we import a module
foo
, the interpreter first searches for a built-in module and, if none is found, searches a list of directories given the variablesys.path
.sys.path
contains the directory of the input script, the variablePYTHONPATH
, and installation-dependent defaults. I can manipulatesys.path
using standard list operations; to add a directory, usesys.path.append('dirname')
.A common usecase of the above for me is to make a package available to Jupyter Notebooks. By default, a notebook’s
sys.path
contains the folder the noteook is located in and a bunch of conda managed directories linked to my running Conda environment. To make available a package that lives in the project root directory, just dosys.path.append('/Users/fgu/dev/projectname/packagename')
. I can then reference modules from the package usingfrom packagename import module
.Use
dir(modulename)
to list all names defined inmodulname
, ordir()
to list all names that are currently defined.
Running a module as a script
For relative imports to work as described, for instance, here and in Chapter 8 in Python Essential References and in recipees 10.1 and 10.3 in the Python Cookbook, the file into which you import has itself to be a module rather than a top-level script. If it’s the latter, it’s name will be main and it won’t be considered part of a package, regardless of where on the file system it is saved. Generally, for a file to be considered part of a package, it needs to nave a dot (.) in its name, as in package.submodule.modulename.
To import modules into a main script, one (somewhat unideal) solution is to add the absolute path to the package to
sys.path
.
Packages
- Packages are collections of modules. They help structure Python’s module
namespace by using dotted module names. E.g.
a.b
refers to submoduleb
in packagea
. Thus, just as the use of modules alleviates worries about clashing global variable names between modules, using a package alleviates worries about clashing module between multi-module packages.
Example: creating utility package
Utils repo:
If you want to publish to PyPI, choose name that doesn’t exist yet.
Create virtual environment
pyenv virtualenv 3.9 futils
and activate venvpyenv activate futils
.Create project folder with Poetry for nice default setup
poetry new projectname
but renameprojectname
subdirectory tosrc
because of this blog post.Install required dependencies
poetry add pandas numpy seaborn
and development dependenciespoetry add --dev ipykernel
.Can reinstall dependencies using
poetry install
.Publish project to private server:
poetry publish -r reponame
. – currently not working, getting 403 forbidden error
Project repo:
- Install
todo
Automatic git credentials reading when publishing package
Remove pyenv prompt warning message