Update from March 14, 2019: The Python packaging landscape has changed significantly since I first wrote this post. Your choice today is mostly between using pip-tools directly, using Pipenv (a Swiss army knife kind of tool that internally relies on pip-tools), or newer tooling like Poetry. A good post to help you decide which tool best fits your use case is Python Application Dependency Management in 2018 by Hynek Schlawack.
You are already managing your Python packages using pip and requirements.txt spec files. Maybe you are even pinning them—that's awesome. But how do you keep your environments clean and fresh? Here's what I think can be improved about the state of package management in Python.
Virtue 1: Declare only your top-level dependencies
Often, your project will only need a limited set of what I'll call top-level package dependencies. A typical example is that you'll depend on Django or Flask. But just putting those names in a requirements.txt file is inherently dangerous and will bite you at some point. If you don't see why, read this post first.
So now you're pinning them. If your app needs Flask, this will typically be in your requirements.txt file:
Flask==0.9
Jinja2==2.6
Werkzeug==0.8.3
Jinja2 and Werkzeug are in there, because Flask needs them. And since you don’t want fate to decide which versions of Jinja2 and Werkzeug you’ll get when deploying, you’re wisely pinning them.
The problem with this is that over time your requirements.txt file will accumulate all kinds of dependencies, and in reality it's not unusual that you'll lose sight of which packages are still used and which have become stale.
The following file is the result of depending on Flask and legit.
async==0.6.1
clint==0.3.1
Flask==0.9
gitdb==0.5.4
GitPython==0.3.2.RC1
Jinja2==2.6
legit==0.1.1
smmap==0.8.2
Werkzeug==0.8.3
Looking at this, I'd have no clue what smmap is, or why it's needed in there.
Wouldn't it be awesome to actually have a way of expressing only your top-level dependencies in a file called requirements.in, like this?
Flask>=0.9   # we use 0.9 features
legit        # any version will do for us
And "compiling" that to an actual requirements.txt:
async==0.6.1           # required by legit==0.1.1
clint==0.3.1           # required by legit==0.1.1
Flask==0.9
gitdb==0.5.4           # required by legit==0.1.1
GitPython==0.3.2.RC1   # required by legit==0.1.1
Jinja2==2.6            # required by Flask==0.9
legit==0.1.1
smmap==0.8.2           # required by legit==0.1.1
Werkzeug==0.8.3        # required by Flask==0.9
This tool exists, and it's called pip-compile. Check it out on the future branch of pip-tools. (Update: this is now the master branch, available since 1.0.) I wrote it together with Bruno Renié over the last few months.
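As a rough sketch of the workflow (assuming the pip-compile command as it ships in current pip-tools releases; the exact invocation on the future branch may differ slightly), compiling boils down to:

$ pip-compile requirements.in
# reads your top-level specs and writes the fully pinned result to requirements.txt

From then on, you only edit requirements.in and re-run pip-compile whenever your top-level dependencies change.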
Let's elaborate on this a bit. The .in file provides the format that you'd actually want to use and maintain as a developer, while the result of the compilation is the file that you want to use to build deterministic (and thus predictable) envs.
Note that there's a fundamental difference here between "compiling" these .in files and compiling a file of source code: the result of the compilation itself isn't deterministic. This means that compiling your requirements may lead to a different requirements.txt file depending on the moment you run it—because in the meantime some packages might have received updates on PyPI.
The point is to freeze the specs, which is exactly why you were pinning your dependencies already.
As a consequence, you should put both files under version control. This also plays well with PaaS providers like Heroku. The .in file is only used for your own maintenance convenience, while the .txt file is the one actually used to install into your env. The difference is that it's now generated for you.
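For example (just a sketch of the convention, nothing pip-tools-specific), you'd commit both files together:

$ git add requirements.in requirements.txt
$ git commit -m "Bump pinned dependencies"

That way, every deployable revision carries both the human-edited specs and the exact pins derived from them.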
A Quick Note on Complex Dependencies
We've created pip-compile to be smart with respect to resolving complex dependency trees. For example, Flask 0.9 depends on Jinja2>=2.4. If another package, say Foo, declared Jinja2<2.6, you'd end up with Jinja2==2.5 in your compiled requirements. It can figure this out. (Obviously, conflicts can occur, in which case compilation will fail.)
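To make that concrete, here's what that would look like (Foo is a made-up package, and its pinned version below is purely illustrative):

# requirements.in
Flask==0.9   # needs Jinja2>=2.4
Foo          # hypothetical package that needs Jinja2<2.6

# compiled requirements.txt (illustrative)
Flask==0.9
Foo==1.0          # made-up version
Jinja2==2.5       # highest release satisfying both >=2.4 and <2.6
Werkzeug==0.8.3   # required by Flask==0.9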
Virtue 2: Have your envs reflect your specs
The next step, then, would be to rebuild your actual virtualenvs by having them reflect exactly what’s in your (compiled) spec file. Let’s replay the example above.
Recall that we have this in our requirements.in file:
Flask>=0.9
legit
Then we run pip-compile, and get:
async==0.6.1           # required by legit==0.1.1
clint==0.3.1           # required by legit==0.1.1
Flask==0.9
gitdb==0.5.4           # required by legit==0.1.1
GitPython==0.3.2.RC1   # required by legit==0.1.1
Jinja2==2.6            # required by Flask==0.9
legit==0.1.1
smmap==0.8.2           # required by legit==0.1.1
Werkzeug==0.8.3        # required by Flask==0.9
Now, to actually install that into our environment, we typically run:
$ pip install -r requirements.txt
But frankly, this isn't enough. To reliably mimic the spec file, the env might need to uninstall some packages as well. This can be very important. Suppose you have a package that's already installed in your env, say requests. Your code is using it, but you forgot to add it to requirements.txt. Running pip install -r requirements.txt will work fine locally, but deploying this code to a fresh environment will break with an ImportError.
Meet pip-sync. This tool will install all required packages into your env, but will additionally uninstall everything else in there. Combined with pip-compile, this makes for package management nirvana. Say you don't need legit anymore, and want to remove it as a project dependency.
First, remove that top-level dependency from the .in file:
Flask
# legit    # comment out, or remove
Then run pip-compile to update the compiled spec file:
Flask==0.9
Jinja2==2.6       # required by Flask==0.9
Werkzeug==0.8.3   # required by Flask==0.9
The unused dependencies are removed automatically. Now we need to sync that back to our actual env:
$ pip-sync
Uninstalling package async
Uninstalling package clint
Uninstalling package gitdb
Uninstalling package GitPython
Uninstalling package smmap
This will now uninstall legit and all of its dependencies from the virtualenv (unless some other package still depends on them). Your virtualenv is crisp and clean.
I would propose that PaaS providers adopt pip-sync over pip install -r requirements.txt, as environments get cleaned up automatically that way.
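A build step could then look roughly like this (a sketch only; pip-sync defaults to requirements.txt in current pip-tools releases, and nothing here is an official PaaS feature):

# instead of:
$ pip install -r requirements.txt

# run:
$ pip-sync requirements.txt
# installs what's missing, fixes versions that have drifted,
# and uninstalls anything not listed in the spec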
Project context and roadmap
As said, over the last few months, Bruno Renié and I have been working on a better version of the pip-tools project—one that would let us do exactly the above. We've not been very public about it, but you might have noticed the future branch. Basically, this would replace the existing pip-dump command with something inherently more manageable.
I do solicit feedback on all this, so feel free to get in touch.