By Vincent Driessen
on Wednesday, May 08, 2013

You are already managing your Python packages using pip and requirements.txt spec files. Maybe you are even pinning them; that’s awesome. But how do you keep your environments clean and fresh?

Here’s what I think can be improved about the state of package management in Python.

Virtue 1: Declare only your top-level dependencies

Often, your project will only need a limited set of what I’ll call top-level package dependencies. A typical example is that you’ll depend on Django or Flask. But just putting those names in a requirements.txt file is inherently dangerous and will bite you at some point. If you don’t see why, read this post first.

So now you’re pinning them. If your app needs Flask, this will typically be in your requirements.txt file:

Flask==0.9
Jinja2==2.6
Werkzeug==0.8.3

Jinja2 and Werkzeug are in there, because Flask needs them. And since you don’t want fate to decide which versions of Jinja2 and Werkzeug you’ll get when deploying, you’re wisely pinning them.

The problem with this is that over time your requirements.txt file will accumulate all kinds of dependencies, and it’s easy to lose sight of which packages are still used and which have become stale.

The following file is the result of depending on Flask and legit.

async==0.6.1
clint==0.3.1
Flask==0.9
gitdb==0.5.4
GitPython==0.3.2.RC1
Jinja2==2.6
legit==0.1.1
smmap==0.8.2
Werkzeug==0.8.3

Looking at this, I’d have no clue what smmap is, and why it’s needed in there.

Wouldn’t it be awesome to actually have a way of expressing only your top-level dependencies in a file called requirements.in, like this?

Flask>=0.9  # we use 0.9 features
legit       # any version will do for us

And “compiling” that to an actual requirements.txt:

async==0.6.1  # required by legit==0.1.1
clint==0.3.1  # required by legit==0.1.1
Flask==0.9
gitdb==0.5.4  # required by legit==0.1.1
GitPython==0.3.2.RC1  # required by legit==0.1.1
Jinja2==2.6  # required by Flask==0.9
legit==0.1.1
smmap==0.8.2  # required by legit==0.1.1
Werkzeug==0.8.3  # required by Flask==0.9

This tool exists, and is called pip-compile. Check it out on the future branch of pip-tools. I wrote this together with Bruno Renié over the last few months.
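Conceptually, the “compile” step is a walk over the dependency graph: start from the top-level names, pin every package encountered, and record which parent pulled it in. Here is a minimal sketch of that idea — not pip-compile’s actual implementation — with a hard-coded, made-up dependency table standing in for real PyPI metadata, and each pin annotated with its immediate parent:

```python
# Toy illustration of the "compile" step: breadth-first walk over a
# dependency graph, emitting a pinned list annotated with the parent that
# pulled each package in. The DEPS table is hypothetical; the real
# pip-compile queries PyPI for this information.
DEPS = {
    "Flask": {"version": "0.9", "requires": ["Jinja2", "Werkzeug"]},
    "Jinja2": {"version": "2.6", "requires": []},
    "Werkzeug": {"version": "0.8.3", "requires": []},
    "legit": {"version": "0.1.1", "requires": ["clint", "GitPython"]},
    "clint": {"version": "0.3.1", "requires": []},
    "GitPython": {"version": "0.3.2.RC1", "requires": ["gitdb"]},
    "gitdb": {"version": "0.5.4", "requires": ["async", "smmap"]},
    "async": {"version": "0.6.1", "requires": []},
    "smmap": {"version": "0.8.2", "requires": []},
}

def compile_requirements(top_level):
    """Return pinned 'name==version' lines, annotated with their parent."""
    pins = {}  # name -> annotation (empty for top-level packages)
    queue = [(name, None) for name in top_level]
    while queue:
        name, parent = queue.pop(0)
        if name in pins:
            continue  # already pinned via another path
        pins[name] = f"  # required by {parent}" if parent else ""
        pin = f"{name}=={DEPS[name]['version']}"
        for dep in DEPS[name]["requires"]:
            queue.append((dep, pin))
    return sorted(
        (f"{name}=={DEPS[name]['version']}{comment}"
         for name, comment in pins.items()),
        key=str.lower,
    )

for line in compile_requirements(["Flask", "legit"]):
    print(line)
```

This prints a pinned, alphabetized list much like the compiled file above; the sketch attributes each package to its direct parent, so transitive dependencies like smmap show up as required by gitdb rather than by legit.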

Let’s elaborate on this a bit. The .in file provides the format that you’d actually want to use and maintain as a developer, while the result of the compilation is the file that you want to use to build deterministic (and thus predictable) envs.

Note that there’s a fundamental difference here between “compiling” these .in files and compiling a file of source code: the result of the compilation itself isn’t deterministic. This means that compiling your requirements may lead to a different requirements.txt file depending on the moment you run it, because in the meantime some packages might have gotten updates on PyPI.

The point is to freeze the specs. That’s exactly why you were pinning your dependencies already.

As a consequence, you should put both files under version control. This plays well with PaaS providers like Heroku, too. The .in file is only used for your own maintenance convenience, while the .txt file is what actually gets installed into your env. The difference is, it’s now generated for you.

A Quick Note on Complex Dependencies
We’ve created pip-compile to be smart with respect to resolving complex dependency trees. For example, Flask 0.9 depends on Jinja2>=2.4. If another package, say Foo, declared Jinja2<2.6, you’ll end up having Jinja2==2.5 in your compiled requirements. It can figure this out. (Obviously, conflicts can occur, in which case compilation will fail.)
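The constraint-intersection logic described here can be illustrated with the packaging library, which provides the same kind of version-specifier machinery pip itself relies on. In this sketch, the Foo package and its constraint are hypothetical, as in the note above:

```python
# Intersecting two version specifiers, as a resolver must do when two
# packages constrain the same dependency. Uses the third-party `packaging`
# library. "Foo" is a made-up package for illustration.
from packaging.specifiers import SpecifierSet

flask_needs = SpecifierSet(">=2.4")   # Flask 0.9 declares Jinja2>=2.4
foo_needs = SpecifierSet("<2.6")      # hypothetical Foo declares Jinja2<2.6
combined = flask_needs & foo_needs    # equivalent to ">=2.4,<2.6"

candidates = ["2.3", "2.4", "2.5", "2.6"]
compatible = [v for v in candidates if v in combined]
print(compatible)  # ['2.4', '2.5'] -> the resolver picks the newest: 2.5
```

Given these releases, 2.5 is the newest version satisfying both constraints, which is why the compiled file would pin Jinja2==2.5. If the combined set matched no release at all, that would be the conflict case where compilation fails.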

Virtue 2: Have your envs reflect your specs

The next step, then, would be to rebuild your actual virtualenvs by having them reflect exactly what’s in your (compiled) spec file. Let’s replay the example above.

Recall that we have this in our requirements.in file:

Flask>=0.9
legit

Then we run pip-compile, and get:

async==0.6.1  # required by legit==0.1.1
clint==0.3.1  # required by legit==0.1.1
Flask==0.9
gitdb==0.5.4  # required by legit==0.1.1
GitPython==0.3.2.RC1  # required by legit==0.1.1
Jinja2==2.6  # required by Flask==0.9
legit==0.1.1
smmap==0.8.2  # required by legit==0.1.1
Werkzeug==0.8.3  # required by Flask==0.9

Now, to actually install that into our environment, we typically run:

$ pip install -r requirements.txt

But frankly, this isn’t enough. To reliably mimic the spec file, the env might need to uninstall some packages as well. This can be very important. Suppose you have a package that’s already installed in your env, say requests. Your code uses it, but you forgot to add it to requirements.txt. Running pip install -r requirements.txt will work fine locally, but deploying the code to a fresh environment will break with an ImportError.
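At its core, such a sync is just a set difference between what the spec demands and what the env contains. A rough sketch of the idea, keying on name==version strings (the requests pin is hypothetical; this is not pip-sync’s actual implementation):

```python
# The essence of syncing an env to a spec file: anything required but
# missing gets installed, anything installed but not required gets removed.
def diff_env(required, installed):
    to_install = sorted(required - installed)
    to_uninstall = sorted(installed - required)
    return to_install, to_uninstall

required = {"Flask==0.9", "Jinja2==2.6", "Werkzeug==0.8.3"}
installed = {
    "Flask==0.9", "Jinja2==2.6", "Werkzeug==0.8.3",
    "legit==0.1.1", "clint==0.3.1",   # stale dependencies
    "requests==1.2.0",                # used by code, missing from the spec
}

to_install, to_uninstall = diff_env(required, installed)
print("install:", to_install)      # []
print("uninstall:", to_uninstall)  # ['clint==0.3.1', 'legit==0.1.1', 'requests==1.2.0']
```

Note that a plain pip install -r would report nothing to do here, while the set difference immediately surfaces the stale packages — including the forgotten requests, whose removal would expose the missing spec entry before deployment does.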

Meet pip-sync. This tool will install all required packages into your env, but will additionally uninstall everything else in there. Combined with pip-compile, this makes for package management nirvana. Say you don’t need legit anymore, and want to remove it as a project dependency.

First, remove that top-level dependency from the .in file:

Flask
# legit  # comment out, or remove

Then run pip-compile to update the compiled spec file:

Flask==0.9
Jinja2==2.6  # required by Flask==0.9
Werkzeug==0.8.3  # required by Flask==0.9

The unused dependencies are removed automatically. Now we need to sync that back to our actual env:

$ pip-sync
Removing package async
Removing package clint
Removing package gitdb
Removing package GitPython
Removing package smmap

This will now uninstall legit and all its dependencies from the virtualenv (unless some other package still depends on them). Your virtualenv is crisp and clean.

pip-sync is WIP
The pip-sync tool does not exist yet. We have plans for it, and it shouldn’t be hard (especially not compared to the pip-compile tool), but this thread brought it back on my radar.

I would propose that PaaS providers adopt pip-sync over pip install -r requirements.txt, as environments are automatically cleaned up that way.

Project context and roadmap

As mentioned, over the last few months Bruno Renié and I have been working on a better version of the pip-tools project, one that lets us do exactly the above. We’ve not been very public about it, but you might have noticed the future branch. Basically, this would replace the existing pip-dump command with something inherently more manageable.

I do solicit feedback on all this, so feel free to get in touch.

If you want to get in touch, I'm @nvie on Twitter.